Posted to common-user@hadoop.apache.org by Arko Provo Mukherjee <ar...@gmail.com> on 2015/02/27 00:47:54 UTC

Changing the InputFormat

Hello,

I am trying to write a Hadoop program that handles JSON, so I wrote a
CustomInputFormat to handle the data. Its record reader extends
RecordReader and overrides the nextKeyValue() method.

However, this doesn't solve the problem when one JSON object is split
across two InputSplits. I was wondering if there is a way to change how the
input file is broken into InputSplits, so that I can control it and not let
a JSON object break between the splits.
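
To make the problem concrete, the following is a minimal plain-Java sketch
(no Hadoop classes) of one common way to cope with a record that straddles a
split boundary: a record belongs to the split that contains its first byte,
so a reader skips a partial record at the start of its split and reads past
the end of its split to finish the last one. It assumes one JSON object per
line, which may not match the actual data, and the names SplitBoundarySketch
and readSplit are hypothetical, not part of any Hadoop API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitBoundarySketch {

    // Returns every record whose first byte lies in [start, end).
    // Assumption: records are newline-delimited (one JSON object per line).
    public static List<String> readSplit(byte[] data, int start, int end) {
        int pos = start;
        // If the split begins in the middle of a record, that record belongs
        // to the previous split: skip ahead to the next record start.
        if (start > 0 && data[start - 1] != '\n') {
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline
        }
        List<String> records = new ArrayList<>();
        // Read each record that starts inside the split, even if it ends
        // beyond the split's last byte.
        while (pos < end && pos < data.length) {
            int recEnd = pos;
            while (recEnd < data.length && data[recEnd] != '\n') {
                recEnd++;
            }
            records.add(new String(data, pos, recEnd - pos, StandardCharsets.UTF_8));
            pos = recEnd + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "{\"id\":1}\n{\"id\":2}\n{\"id\":3}\n";
        byte[] data = input.getBytes(StandardCharsets.UTF_8);
        int splitSize = 12; // deliberately cuts through the second record
        for (int start = 0; start < data.length; start += splitSize) {
            int end = Math.min(start + splitSize, data.length);
            System.out.println("[" + start + "," + end + ") -> "
                    + readSplit(data, start, end));
        }
    }
}
```

With these byte ranges every record is returned exactly once, even though
the boundary at byte 12 falls inside the second object. The rule only works
when a record start can be recognised from an arbitrary offset; for JSON
objects that span several lines that is much harder, which is why control
over the split boundaries themselves would help.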

Any help will be much appreciated!

Many thanks in advance!
Warm regards
Arko