Posted to common-user@hadoop.apache.org by "Periya.Data" <pe...@gmail.com> on 2011/09/15 23:19:42 UTC

example of splitting a binary file

Hi all,
    Is there a good example that shows how to divide a large binary file into
splits? If there is one, please let me know. It would be a great place for
me to start.

Ideally, I want to create a custom InputFormat derived from
SequenceFileAsBinaryInputFormat and a custom record-reader that can properly
read well-defined records (with known offsets) in my binary input file.

But for now, to begin, I want to learn the basics: read a binary file,
break it into splits of known size, play with a record-reader, and get
some output. I do not want to run any map-reduce on them yet. Once I know how
to do that, I can gradually build on it.
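
To make that first step concrete, here is a minimal sketch in plain Java, with no Hadoop classes involved. It rests on assumptions that are not stated above: records are fixed-length, so record N starts at byte N * recordSize, and a record belongs to the split in which it starts, so it may be read past the split's end. The names BinarySplitDemo, Split, computeSplits and readRecords are hypothetical and are not part of any Hadoop API.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

public class BinarySplitDemo {

    /** Byte range [start, start + length) of a file; a hypothetical stand-in for a split. */
    static final class Split {
        final long start;
        final long length;

        Split(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    /** Cuts a file of fileLength bytes into consecutive splits of at most splitSize bytes. */
    static List<Split> computeSplits(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new Split(start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }

    /**
     * Reads every complete record that starts inside the split.
     * Assumption: fixed-length records of recordSize bytes, aligned to the file start.
     */
    static List<byte[]> readRecords(String path, Split split, int recordSize) throws IOException {
        List<byte[]> records = new ArrayList<>();
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long fileLength = file.length();
            // First record boundary at or after the start of the split.
            long pos = ((split.start + recordSize - 1) / recordSize) * recordSize;
            long end = split.start + split.length;
            // A record that starts before 'end' is read in full, even if it crosses 'end'.
            // A trailing partial record at the end of the file is skipped.
            while (pos < end && pos + recordSize <= fileLength) {
                byte[] record = new byte[recordSize];
                file.seek(pos);
                file.readFully(record);
                records.add(record);
                pos += recordSize;
            }
        }
        return records;
    }

    /** Usage: java BinarySplitDemo <file> <splitSize> <recordSize> */
    public static void main(String[] args) throws IOException {
        String path = args[0];
        long splitSize = Long.parseLong(args[1]);
        int recordSize = Integer.parseInt(args[2]);
        long fileLength;
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            fileLength = file.length();
        }
        for (Split split : computeSplits(fileLength, splitSize)) {
            int count = readRecords(path, split, recordSize).size();
            System.out.println("split@" + split.start + "+" + split.length + " -> " + count + " records");
        }
    }
}
```

Because each record is assigned to exactly one split by its start offset, no record is read twice or dropped when the split size is not a multiple of the record size. A Hadoop InputFormat and RecordReader pair would need to make the same kind of decision, but the details there depend on the Hadoop version in use.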

Please let me know if there are any links to such examples.

Thanks,
PD.