Posted to common-user@hadoop.apache.org by Kylie McCormick <ky...@gmail.com> on 2008/07/12 01:35:31 UTC

FileInput / RecordReader question

Hello Again:
I'm currently working with Hadoop's input code (InputFormat and InputSplit).
There is some helpful information in the Map-Reduce tutorial, but I'm having
some issues with the coding end of it.

I would like to have a file that lists each of the end points I want to
contact, with the following information for each one: URL, client class, and
name. As far as I can tell, I need to use a RecordReader, since logical
splitting of the file could cut larger entries in half or bunch shorter
entries together. Right now, StreamXmlRecordReader is the closest match to
what I want to use.

(StreamXmlRecordReader documentation:
http://hadoop.apache.org/core/docs/r0.17.0/api/org/apache/hadoop/streaming/StreamXmlRecordReader.html
)

However, I'm not certain it will provide the functionality that I need: for
each entry, I would need to extract the three strings to generate the
appropriate value. Is there another tutorial on InputFormat/InputSplit for
Hadoop? I am attempting to write my own RecordReader, but I'm not sure
whether that is necessary and, if it is, what exactly the code needs to do.
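
To make the question concrete, here is a minimal sketch of the per-entry
parsing I have in mind. It assumes (this is not decided) that each end point
sits on one line as three tab-separated fields; the EndpointEntry class, its
field names, and the example URL are all hypothetical, and it is plain Java
with no Hadoop classes involved. If one entry per line is acceptable, then as
far as I understand the standard line-oriented text input (TextInputFormat)
hands each map call one complete line, so parsing like this could run inside
the mapper and a custom RecordReader might not be needed.

```java
// Hypothetical value type for one entry of the end point list file.
// Assumed line layout (not settled): URL <TAB> client class <TAB> name
final class EndpointEntry {
    final String url;
    final String clientClass;
    final String name;

    EndpointEntry(String url, String clientClass, String name) {
        this.url = url;
        this.clientClass = clientClass;
        this.name = name;
    }

    // Parses one line of the file. Throws IllegalArgumentException unless the
    // line has exactly three non-empty, tab-separated fields.
    static EndpointEntry parse(String line) {
        // Limit -1 keeps trailing empty fields so they are counted and rejected.
        String[] fields = line.split("\t", -1);
        if (fields.length != 3) {
            throw new IllegalArgumentException(
                "expected 3 tab-separated fields, got " + fields.length + ": " + line);
        }
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim();
            if (fields[i].isEmpty()) {
                throw new IllegalArgumentException("empty field " + i + " in: " + line);
            }
        }
        return new EndpointEntry(fields[0], fields[1], fields[2]);
    }

    public static void main(String[] args) {
        // Example line; the URL and class name are made up for illustration.
        EndpointEntry e =
            parse("http://example.org/service\torg.example.ExampleClient\texample");
        System.out.println(e.name + " -> " + e.url + " via " + e.clientClass);
    }
}
```

If the entries have to span several lines (as they would in XML), this
shortcut no longer applies, which is why I was looking at
StreamXmlRecordReader in the first place.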

Thanks,
Kylie