Posted to common-user@hadoop.apache.org by Alejandro Montenegro <aa...@gmail.com> on 2010/08/26 18:43:29 UTC
processing xml with headers
Hi,
I have some XML files with a structure like this:
<document>
<header>some text</header>
<record>record 1</record>
<record>record 2</record>
....
<record>record N</record>
</document>
The information in the header is needed to process the records. Using
Mahout's XmlInputFormat I can extract every <record>, but not the header
information. One option is to not split the file and process it as a whole
in the mapper, but some files are over 200MB and I believe that would not be
very efficient.
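For the unsplit option, memory is not necessarily the bottleneck: a streaming (StAX) parser reads the file once, remembers the header text, and hands each record to a callback together with that header, so the 200MB document is never held in memory at once. This is only a minimal sketch. It assumes the <header> element appears before the records, as in the structure above. The class and method names are hypothetical, and wiring it into a Hadoop mapper or InputFormat is not shown.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.BiConsumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams <document>, pairing every <record> with the preceding <header>.
public class HeaderRecordStreamer {

    public static void stream(Reader in, BiConsumer<String, String> emit) {
        XMLStreamReader reader = null;
        try {
            reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            String header = null; // text of the most recent <header>; null until one is seen
            while (reader.hasNext()) {
                // Only element starts matter; whitespace and end tags are skipped.
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = reader.getLocalName();
                if ("header".equals(name)) {
                    header = reader.getElementText();
                } else if ("record".equals(name)) {
                    // In a mapper, this callback is where a key/value pair would be emitted (assumption).
                    emit.accept(header, reader.getElementText());
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // nothing useful to do on close failure in this sketch
                }
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<document><header>some text</header>"
                + "<record>record 1</record><record>record 2</record></document>";
        stream(new StringReader(xml), (h, r) -> System.out.println(h + " -> " + r));
    }
}
```

Because records are delivered through a callback rather than collected, memory use stays roughly constant in the file size. The trade-off is that one mapper still reads the whole file sequentially, so this does not restore parallelism within a single file.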
So if anyone has a suggestion about how to process this kind of file, I
would appreciate it!
Thanks
--
Alejandro Montenegro del Pino.
Viña del Mar - Chile
phone: (+56) 9-68358690