Posted to common-user@hadoop.apache.org by Prasan Ary <vo...@yahoo.com> on 2008/03/04 00:30:24 UTC

map/reduce function on xml string

Hi All,
I am writing a Java implementation of my map/reduce functions on Hadoop.
The input is an XML file, and the map function has to process well-formed XML records. So far I have been unable to split the XML file at XML record boundaries to feed the records into my map function.
Can anybody point me to resources that explain how to force a file split at a desired boundary?

thx,
Pra.

       
---------------------------------
Be a better friend, newshound, and know-it-all with Yahoo! Mobile.  Try it now.

Re: map/reduce function on xml string

Posted by Colin Evans <co...@metaweb.com>.
Here's the code. If folks are interested, I can submit it as a patch as well.



Prasan Ary wrote:
> Colin,
> Would it be possible for you to share some of the code with us?
>
> thx,
> Prasan


Re: map/reduce function on xml string

Posted by Prasan Ary <vo...@yahoo.com>.
Colin,
Would it be possible for you to share some of the code with us?

thx,
Prasan

Colin Evans <co...@metaweb.com> wrote:
> We ended up subclassing TextInputFormat and adding a custom RecordReader
> that starts and ends record reads on tags. The StreamXmlRecordReader class
> is a good reference for this.




Re: map/reduce function on xml string

Posted by Colin Evans <co...@metaweb.com>.
We ended up subclassing TextInputFormat and adding a custom RecordReader
that starts and ends record reads on tags. The StreamXmlRecordReader class
is a good reference for this.
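
A minimal, Hadoop-free sketch of the boundary rule such a tag-based RecordReader has to follow. This is not the code attached to this thread; the class name TagSplitSketch, the method readSplit, and the <rec> tag are hypothetical. It assumes a split covers the byte range [start, end), that a record belongs to the split in which its begin tag starts, that the reader may read past end to finish that record, and that records never nest and their begin-tag text appears nowhere else. A real reader would pull bytes from the split's input stream rather than from an in-memory array.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of tag-delimited record reading over one split [start, end).
// Assumes records do not nest and the begin-tag text only occurs at record starts.
class TagSplitSketch {

    // Returns the index of the first occurrence of pattern in data at or after from, or -1.
    static int indexOf(byte[] data, byte[] pattern, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    // Emits every record whose begin tag starts inside [start, end).
    // The last record may extend past end; the reader keeps going until its end tag.
    static List<String> readSplit(byte[] data, int start, int end, String beginTag, String endTag) {
        byte[] begin = beginTag.getBytes(StandardCharsets.UTF_8);
        byte[] close = endTag.getBytes(StandardCharsets.UTF_8);
        List<String> records = new ArrayList<>();
        int pos = start;
        while (true) {
            int recStart = indexOf(data, begin, pos);
            if (recStart < 0 || recStart >= end) break;  // next record belongs to the next split
            int closeAt = indexOf(data, close, recStart + begin.length);
            if (closeAt < 0) break;                      // truncated record at end of input
            int recEnd = closeAt + close.length;
            records.add(new String(data, recStart, recEnd - recStart, StandardCharsets.UTF_8));
            pos = recEnd;
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<root><rec id=\"1\">a</rec><rec id=\"2\">b</rec><rec id=\"3\">c</rec></root>";
        byte[] data = xml.getBytes(StandardCharsets.UTF_8);
        int mid = data.length / 2;  // byte 35, which falls inside the second record
        System.out.println(readSplit(data, 0, mid, "<rec", "</rec>"));
        // prints: [<rec id="1">a</rec>, <rec id="2">b</rec>]
        System.out.println(readSplit(data, mid, data.length, "<rec", "</rec>"));
        // prints: [<rec id="3">c</rec>]
    }
}
```

Because the split containing a record's begin tag always finishes that record, and every other split skips ahead to its own first begin tag, each record is emitted exactly once wherever the split boundary falls.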





RE: map/reduce function on xml string

Posted by Joydeep Sen Sarma <js...@facebook.com>.
There's a StreamXmlRecordReader class in contrib/streaming that looks
like it will chunk up an XML file based on XML tags. I haven't used it
myself, though.
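
Once a tag-aware reader (StreamXmlRecordReader or a custom one) delivers each begin-to-end-tag chunk to the map function as text, the mapper can parse that chunk as a standalone XML document. A minimal sketch using only the JDK's javax.xml.parsers, assuming hypothetical <rec> records; the class RecordParseSketch and method parseRecord are illustrative names. If records depend on namespace or entity declarations made in the enclosing document, parsing a chunk on its own may fail or lose that information.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

// Hypothetical helper a map function could call on each tag-delimited chunk.
class RecordParseSketch {

    // Parses one chunk (e.g. "<rec ...>...</rec>") as a standalone document and returns
    // its root element. Throws IllegalArgumentException if the chunk is not well-formed
    // on its own. A new builder per call keeps the sketch short.
    static Element parseRecord(String record) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(record.getBytes(StandardCharsets.UTF_8)))
                          .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Record is not well-formed XML: " + record, e);
        }
    }

    public static void main(String[] args) {
        Element rec = parseRecord("<rec id=\"2\">b</rec>");
        System.out.println(rec.getAttribute("id") + " -> " + rec.getTextContent());  // prints: 2 -> b
    }
}
```

In a real mapper you would probably create the DocumentBuilder once and reuse it across records instead of building one per call, and decide whether a malformed chunk should be skipped or should fail the task.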

-----Original Message-----
From: Prasan Ary [mailto:voicesnthedark@yahoo.com] 
Sent: Monday, March 03, 2008 3:30 PM
To: core-user@hadoop.apache.org
Subject: map/reduce function on xml string
