Posted to common-user@hadoop.apache.org by "@dataElGrande" <ma...@gmail.com> on 2012/05/15 20:00:48 UTC

Re: hadoop File loading

You should check out Pentaho's how-tos on Hadoop and MapReduce.
Hope this helps! http://wiki.pentaho.com/display/BAD/How+To%27s



hari708 wrote:
> 
> Hi,
> I have a big file consisting of XML data. The XML is not on a single
> line in the file. If I put this file into a Hadoop directory with the
> ./hadoop dfs -put command, how does the distribution happen?
> Basically, my MapReduce program expects complete XML as its input. I
> have a custom reader (for XML) in my MapReduce job configuration. My
> main confusion is this: if the NameNode distributes the data to the
> DataNodes, there is a chance that part of the XML goes to one DataNode
> and the other half goes to another DataNode. If that is the case, will
> my custom XMLReader in the MapReduce job be able to combine the two
> parts (since MapReduce reads data locally only)?
> Please help me with this.
> 

-- 
View this message in context: http://old.nabble.com/hadoop-File-loading-tp32871902p33849683.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
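
On the question quoted above: HDFS cuts a file into fixed-size blocks by byte count, with no regard for XML (or line) boundaries, so one XML record can indeed start in one block and end in a block stored on a different DataNode. Data locality is only a scheduling preference, though. A RecordReader may keep reading past the end of its split, and the rest of the record is then fetched from the next block over the network. A common convention is that a split owns every record whose start tag begins inside it: the reader skips ahead to the first start tag at or after the split start, and finishes its last record even if that means reading into the next block. Hadoop's LineRecordReader handles lines in broadly the same way. The sketch below is a hypothetical, Hadoop-free simulation of that convention over an in-memory string (characters stand in for bytes); the <record>/</record> tags, the class name XmlSplitDemo and the tiny split size are illustrative assumptions, not the poster's CustomReader or any library's API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical simulation (no Hadoop classes) of a split-aware XML record
 * reader. Characters of the string stand in for bytes of an HDFS file.
 */
public class XmlSplitDemo {

    static final String START_TAG = "<record>"; // assumed record start tag
    static final String END_TAG = "</record>";  // assumed record end tag

    /** Returns the complete records owned by the split [start, end). */
    static List<String> readSplit(String data, int start, int end) {
        List<String> records = new ArrayList<>();
        // Skip any partial record at the front: it belongs to the previous split.
        int pos = data.indexOf(START_TAG, start);
        // A record belongs to this split if its start tag begins before 'end'.
        while (pos >= 0 && pos < end) {
            // May read past 'end'; in HDFS this means fetching bytes from the
            // next block, possibly stored on another DataNode.
            int close = data.indexOf(END_TAG, pos);
            if (close < 0) {
                break; // truncated final record: ignored in this sketch
            }
            int recordEnd = close + END_TAG.length();
            records.add(data.substring(pos, recordEnd));
            pos = data.indexOf(START_TAG, recordEnd);
        }
        return records;
    }

    /** Cuts the data into fixed-size splits, as HDFS cuts a file into blocks. */
    static List<String> readAll(String data, int splitSize) {
        List<String> all = new ArrayList<>();
        for (int start = 0; start < data.length(); start += splitSize) {
            int end = Math.min(start + splitSize, data.length());
            all.addAll(readSplit(data, start, end));
        }
        return all;
    }

    public static void main(String[] args) {
        String data = "<root>\n"
                + "<record>\n  <id>1</id>\n</record>\n"
                + "<record>\n  <id>2</id>\n</record>\n"
                + "<record>\n  <id>3</id>\n</record>\n"
                + "</root>\n";
        // A tiny split size forces records to cross split boundaries.
        for (String r : readAll(data, 16)) {
            System.out.println(r.replace("\n", " "));
        }
    }
}
```

Running main prints each of the three records exactly once, even though a 16-character split size makes them straddle split boundaries. The sketch assumes records are not nested and that the start tag never appears inside a record's text.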


Re: hadoop File loading

Posted by samir das mohapatra <sa...@gmail.com>.
Hi,

Your requirement is that your M/R job uses the full XML file while operating.
(If that is right, then please consider the approach below.)
You can put this XML file in the DistributedCache, which makes it available
to every task in the M/R job, so that each task gets the whole XML file
instead of a chunk of the data.

Thanks
   Samir
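
Samir's suggestion, sketched: when every task needs the whole XML file, the file can be shipped as a side file through the DistributedCache, which copies it to the local disk of each node that runs the job's tasks. In the Hadoop 1.x API of the time, the driver adds the file with DistributedCache.addCacheFile(URI, Configuration) and the task locates its local copy with DistributedCache.getLocalCacheFiles(Configuration). Because those calls need Hadoop on the classpath, the runnable part below only shows what a task would do with that local copy, namely parse the complete document with the JDK's DOM parser. The class name CachedXmlDemo, the HDFS path in the comment and the temporary file standing in for the cached copy are hypothetical. Since each node receives a full copy, this approach suits files small enough to replicate to every node; it does not spread the parsing of one big file across mappers.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/**
 * Hypothetical sketch: a task parses a complete, node-local copy of the XML
 * file instead of a byte-range split of it.
 *
 * In a Hadoop 1.x job the driver would register the file with
 *   DistributedCache.addCacheFile(new URI("/hypothetical/path/data.xml"), conf);
 * and the task would find the local copy with
 *   DistributedCache.getLocalCacheFiles(conf);
 * (newer releases deprecate that class in favour of Job.addCacheFile(URI)).
 * Those calls appear only in this comment; the class needs no Hadoop jars.
 */
public class CachedXmlDemo {

    /** Parses the whole local file and counts elements with the given tag name. */
    static int countElements(File localCopy, String tagName) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(localCopy); // the entire document, never a fragment
        NodeList nodes = doc.getElementsByTagName(tagName);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Temporary file standing in for the copy the framework places on each node.
        File local = File.createTempFile("cached", ".xml");
        local.deleteOnExit();
        Files.write(local.toPath(),
                "<root><record/><record/><record/></root>".getBytes(StandardCharsets.UTF_8));
        System.out.println(countElements(local, "record"));
    }
}
```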
