Posted to common-user@hadoop.apache.org by hari708 <ha...@gmail.com> on 2011/11/22 02:20:38 UTC

hadoop File loading

Hi,
I have a big file consisting of XML data. The XML is not represented as a
single line in the file. If we copy this file to a Hadoop directory using the
./hadoop dfs -put command, how does the distribution happen?
Basically, in my MapReduce program I am expecting a complete XML document as
my input. I have a CustomReader (for XML) in my MapReduce job configuration.
My main confusion is this: if the NameNode distributes the data to DataNodes,
there is a chance that one part of the XML goes to one DataNode and the other
half goes to another DataNode. If that is the case, will my custom XMLReader
in the MapReduce job be able to combine them (as MapReduce reads data locally
only)?
Please help me with this.
-- 
View this message in context: http://old.nabble.com/hadoop-File-loading-tp32871902p32871902.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.

Re: hadoop File loading

Posted by Jeff Zhang <zj...@gmail.com>.

It will work as long as you consider the xml tag boundary in your
RecordReader.
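
HDFS cuts a file into blocks by byte offset, without looking at the content,
so an XML element can straddle two blocks. A reader is still allowed to read
past the end of its own split, since the remaining bytes are fetched from
whichever DataNode holds them. The sketch below simulates that idea in plain
Python; it is not the Hadoop RecordReader API. The names read_records and
split_ranges and the <record> tag are hypothetical, and it assumes records do
not nest. The rule it follows: a record belongs to the split in which its
opening tag begins, and the reader keeps reading past the split end until it
finds the closing tag.

```python
def split_ranges(length, split_size):
    """Return (start, end) byte ranges, cut purely by offset like HDFS blocks."""
    return [(s, min(s + split_size, length)) for s in range(0, length, split_size)]


def read_records(data, start, end, open_tag=b"<record>", close_tag=b"</record>"):
    """Yield every record whose opening tag begins in [start, end).

    The closing tag may lie beyond `end`; the reader simply keeps reading,
    which stands in for a remote read from the next block.
    """
    pos = start
    while True:
        begin = data.find(open_tag, pos)
        if begin == -1 or begin >= end:
            return  # the next record (if any) belongs to a later split
        finish = data.find(close_tag, begin + len(open_tag))
        if finish == -1:
            return  # truncated record at end of file; skipped in this sketch
        finish += len(close_tag)
        yield data[begin:finish]
        pos = finish


# Hypothetical sample input: five small records inside a root element.
data = b"<root>\n" + b"".join(
    b"<record>\n  <id>%d</id>\n</record>\n" % i for i in range(5)
) + b"</root>\n"

# A deliberately tiny split size so that records straddle split boundaries.
records = [
    rec
    for start, end in split_ranges(len(data), 20)
    for rec in read_records(data, start, end)
]
print(len(records))
```

Because ownership is decided by where the opening tag begins, each record is
emitted by exactly one split, and no record is cut in half even when the split
boundary falls in the middle of it.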


-- 
Best Regards

Jeff Zhang

Re: hadoop File loading

Posted by samir das mohapatra <sa...@gmail.com>.

Hi,

As I understand it, your requirement is that your M/R job uses the full XML
file while operating. (If that is right, please try the approach below.)
You can put this XML file in the DistributedCache, which is shared across the
M/R tasks, so that each task gets the whole XML instead of a chunk of the data.

Thanks
   Samir

On Tue, May 15, 2012 at 11:30 PM, @dataElGrande <ma...@gmail.com> wrote:

>
> You should check out Pentaho's how-tos dealing with Hadoop and MapReduce.
> Hope this helps! http://wiki.pentaho.com/display/BAD/How+To%27s
>
>
>

Re: hadoop File loading

Posted by "@dataElGrande" <ma...@gmail.com>.

You should check out Pentaho's how-tos dealing with Hadoop and MapReduce.
Hope this helps! http://wiki.pentaho.com/display/BAD/How+To%27s



-- 
View this message in context: http://old.nabble.com/hadoop-File-loading-tp32871902p33849683.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.