Posted to mapreduce-user@hadoop.apache.org by Manoj Babu <ma...@gmail.com> on 2012/07/13 05:29:51 UTC

suggest Best way to upload xml files to HDFS

Hi,

I need to upload large XML files daily. Right now I have a small
program that reads all the files from a local folder and writes them to
HDFS as a single file. Is this the right approach?
If there are any best practices or a more optimized way to achieve this,
kindly let me know.

Thanks in advance!

Cheers!
Manoj.

Re: suggest Best way to upload xml files to HDFS

Posted by Manoj Babu <ma...@gmail.com>.
Hi,

Could you kindly explain the pros and cons of the MultiFile, CombineFile,
and SequenceFile input formats?

Thanks in Advance.

Cheers!
Manoj.



On Fri, Jul 13, 2012 at 10:15 AM, Bejoy KS <be...@gmail.com> wrote:

> Hi Manoj
>
> If you are looking for a scheduler and workflow manager to carry out
> this task, you can have a look at Oozie.
>
> If your XML files are small (smaller than the HDFS block size), then it
> is definitely better practice to combine them into larger files.
> Combining them into SequenceFiles should work well.
> Regards
> Bejoy KS

Re: suggest Best way to upload xml files to HDFS

Posted by Bejoy KS <be...@gmail.com>.
Hi Manoj

If you are looking for a scheduler and workflow manager to carry out this task, you can have a look at Oozie.

If your XML files are small (smaller than the HDFS block size), then it is definitely better practice to combine them into larger files. Combining them into SequenceFiles should work well.
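
To illustrate the idea of combining small files while keeping each one as its own record, here is a minimal Python sketch. It is not Hadoop's SequenceFile format; it only mimics the key/value layout (file name as key, file contents as value) with a simple length-prefixed container. The function names `pack_records` and `unpack_records` are made up for this example. On a real cluster you would more likely write the records with Hadoop's `SequenceFile.Writer`. Keeping records separate also avoids the problem of plain concatenation, which produces a file with several XML root elements.

```python
import io
import struct


def pack_records(records, out):
    """Append (name, payload) pairs to a binary stream as length-prefixed records."""
    for name, payload in records:
        key = name.encode("utf-8")
        out.write(struct.pack(">I", len(key)))      # 4-byte big-endian key length
        out.write(key)
        out.write(struct.pack(">I", len(payload)))  # 4-byte big-endian value length
        out.write(payload)


def unpack_records(inp):
    """Yield (name, payload) pairs from a stream written by pack_records."""
    while True:
        header = inp.read(4)
        if not header:  # clean end of stream
            return
        (key_len,) = struct.unpack(">I", header)
        name = inp.read(key_len).decode("utf-8")
        (val_len,) = struct.unpack(">I", inp.read(4))
        yield name, inp.read(val_len)
```

Each small XML document can then be read back individually by name, while the storage layer sees only one large file.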

Regards
Bejoy KS

Sent from handheld, please excuse typos.



Re: suggest Best way to upload xml files to HDFS

Posted by Harsh J <ha...@cloudera.com>.
If you're looking for automated file/record/event collection, take a
look at Apache Flume: http://incubator.apache.org/flume/. It also
handles distributed collection well and is very configurable.

Otherwise, write a scheduled script to do the uploads every X period
(your choice). Consider using
https://github.com/edwardcapriolo/filecrush or similar tools too, if
your files are very small and getting in the way of MR processing.
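
A scheduled upload script along these lines could be sketched as follows. This is only an illustration under assumptions: the helper names `build_put_commands` and `upload_all` are hypothetical, the `hadoop` command is assumed to be on the PATH, and the script relies on the standard `hadoop fs -put <localsrc> <dst>` shell command. It defaults to a dry run that prints the commands instead of executing them.

```python
import subprocess
from pathlib import Path


def build_put_commands(local_dir, hdfs_dir, pattern="*.xml"):
    """Return one `hadoop fs -put` argument list per matching local file."""
    commands = []
    target_dir = hdfs_dir.rstrip("/")
    for path in sorted(Path(local_dir).glob(pattern)):
        if path.is_file():
            destination = target_dir + "/" + path.name
            commands.append(["hadoop", "fs", "-put", str(path), destination])
    return commands


def upload_all(local_dir, hdfs_dir, dry_run=True):
    """Print the upload commands, or execute them when dry_run is False."""
    for cmd in build_put_commands(local_dir, hdfs_dir):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if the upload fails
            subprocess.run(cmd, check=True)
```

A cron entry (or an Oozie or other scheduler job) could then call `upload_all` with your own local and HDFS directories and `dry_run=False` once the printed commands look right.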




-- 
Harsh J