Posted to mapreduce-user@hadoop.apache.org by Manoj Babu <ma...@gmail.com> on 2012/07/13 05:29:51 UTC
suggest Best way to upload xml files to HDFS
Hi,
I need to upload large XML files daily. Right now I have a small
program that reads all the files from a local folder and writes them to HDFS as a
single file. Is this the right way?
If there are any best practices or a more optimized way to achieve this, kindly let me
know.
Thanks in advance!
Cheers!
Manoj.
Re: suggest Best way to upload xml files to HDFS
Posted by Manoj Babu <ma...@gmail.com>.
Hi,
Could you kindly provide the pros and cons of the MultiFile, CombineFile, and
SequenceFile input formats?
Thanks in Advance.
Cheers!
Manoj.
On Fri, Jul 13, 2012 at 10:15 AM, Bejoy KS <be...@gmail.com> wrote:
> Hi Manoj
>
> If you are looking for a scheduler and workflow manager to carry out
> this task, you can have a look at Oozie.
>
> If your XML files are small (smaller than the HDFS block size), then
> it is definitely better practice to combine them into larger files.
> Combining them into SequenceFiles should work well.
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
Re: suggest Best way to upload xml files to HDFS
Posted by Bejoy KS <be...@gmail.com>.
Hi Manoj
If you are looking for a scheduler and workflow manager to carry out this task, you can have a look at Oozie.
If your XML files are small (smaller than the HDFS block size), then it is definitely better practice to combine them into larger files. Combining them into SequenceFiles should work well.
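As a rough illustration of the combining step, here is a minimal Python sketch that groups small local files into batches no larger than one block before upload. It is not code from this thread: the names plan_batches and list_xml_files are hypothetical, and the 128 MB block size is an assumption (the real value depends on how the cluster is configured). Each batch could then be written out as one larger file, for example one SequenceFile per batch.

```python
import os
from typing import Iterable, List, Tuple

# Assumption: 128 MB block size. Use whatever your cluster is configured with.
BLOCK_SIZE = 128 * 1024 * 1024


def plan_batches(files: Iterable[Tuple[str, int]],
                 target: int = BLOCK_SIZE) -> List[List[str]]:
    """Group (path, size_in_bytes) pairs into batches of at most `target` bytes.

    A file that is already at or above `target` gets a batch of its own,
    since combining it with others brings no benefit.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for path, size in sorted(files):
        if size >= target:
            batches.append([path])
            continue
        # Close the open batch if this file would push it past the target.
        if current and current_size + size > target:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches


def list_xml_files(folder: str) -> List[Tuple[str, int]]:
    """Return (path, size) for every .xml file directly inside `folder`."""
    result = []
    for name in sorted(os.listdir(folder)):
        if name.endswith(".xml"):
            path = os.path.join(folder, name)
            result.append((path, os.path.getsize(path)))
    return result
```

Calling plan_batches(list_xml_files("/some/local/folder")) gives the list of batches; how each batch is serialized (SequenceFile, plain concatenation, or something else) is left open here.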
Regards
Bejoy KS
Sent from handheld, please excuse typos.
Re: suggest Best way to upload xml files to HDFS
Posted by Harsh J <ha...@cloudera.com>.
If you're looking at automated file/record/event collection, take a
look at Apache Flume: http://incubator.apache.org/flume/. It does well
for distributed collections as well and is very configurable.
Otherwise, write a scheduled script to do the uploads every X period
(your choice). Consider using
https://github.com/edwardcapriolo/filecrush or similar tools too, if
your files are very small and are getting in the way of MR processing.
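A minimal sketch of such a scheduled upload script, in Python, might look like the following. It is only an illustration under stated assumptions: the function names, the dated directory layout, and the paths are hypothetical, the target HDFS directory is assumed to exist already, and the `hadoop` command is assumed to be on the PATH. The script itself would be triggered by cron or a similar scheduler.

```python
import datetime
import subprocess
from typing import List


def build_put_command(local_files: List[str], hdfs_base: str,
                      day: datetime.date) -> List[str]:
    """Build a `hadoop fs -put` command that copies the given local files
    into a per-day directory under `hdfs_base` (hypothetical layout)."""
    target = hdfs_base.rstrip("/") + "/" + day.strftime("%Y/%m/%d")
    return ["hadoop", "fs", "-put", *local_files, target]


def upload(local_files: List[str], hdfs_base: str) -> None:
    """Run the upload for today's date; raises if the command fails.

    Assumption: the dated target directory has been created beforehand.
    """
    cmd = build_put_command(local_files, hdfs_base, datetime.date.today())
    subprocess.run(cmd, check=True)
```

For example, build_put_command(["a.xml", "b.xml"], "/data/xml", datetime.date(2012, 7, 13)) produces a command that puts both files into /data/xml/2012/07/13.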
--
Harsh J