Posted to mapreduce-user@hadoop.apache.org by visioner sadak <vi...@gmail.com> on 2011/10/04 20:53:57 UTC

Hadoop file uploads

Hello guys,

I would like to know how to upload files to HDFS using Java. Does it have
to be done with MapReduce? And if I have a large number of small files,
should I use a sequence file along with MapReduce? It would be great if you
could provide some information.

Re: Hadoop file uploads

Posted by visioner sadak <vi...@gmail.com>.

Hey Brock, do you have complete, working code? This is giving a lot of
errors!


Re: Hadoop file uploads

Posted by Brock Noland <br...@cloudera.com>.

Hi,

The code is very similar; just create a SequenceFile reader.

Brock
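
To make "create a reader" a little more concrete, here is a minimal JDK-only
sketch of the reading side: it walks a stream of (filename, contents) records
and hands back each small file by name. The record layout (UTF name, int
length, raw bytes) and the class name SmallFileUnpacker are assumptions of
this sketch, not Hadoop's SequenceFile format. With Hadoop you would loop over
a SequenceFile reader in the same key-then-value fashion.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, illustrative only: not a Hadoop class or format.
class SmallFileUnpacker {

    // Reads records until end of stream; returns filename -> contents
    // in the order the records were stored.
    public static Map<String, byte[]> unpack(InputStream source) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        DataInputStream in = new DataInputStream(source);
        try {
            while (true) {
                String name;
                try {
                    name = in.readUTF();           // key: the file name
                } catch (EOFException endOfRecords) {
                    break;                          // clean end, no more records
                }
                byte[] contents = new byte[in.readInt()];
                in.readFully(contents);             // value: the file contents
                files.put(name, contents);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Build a two-record stream in memory, then read it back.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        for (String name : new String[] {"a.txt", "b.txt"}) {
            byte[] contents = ("contents of " + name).getBytes(StandardCharsets.UTF_8);
            out.writeUTF(name);
            out.writeInt(contents.length);
            out.write(contents);
        }
        Map<String, byte[]> files =
                unpack(new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println(new String(files.get("b.txt"), StandardCharsets.UTF_8));
    }
}
```

Keying the result by filename is what lets the application fetch one small
file back out of the combined file.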


Re: Hadoop file uploads

Posted by visioner sadak <vi...@gmail.com>.

Hello Brock,

Thanks a lot for your help. Should I run this code after doing the small
file uploads? I have a Java API that does the small file uploads and reads
as well. How will I be able to read the files back?



Re: Hadoop file uploads

Posted by Brock Noland <br...@cloudera.com>.

Hi,

This: http://pastebin.com/YFzAh0Nj

will convert a directory of small files to a sequence file. The key is the
filename; the value is the file's contents. This works if each individual
file is small enough to fit in memory. If some files are larger and can be
split up, they can be spread over multiple key/value pairs.

Brock
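
As a rough illustration of the key/value layout described above (filename as
key, file contents as value), here is a minimal sketch that uses only the JDK.
It is not the code behind the pastebin link and it is not Hadoop's
SequenceFile format or API; the class name SmallFilePacker and its record
layout are made up for this example. With Hadoop on the classpath you would
write the same pairs through a SequenceFile writer instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, illustrative only: not a Hadoop class or format.
class SmallFilePacker {

    // Writes one record per regular file in inputDir:
    // UTF name, int length, raw bytes. Returns the number of files packed.
    public static int pack(Path inputDir, OutputStream target) {
        try (Stream<Path> listing = Files.list(inputDir)) {
            List<Path> files = listing
                    .filter(Files::isRegularFile)
                    .sorted()
                    .collect(Collectors.toList());
            DataOutputStream out = new DataOutputStream(target);
            for (Path file : files) {
                // Each file is read whole, so it must fit in memory.
                byte[] contents = Files.readAllBytes(file);
                out.writeUTF(file.getFileName().toString()); // key
                out.writeInt(contents.length);
                out.write(contents);                         // value
            }
            out.flush();
            return files.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("small-files");
        Files.write(dir.resolve("a.txt"), "alpha".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("b.txt"), "beta".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream packed = new ByteArrayOutputStream();
        int count = pack(dir, packed);
        System.out.println(count + " files packed into " + packed.size() + " bytes");
    }
}
```

A file too large for memory would need to be cut into chunks, each written as
its own record, which is the "multiple key/value pairs" case mentioned above.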


Re: Hadoop file uploads

Posted by visioner sadak <vi...@gmail.com>.

Hello guys,

Thanks a lot again for your previous guidance. I tried out the Java API for
file uploads and it's working fine. Now I need to modify the code to use
sequence files so that I can handle a large number of small files in
Hadoop. For that I came across two links:

1. http://stuartsierra.com/2008/04/24/a-million-little-files (tar to
sequence)
2. http://www.jointhegrid.com/hadoop_filecrush/index.jsp (file crush)

Could you please tell me which approach is better to follow, or should I
follow the HAR (Hadoop archive) approach? I understand that a sequence file
can combine smaller files into one big one, but I don't know how to split
it and retrieve the small files again when reading.

Thanks and Gratitude

Re: Hadoop file uploads

Posted by visioner sadak <vi...@gmail.com>.

Thanks a lot, Wellington and Bejoy, for your inputs. I will try out this API
and sequence files.


Re: Hadoop file uploads

Posted by Wellington Chevreuil <we...@gmail.com>.

Yes, Sadak,

With this API, you copy your files into HDFS the same way you would write
to an OutputStream. The data will then be replicated across your cluster's
HDFS.

Cheers.
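
The "write to an OutputStream" part can be sketched with plain JDK streams. In
the snippet below an in-memory ByteArrayOutputStream stands in for the HDFS
output stream. The assumption is that in a real program the target stream
would come from the Hadoop FileSystem class mentioned in this thread (for
example via its create method) and that HDFS takes care of replication on its
side. The class and method names here are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, illustrative only: uses no Hadoop classes.
class StreamUpload {

    // Copies everything from source to target in fixed-size chunks
    // and returns the number of bytes written.
    public static long copy(InputStream source, OutputStream target) {
        byte[] buffer = new byte[8192];
        long total = 0;
        try {
            int read;
            while ((read = source.read(buffer)) != -1) {
                target.write(buffer, 0, read);
                total += read;
            }
            target.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        InputStream local = new ByteArrayInputStream(
                "hello hdfs".getBytes(StandardCharsets.UTF_8));
        // Stand-in for the stream a Hadoop FileSystem would hand back.
        ByteArrayOutputStream remote = new ByteArrayOutputStream();
        long written = copy(local, remote);
        System.out.println(written + " bytes copied");
    }
}
```

Nothing in the copy loop mentions mappers or reducers; the application only
sees a stream.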


Re: Hadoop file uploads

Posted by visioner sadak <vi...@gmail.com>.

Hey, thanks Wellington. Just a thought: will my data be replicated as well?
I thought the mapper does the job of breaking data into pieces and
distributing it, and the reducer does the joining and combining when
fetching data back. That's why I was confused about whether to use MR. Can I
use this API for uploading a large number of small files through my
application as well, or should I use the sequence file class for that? I ask
because I read about the small files problem in Hadoop, as described in the
link below:

http://www.cloudera.com/blog/2009/02/the-small-files-problem/


Re: Hadoop file uploads

Posted by Wellington Chevreuil <we...@gmail.com>.

Hey Sadak,

You don't need to write an MR job for that. Your Java program can use the
Hadoop Java API directly. You would need the FileSystem
(http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html)
and Path (http://hadoop.apache.org/common/docs/current/api/index.html?org/apache/hadoop/fs/Path.html)
classes.

Cheers,
Wellington.


Re: Hadoop file uploads

Posted by Bejoy KS <be...@gmail.com>.

Yes Sadak. The API would do the splitting for you; there is no need for MR.
It would be better to keep file sizes at least as large as an HDFS block.
A sequence file is definitely a good choice. If you want to process the
input and then archive it, look into HAR (Hadoop archives) as well.

Thanks and Regards
Bejoy.K.S
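
A quick back-of-the-envelope calculation shows why file size relative to block
size matters. The sketch below counts how many HDFS blocks a set of files
occupies, assuming a 64 MB block size (an assumption of this example; the
actual value is a cluster setting) and relying on the fact that files never
share a block. The class name and the numbers are illustrative only.

```java
// Hypothetical helper, illustrative only: plain arithmetic, no Hadoop classes.
class BlockCount {

    // Assumed block size for this sketch: 64 MB. Check your cluster's setting.
    static final long BLOCK_SIZE = 64L * 1024 * 1024;

    // Blocks needed for one file: the size divided by the block size, rounded up.
    public static long blocksFor(long fileSizeBytes) {
        return (fileSizeBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    public static void main(String[] args) {
        long fileCount = 10_000;
        long fileSize = 100L * 1024; // 100 KB each

        long asSmallFiles = fileCount * blocksFor(fileSize);
        long asOneFile = blocksFor(fileCount * fileSize);

        System.out.println("stored separately: " + asSmallFiles + " blocks");
        System.out.println("combined into one file: " + asOneFile + " blocks");
    }
}
```

With these numbers, 10,000 separate files mean 10,000 blocks to keep track
of, while one combined file needs 16 (ignoring any container overhead).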


Re: Hadoop file uploads

Posted by Bejoy KS <be...@gmail.com>.

Hi Sadak,

You really don't need to fire a MapReduce job to copy files from a local
file system to HDFS. You can do it in two easy ways.

*Using the Linux CLI* - the most convenient and handy option if you are
working from a shell script.
hadoop fs -copyFromLocal <file/dir in lfs> <destination dir in hdfs>

*Using the Java API*
// needs org.apache.hadoop.conf.Configuration,
// org.apache.hadoop.fs.FileSystem and org.apache.hadoop.fs.Path
// load the Hadoop configuration
Configuration hadoopConf = new Configuration();
// get the default associated file system
FileSystem fileSystem = FileSystem.get(hadoopConf);
// copy from lfs to hdfs
fileSystem.copyFromLocalFile(new Path("source file/dir in lfs"),
        new Path("Destn dir in hdfs"));

Please read the API documentation before implementing. There are variants of
the copyFromLocalFile method, as well as a lot of other methods that you'd
find useful if you choose the Java API path.

You can do the reverse operation with
hadoop fs -copyToLocal <file/dir in hdfs> <destination dir in lfs>

fileSystem.copyToLocalFile(new Path("dir/file dir in hdfs"),
        new Path("Destn dir in lfs"));

Hope it helps and gives you a kick start into Hadoop.

Thanks and Regards
Bejoy.K.S
