Posted to user@hadoop.apache.org by hitarth trivedi <t....@gmail.com> on 2015/01/05 21:55:14 UTC
Write and Read file through map reduce
Hi,
I have a 6-node cluster, and the scenario is as follows:
I have one map reduce job which writes file1 to HDFS.
I have another map reduce job which writes file2 to HDFS.
In the third map reduce job I need to use file1 and file2 to do some
computation and output the result.
What is the best way to store file1 and file2 in HDFS so that they can be
used in the third map reduce job?
Thanks,
Hitarth
Re: Write and Read file through map reduce
Posted by Raj K Singh <ra...@gmail.com>.
You can configure your third MapReduce job with MultipleInputs and read
both files into the job. If the files are small, consider the
DistributedCache, which will give you optimal performance when you are
joining the datasets in file1 and file2. I would also recommend a
job-scheduling API such as Oozie to make sure the third job kicks off only
when file1 and file2 are available on HDFS (the same can be done with a
shell script or a JobControl implementation).
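For the JobControl route, a minimal driver sketch (Hadoop 2.x mapreduce API; the job names are placeholders, and the per-job mapper/reducer and path setup is omitted, so treat this as an outline rather than the thread author's code):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

public class JobChain {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // job1 and job2 produce file1 and file2; job3 consumes both.
        // (Mapper/reducer classes and input/output paths omitted here.)
        ControlledJob cJob1 = new ControlledJob(Job.getInstance(conf, "write-file1"), null);
        ControlledJob cJob2 = new ControlledJob(Job.getInstance(conf, "write-file2"), null);
        ControlledJob cJob3 = new ControlledJob(Job.getInstance(conf, "join-files"), null);

        // job3 is not submitted until both dependencies succeed.
        cJob3.addDependingJob(cJob1);
        cJob3.addDependingJob(cJob2);

        JobControl control = new JobControl("file1-file2-join");
        control.addJob(cJob1);
        control.addJob(cJob2);
        control.addJob(cJob3);

        // JobControl implements Runnable; run it in a thread and poll until done.
        Thread runner = new Thread(control);
        runner.start();
        while (!control.allFinished()) {
            Thread.sleep(5000);
        }
        control.stop();
    }
}
```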
::::::::::::::::::::::::::::::::::::::::
Raj K Singh
http://in.linkedin.com/in/rajkrrsingh
http://www.rajkrrsingh.blogspot.com
Mobile Tel: +91 (0)9899821370
Re: Write and Read file through map reduce
Posted by Shahab Yunus <sh...@gmail.com>.
DistributedCache has been deprecated for a while. You can use the new
mechanism, which is functionally the same thing; it is discussed in this
thread:
http://stackoverflow.com/questions/21239722/hadoop-distributedcache-is-deprecated-what-is-the-preferred-api
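As a rough sketch of the replacement API (assuming Hadoop 2.x; the HDFS path is hypothetical, and the setup() fragment belongs inside your mapper class):

```java
// Driver side: register the small file with the job's cache.
Job job = Job.getInstance(conf, "third-job");
job.addCacheFile(new URI("/user/hitarth/file1#file1")); // '#file1' creates a local symlink

// Mapper side: the cached file is available in the task's working directory.
@Override
protected void setup(Context context) throws IOException, InterruptedException {
    try (BufferedReader reader = new BufferedReader(new FileReader("file1"))) {
        // load file1 into memory for the map-side join
    }
}
```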
Regards,
Shahab
Re: Write and Read file through map reduce
Posted by unmesha sreeveni <un...@gmail.com>.
Hi Hitarth,
If file1 and file2 are small, you can use the Distributed Cache, as
mentioned here [1].
Or you can use MultipleInputs, as mentioned here [2].
[1]
http://unmeshasreeveni.blogspot.in/2014/10/how-to-load-file-in-distributedcache-in.html
[2]
http://unmeshasreeveni.blogspot.in/2014/12/joining-two-files-using-multipleinput.html
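A driver fragment for the MultipleInputs approach might look like this (the paths and mapper class names are placeholders, not from the linked posts):

```java
// Each input path gets its own mapper; both feed the same reducer,
// which performs the join. File1Mapper/File2Mapper are hypothetical.
MultipleInputs.addInputPath(job, new Path("/data/file1"),
        TextInputFormat.class, File1Mapper.class);
MultipleInputs.addInputPath(job, new Path("/data/file2"),
        TextInputFormat.class, File2Mapper.class);
FileOutputFormat.setOutputPath(job, new Path("/data/joined"));
```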
--
*Thanks & Regards *
*Unmesha Sreeveni U.B*
*Hadoop, Bigdata Developer*
*Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
http://www.unmeshasreeveni.blogspot.in/
Re: Write and Read file through map reduce
Posted by Ted Yu <yu...@gmail.com>.
Hitarth:
You can also consider MultiFileInputFormat (and its concrete
implementations).
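MultiFileInputFormat belongs to the old mapred API; in the newer mapreduce API the closest concrete counterpart is CombineTextInputFormat, which could be wired up roughly like this (the split size and paths are illustrative):

```java
// Pack many small files into fewer, larger splits to cut mapper overhead.
job.setInputFormatClass(CombineTextInputFormat.class);
CombineTextInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024); // ~128 MB per split
FileInputFormat.addInputPath(job, new Path("/data/file1"));
FileInputFormat.addInputPath(job, new Path("/data/file2"));
```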
Cheers
Re: Write and Read file through map reduce
Posted by Corey Nolet <cj...@gmail.com>.
Hitarth,
I don't know how much direction you are looking for with regard to the
formats of the files, but you can certainly read both files into the third
MapReduce job using FileInputFormat by comma-separating the paths to the
files. The blocks of both files will essentially be unioned together and
the mappers scheduled across your cluster.
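For example, in the third job's driver (the paths are illustrative):

```java
// Splits from both files are unioned and handed to the same mapper class.
FileInputFormat.addInputPaths(job, "/data/file1,/data/file2");
// Equivalent to two separate calls:
// FileInputFormat.addInputPath(job, new Path("/data/file1"));
// FileInputFormat.addInputPath(job, new Path("/data/file2"));
```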