Posted to mapreduce-user@hadoop.apache.org by Felix Chern <id...@gmail.com> on 2014/01/29 23:46:41 UTC
Capture Directory Context in Hadoop Mapper
Hi all,
I wrote a tutorial on how to get path information inside a Mapper class. It's useful in our Hadoop use case, where we need to apply different logic to different input source directories. Enjoy!
http://www.idryman.org/blog/2014/01/26/capture-directory-context-in-hadoop-mapper/
http://www.idryman.org/blog/2014/01/27/capture-path-info-in-hadoop-inputformat-class/
Felix
Re: Capture Directory Context in Hadoop Mapper
Posted by Felix Chern <id...@gmail.com>.
MultipleInputs is nice. Most of the time, I use it for reduce-side joins.
It's great; however, you need to specify a different Mapper class per input directory.
In our case, we want the Mapper itself to capture the directory information, because these directories might contain
data spanning months, and the file structures may differ slightly over time.
In the end, this is the solution I came up with, and it's fun to hack on the lower-level APIs. :D
Thanks for the suggestion, though!
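
To illustrate the idea of one Mapper choosing its logic from the input directory, here is a minimal sketch in plain Java. It is not the code from the linked posts. The directory names, the layout labels, and the DirectoryDispatch class are all hypothetical. It assumes the Mapper can obtain its input path as a string, for example in setup() via ((FileSplit) context.getInputSplit()).getPath().toUri().getPath() when a plain file-based InputFormat is used.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DirectoryDispatch {
    // Hypothetical table: input directory prefix -> label of the record layout
    // found under it. Real jobs would load this from configuration.
    private static final Map<String, String> LAYOUT_BY_DIR = new LinkedHashMap<>();
    static {
        LAYOUT_BY_DIR.put("/logs/2013", "legacy-tsv");
        LAYOUT_BY_DIR.put("/logs/2014", "current-tsv");
    }

    // Returns the layout label for the directory that contains inputPath,
    // or "unknown" when no configured directory matches.
    public static String layoutFor(String inputPath) {
        for (Map.Entry<String, String> entry : LAYOUT_BY_DIR.entrySet()) {
            // Append "/" so that "/logs/2013" does not match "/logs/20130".
            if (inputPath.startsWith(entry.getKey() + "/")) {
                return entry.getValue();
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // In a Mapper, the argument would be the path of the current split
        // (an assumption; see the note above the code).
        System.out.println(layoutFor("/logs/2013/12/part-00000")); // legacy-tsv
        System.out.println(layoutFor("/logs/2014/01/part-00000")); // current-tsv
    }
}
```

A Mapper could call layoutFor once in setup() and keep the result in a field, so map() only branches on an already-resolved label.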
Felix
On Jan 29, 2014, at 10:15 PM, Harsh J <ha...@cloudera.com> wrote:
> Hi,
>
> These posts are nicely written - thanks for sharing! Have you also
> taken a look at the MultipleInputs feature, which gives you a cleaner
> approach? http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html
>
> On Thu, Jan 30, 2014 at 4:16 AM, Felix Chern <id...@gmail.com> wrote:
>> Hi all,
>>
>> I wrote a tutorial of how to receive path information in Mapper class. It's
>> useful in our hadoop use case where we need to apply different logic on
>> different input source directory. Enjoy!
>>
>> http://www.idryman.org/blog/2014/01/26/capture-directory-context-in-hadoop-mapper/
>> http://www.idryman.org/blog/2014/01/27/capture-path-info-in-hadoop-inputformat-class/
>>
>> Felix
>
>
>
> --
> Harsh J
Re: Capture Directory Context in Hadoop Mapper
Posted by Harsh J <ha...@cloudera.com>.
Hi,
These posts are nicely written - thanks for sharing! Have you also
taken a look at the MultipleInputs feature, which gives you a cleaner
approach? http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html
On Thu, Jan 30, 2014 at 4:16 AM, Felix Chern <id...@gmail.com> wrote:
> Hi all,
>
> I wrote a tutorial of how to receive path information in Mapper class. It's
> useful in our hadoop use case where we need to apply different logic on
> different input source directory. Enjoy!
>
> http://www.idryman.org/blog/2014/01/26/capture-directory-context-in-hadoop-mapper/
> http://www.idryman.org/blog/2014/01/27/capture-path-info-in-hadoop-inputformat-class/
>
> Felix
--
Harsh J