Posted to common-dev@hadoop.apache.org by Vikas Jadhav <vi...@gmail.com> on 2013/03/02 15:02:14 UTC

mapper combiner and partitioner for particular dataset

Hello

1)  I have multiple types of datasets as input to my Hadoop job.

I want to write my own InputFormat (e.g. MyTableInputformat).
How do I specify the mapper, partitioner and combiner per dataset?
I know the MultiFileInputFormat class, but it won't help if I want to
associate a combiner and partitioner class with each dataset;
it only sets the mapper class per dataset.

2)  Also, I am looking at the MapTask.java file in the source code.

I just want to know where the mapper, partitioner and combiner classes are
set for a particular file split
while the job is executing.

Thank You

-- 
Thanx and Regards
Vikas Jadhav

Re: mapper combiner and partitioner for particular dataset

Posted by Vikas Jadhav <vi...@gmail.com>.

got it
Thanx Mahesh.

On Tue, Mar 5, 2013 at 1:35 PM, Mahesh Balija <ba...@gmail.com> wrote:

> What Harsh means by that is, you should create a custom partitioner which
> should take care of partitioning the records based on the input record data
> (Key, Value). i.e., if you have multiple inputs from multiple mappers each
> might generate a key, value pair you should have something specific in your
> key/value which can be useful to figure out, that which dataset it is
> coming from (if your value is a Text, then value dataset1+value,
> dataset2+value etc). Using this info in your partitioner you can either
> write mulitple Partitioner implementations or simply one partitioner
> handling all different cases.
>
> Harsh, please correct me if I am wrong.
>
> Best,
> Mahesh Balija,
> Calsoft Labs.
>
>
> On Mon, Mar 4, 2013 at 8:32 PM, Vikas Jadhav <vi...@gmail.com>wrote:
>
>> Thank You for reply
>>
>> Can u please elaborate because i am not getting wht does following means
>> in programming enviornment
>>
>>
>> you will need a custom written "high level" partitioner and combiner that
>> can create multiple instances of sub-partitioners/combiners and use the
>> most likely one based on their input's characteristics (such as instance
>> type, some tag, config., etc.).
>>
>>
>>
>> On Sun, Mar 3, 2013 at 4:58 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>>> The MultipleInputs class only supports mapper configuration per dataset.
>>> It does not let you specify a partitioner and combiner as well. You will
>>> need a custom written "high level" partitioner and combiner that can create
>>> multiple instances of sub-partitioners/combiners and use the most likely
>>> one based on their input's characteristics (such as instance type, some
>>> tag, config., etc.).
>>>
>>>
>>> On Sun, Mar 3, 2013 at 4:07 PM, Vikas Jadhav <vi...@gmail.com>wrote:
>>>
>>>>
>>>>
>>>>
>>>>
>>>> Hello
>>>>
>>>> 1)  I have multiple types of datasets as input to my hadoop job
>>>>
>>>> i want write my own inputformat (Exa. MyTableInputformat)
>>>>   and how to specify mapper partitioner combiner per dataset manner
>>>>  I know MultiFileInputFormat class but if i want to asscoite combiner
>>>> and partitioner class
>>>> it wont help. it only sets mapper class for per dataset manner.
>>>>
>>>> 2)  Also i am looking MapTask.java file from source code
>>>>
>>>> just want to know where does mapper partitioner and combiner classes
>>>> are set for particular filesplit
>>>> while executing job
>>>>
>>>> Thank You
>>>>
>>>> --
>>>> *
>>>> *
>>>> *
>>>>
>>>>  Thanx and Regards*
>>>> * Vikas Jadhav*
>>>>
>>>>
>>>>
>>>> --
>>>> *
>>>> *
>>>> *
>>>>
>>>> Thanx and Regards*
>>>> * Vikas Jadhav*
>>>>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>>
>> --
>> *
>> *
>> *
>>
>> Thanx and Regards*
>> * Vikas Jadhav*
>>
>
>


-- 
Thanx and Regards
Vikas Jadhav

Re: mapper combiner and partitioner for particular dataset

Posted by Mahesh Balija <ba...@gmail.com>.

What Harsh means is that you should create a custom partitioner that
partitions the records based on the input record data (key, value). That is,
if you have multiple inputs handled by multiple mappers, each of which
generates key/value pairs, you should put something specific in your key or
value that identifies which dataset the record came from (for example, if
your value is a Text, emit dataset1+value, dataset2+value, etc.). Using this
information in your partitioner, you can either write multiple Partitioner
implementations or simply one partitioner that handles all the different
cases.
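
To make the "one partitioner handling all cases" idea concrete, here is a
minimal sketch. It is plain Java with no Hadoop dependency, so it only mirrors
the parameter shape of Partitioner.getPartition(key, value, numPartitions).
The class name, the tab-separated tag format and the two per-dataset rules are
assumptions made up for illustration, not anything from the Hadoop source.

```java
// Plain-Java sketch with no Hadoop dependency. In a real job the same logic
// would sit in a subclass of org.apache.hadoop.mapreduce.Partitioner.
final class TaggedValuePartitionerSketch {

    // Assumed tag format: "<datasetTag>\t<payload>", written by each mapper.
    static String tagOf(String value) {
        int sep = value.indexOf('\t');
        return sep < 0 ? "" : value.substring(0, sep);
    }

    // Same parameter shape as Partitioner.getPartition(key, value, numPartitions).
    static int getPartition(String key, String value, int numPartitions) {
        switch (tagOf(value)) {
            case "dataset1":
                // Hypothetical rule for dataset1: hash the whole key.
                return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
            case "dataset2":
                // Hypothetical rule for dataset2: group keys by first character.
                return key.isEmpty() ? 0 : key.charAt(0) % numPartitions;
            default:
                // Untagged records fall back to partition 0.
                return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(getPartition("apple", "dataset1\t3", 4));
        System.out.println(getPartition("apple", "dataset2\tred", 4));
    }
}
```

Note that if records with the same key from both datasets must meet in the
same reducer (a join, for instance), the per-dataset rules have to agree for
that key; the reducer would also need to strip the tag before using the value.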

Harsh, please correct me if I am wrong.

Best,
Mahesh Balija,
Calsoft Labs.

On Mon, Mar 4, 2013 at 8:32 PM, Vikas Jadhav <vi...@gmail.com> wrote:

> Thank You for reply
>
> Can u please elaborate because i am not getting wht does following means
> in programming enviornment
>
>
> you will need a custom written "high level" partitioner and combiner that
> can create multiple instances of sub-partitioners/combiners and use the
> most likely one based on their input's characteristics (such as instance
> type, some tag, config., etc.).
>
>
>
> On Sun, Mar 3, 2013 at 4:58 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> The MultipleInputs class only supports mapper configuration per dataset.
>> It does not let you specify a partitioner and combiner as well. You will
>> need a custom written "high level" partitioner and combiner that can create
>> multiple instances of sub-partitioners/combiners and use the most likely
>> one based on their input's characteristics (such as instance type, some
>> tag, config., etc.).
>>
>>
>> On Sun, Mar 3, 2013 at 4:07 PM, Vikas Jadhav <vi...@gmail.com>wrote:
>>
>>>
>>>
>>>
>>>
>>> Hello
>>>
>>> 1)  I have multiple types of datasets as input to my hadoop job
>>>
>>> i want write my own inputformat (Exa. MyTableInputformat)
>>>   and how to specify mapper partitioner combiner per dataset manner
>>>  I know MultiFileInputFormat class but if i want to asscoite combiner
>>> and partitioner class
>>> it wont help. it only sets mapper class for per dataset manner.
>>>
>>> 2)  Also i am looking MapTask.java file from source code
>>>
>>> just want to know where does mapper partitioner and combiner classes are
>>> set for particular filesplit
>>> while executing job
>>>
>>> Thank You
>>>
>>> --
>>> *
>>> *
>>> *
>>>
>>>  Thanx and Regards*
>>> * Vikas Jadhav*
>>>
>>>
>>>
>>> --
>>> *
>>> *
>>> *
>>>
>>> Thanx and Regards*
>>> * Vikas Jadhav*
>>>
>>
>>
>>
>> --
>> Harsh J
>>
>
>
>
> --
> *
> *
> *
>
> Thanx and Regards*
> * Vikas Jadhav*
>

Re: mapper combiner and partitioner for particular dataset

Posted by Vikas Jadhav <vi...@gmail.com>.

Thank you for the reply.

Can you please elaborate? I do not understand what the following means in
programming terms:

you will need a custom written "high level" partitioner and combiner that
can create multiple instances of sub-partitioners/combiners and use the
most likely one based on their input's characteristics (such as instance
type, some tag, config., etc.).



On Sun, Mar 3, 2013 at 4:58 PM, Harsh J <ha...@cloudera.com> wrote:

> The MultipleInputs class only supports mapper configuration per dataset.
> It does not let you specify a partitioner and combiner as well. You will
> need a custom written "high level" partitioner and combiner that can create
> multiple instances of sub-partitioners/combiners and use the most likely
> one based on their input's characteristics (such as instance type, some
> tag, config., etc.).
>
>
> On Sun, Mar 3, 2013 at 4:07 PM, Vikas Jadhav <vi...@gmail.com>wrote:
>
>>
>>
>>
>>
>> Hello
>>
>> 1)  I have multiple types of datasets as input to my hadoop job
>>
>> i want write my own inputformat (Exa. MyTableInputformat)
>>   and how to specify mapper partitioner combiner per dataset manner
>>  I know MultiFileInputFormat class but if i want to asscoite combiner
>> and partitioner class
>> it wont help. it only sets mapper class for per dataset manner.
>>
>> 2)  Also i am looking MapTask.java file from source code
>>
>> just want to know where does mapper partitioner and combiner classes are
>> set for particular filesplit
>> while executing job
>>
>> Thank You
>>
>> --
>> *
>> *
>> *
>>
>>  Thanx and Regards*
>> * Vikas Jadhav*
>>
>>
>>
>> --
>> *
>> *
>> *
>>
>> Thanx and Regards*
>> * Vikas Jadhav*
>>
>
>
>
> --
> Harsh J
>



-- 
Thanx and Regards
Vikas Jadhav

Re: mapper combiner and partitioner for particular dataset

Posted by Harsh J <ha...@cloudera.com>.

The MultipleInputs class only supports mapper configuration per dataset. It
does not let you specify a partitioner and combiner as well. You will need
a custom-written "high level" partitioner and combiner that can create
multiple instances of sub-partitioners/combiners and use the most suitable
one based on the input's characteristics (such as instance type, some
tag, config., etc.).
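
As a rough illustration of the "high level" wrapper idea, the sketch below
holds several sub-partitioners and sub-combiners and picks one per record
from a tag on the key. It is plain Java with no Hadoop dependency; the
interfaces, the class name, the "A:" key prefix and the concrete rules are
all hypothetical. In a real job the wrappers would extend Hadoop's
Partitioner and Reducer classes and be registered once for the whole job.

```java
// Plain-Java sketch with no Hadoop dependency. All names are hypothetical.
interface SubPartitioner {
    int getPartition(String key, int numPartitions);
}

interface SubCombiner {
    long combine(long[] values);
}

final class DelegatingSketch {
    // Sub-partitioners are created once and reused for every record.
    private final SubPartitioner hashPartitioner =
        (key, n) -> (key.hashCode() & Integer.MAX_VALUE) % n;
    private final SubPartitioner lengthPartitioner =
        (key, n) -> key.length() % n;

    // Sub-combiners: one sums the values, the other keeps the maximum.
    private final SubCombiner sumCombiner = values -> {
        long sum = 0;
        for (long v : values) sum += v;
        return sum;
    };
    private final SubCombiner maxCombiner = values -> {
        long max = Long.MIN_VALUE;
        for (long v : values) max = Math.max(max, v);
        return max;
    };

    // Assumed convention: mappers for the first dataset prefix keys with "A:".
    private static boolean fromDatasetA(String key) {
        return key.startsWith("A:");
    }

    // "High level" partitioner: delegate to the sub-partitioner for the dataset.
    int getPartition(String key, int numPartitions) {
        SubPartitioner chosen = fromDatasetA(key) ? hashPartitioner : lengthPartitioner;
        return chosen.getPartition(key, numPartitions);
    }

    // "High level" combiner: delegate to the sub-combiner for the dataset.
    long combine(String key, long[] values) {
        SubCombiner chosen = fromDatasetA(key) ? sumCombiner : maxCombiner;
        return chosen.combine(values);
    }

    public static void main(String[] args) {
        DelegatingSketch d = new DelegatingSketch();
        System.out.println(d.getPartition("A:user42", 4));
        System.out.println(d.combine("A:user42", new long[] {1, 2, 3}));
        System.out.println(d.combine("B:user42", new long[] {1, 5, 3}));
    }
}
```

Because Hadoop may run a combiner zero, one or several times on a mapper's
output, each sub-combiner should be safe to apply repeatedly (sum and max
both are) and must emit the same key/value types the mapper emits.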

On Sun, Mar 3, 2013 at 4:07 PM, Vikas Jadhav <vi...@gmail.com> wrote:

>
>
>
>
> Hello
>
> 1)  I have multiple types of datasets as input to my hadoop job
>
> i want write my own inputformat (Exa. MyTableInputformat)
>   and how to specify mapper partitioner combiner per dataset manner
>  I know MultiFileInputFormat class but if i want to asscoite combiner and
> partitioner class
> it wont help. it only sets mapper class for per dataset manner.
>
> 2)  Also i am looking MapTask.java file from source code
>
> just want to know where does mapper partitioner and combiner classes are
> set for particular filesplit
> while executing job
>
> Thank You
>
> --
> Thanx and Regards
> Vikas Jadhav
>

-- 
Harsh J

Fwd: mapper combiner and partitioner for particular dataset

Posted by Vikas Jadhav <vi...@gmail.com>.

Hello

1)  I have multiple types of datasets as input to my Hadoop job.

I want to write my own InputFormat (e.g. MyTableInputformat). How can I
specify the mapper, partitioner, and combiner on a per-dataset basis?
I know the MultiFileInputFormat class, but it won't help if I want to
associate a combiner and partitioner class with each dataset; it only
sets the mapper class per dataset.

2)  I am also looking at the MapTask.java file in the source code.

I just want to know where the mapper, partitioner, and combiner classes
are set for a particular file split while the job executes.

Thank You

-- 
Thanx and Regards
Vikas Jadhav
