Posted to user@hive.apache.org by Mohammad Tariq <do...@gmail.com> on 2012/06/28 20:47:05 UTC

Hive mapper creation

Hello list,

         Since Hive tables are assumed to use a text input format, is
it right to assume that a mapper is created per row of a particular
table? Please correct me if my understanding is wrong. Also, please
let me know how mappers are created for a Hive query. Many thanks.

Regards,
    Mohammad Tariq

Re: Hive mapper creation

Posted by Mohammad Tariq <do...@gmail.com>.
OK Bejoy, I'll proceed as you directed and get back to you if I run
into any difficulty. Thanks again for the help.

Regards,
    Mohammad Tariq


On Fri, Jun 29, 2012 at 12:59 AM, Bejoy KS <be...@yahoo.com> wrote:
>  Hi Mohammed
>
> If it is to control the split size, and thereby the number of map tasks, you just need to play with the min and max split size properties.
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -----Original Message-----
> From: Mohammad Tariq <do...@gmail.com>
> Date: Fri, 29 Jun 2012 00:55:54
> To: <us...@hive.apache.org>; <be...@yahoo.com>
> Reply-To: user@hive.apache.org
> Subject: Re: Hive mapper creation
>
> Thanks a lot for the valuable response, Bejoy. Actually, I wanted to
> know whether it is possible to set the size of file splits, or the
> criterion on which file splits are created (in turn controlling the
> creation of mappers), for a Hive query. For example, if I want to take
> 'n' lines from a file as one split instead of taking each individual
> row, I can use NLineInputFormat. Is it possible to do something
> similar at Hive's level, or do I need to look into the source code?
>
> Regards,
>     Mohammad Tariq
>
>
> On Fri, Jun 29, 2012 at 12:37 AM, Bejoy KS <be...@yahoo.com> wrote:
>> Hi Mohammed
>>
>> Splits belong to the MapReduce framework and are not specific to Hive. A split is the chunk of data processed by one mapper. Based on your InputFormat and the min and max split size properties, the MR framework decides which HDFS blocks a mapper should process (it can be just one block, or more if CombineFileInputFormat is used). Which blocks form a split is decided with data locality in mind. The number of mappers/map tasks created by a job equals the number of splits thus determined, i.e. one map task per split.
>>
>> Hope it is clear. Feel free to revert if you still have any queries.
>>
>>
>> Regards
>> Bejoy KS
>>
>> Sent from handheld, please excuse typos.
>>
>> -----Original Message-----
>> From: Mohammad Tariq <do...@gmail.com>
>> Date: Fri, 29 Jun 2012 00:29:13
>> To: <us...@hive.apache.org>; <be...@yahoo.com>
>> Reply-To: user@hive.apache.org
>> Subject: Re: Hive mapper creation
>>
>> Hello Nitin, Bejoy,
>>
>>        Thanks a lot for the quick response. Could you please tell me
>> what the default criterion for split creation is? How are the splits
>> for a Hive query created? (Pardon my ignorance.)
>>
>> Regards,
>>     Mohammad Tariq
>>
>>
>> On Fri, Jun 29, 2012 at 12:22 AM, Bejoy KS <be...@yahoo.com> wrote:
>>> Hi Mohammed
>>>
>>> Internally, Hive does its processing with MapReduce. So, as in plain MapReduce, the splits are calculated at job submission time and a mapper is assigned per split. A mapper therefore processes a split, not a row.
>>>
>>> You can store data in various formats such as text, SequenceFile, RCFile, etc. There is no restriction to text files alone.
>>>
>>>
>>> Regards
>>> Bejoy KS
>>>
>>> Sent from handheld, please excuse typos.
>>>
>>> -----Original Message-----
>>> From: Mohammad Tariq <do...@gmail.com>
>>> Date: Fri, 29 Jun 2012 00:17:05
>>> To: user<us...@hive.apache.org>
>>> Reply-To: user@hive.apache.org
>>> Subject: Hive mapper creation
>>>
>>> Hello list,
>>>
>>>         Since Hive tables are assumed to use a text input format, is
>>> it right to assume that a mapper is created per row of a particular
>>> table? Please correct me if my understanding is wrong. Also, please
>>> let me know how mappers are created for a Hive query. Many thanks.
>>>
>>> Regards,
>>>     Mohammad Tariq

Re: Hive mapper creation

Posted by Bejoy KS <be...@yahoo.com>.
 Hi Mohammed

If it is to control the split size, and thereby the number of map tasks, you just need to play with the min and max split size properties.
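
For example, something like the following could be set from the Hive CLI before running the query (a minimal sketch; the property names below are the classic Hadoop 1.x ones and may differ by version, and the values are only illustrative):

    -- Newer Hadoop releases use mapreduce.input.fileinputformat.split.minsize
    -- and .maxsize instead of the mapred.* names below.
    SET mapred.min.split.size=134217728;   -- 128 MB: lower bound on split size
    SET mapred.max.split.size=268435456;   -- 256 MB: upper bound on split size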

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Mohammad Tariq <do...@gmail.com>
Date: Fri, 29 Jun 2012 00:55:54 
To: <us...@hive.apache.org>; <be...@yahoo.com>
Reply-To: user@hive.apache.org
Subject: Re: Hive mapper creation

Thanks a lot for the valuable response, Bejoy. Actually, I wanted to
know whether it is possible to set the size of file splits, or the
criterion on which file splits are created (in turn controlling the
creation of mappers), for a Hive query. For example, if I want to take
'n' lines from a file as one split instead of taking each individual
row, I can use NLineInputFormat. Is it possible to do something
similar at Hive's level, or do I need to look into the source code?

Regards,
    Mohammad Tariq


On Fri, Jun 29, 2012 at 12:37 AM, Bejoy KS <be...@yahoo.com> wrote:
> Hi Mohammed
>
> Splits belong to the MapReduce framework and are not specific to Hive. A split is the chunk of data processed by one mapper. Based on your InputFormat and the min and max split size properties, the MR framework decides which HDFS blocks a mapper should process (it can be just one block, or more if CombineFileInputFormat is used). Which blocks form a split is decided with data locality in mind. The number of mappers/map tasks created by a job equals the number of splits thus determined, i.e. one map task per split.
>
> Hope it is clear. Feel free to revert if you still have any queries.
>
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -----Original Message-----
> From: Mohammad Tariq <do...@gmail.com>
> Date: Fri, 29 Jun 2012 00:29:13
> To: <us...@hive.apache.org>; <be...@yahoo.com>
> Reply-To: user@hive.apache.org
> Subject: Re: Hive mapper creation
>
> Hello Nitin, Bejoy,
>
>        Thanks a lot for the quick response. Could you please tell me
> what the default criterion for split creation is? How are the splits
> for a Hive query created? (Pardon my ignorance.)
>
> Regards,
>     Mohammad Tariq
>
>
> On Fri, Jun 29, 2012 at 12:22 AM, Bejoy KS <be...@yahoo.com> wrote:
>> Hi Mohammed
>>
>> Internally, Hive does its processing with MapReduce. So, as in plain MapReduce, the splits are calculated at job submission time and a mapper is assigned per split. A mapper therefore processes a split, not a row.
>>
>> You can store data in various formats such as text, SequenceFile, RCFile, etc. There is no restriction to text files alone.
>>
>>
>> Regards
>> Bejoy KS
>>
>> Sent from handheld, please excuse typos.
>>
>> -----Original Message-----
>> From: Mohammad Tariq <do...@gmail.com>
>> Date: Fri, 29 Jun 2012 00:17:05
>> To: user<us...@hive.apache.org>
>> Reply-To: user@hive.apache.org
>> Subject: Hive mapper creation
>>
>> Hello list,
>>
>>         Since Hive tables are assumed to use a text input format, is
>> it right to assume that a mapper is created per row of a particular
>> table? Please correct me if my understanding is wrong. Also, please
>> let me know how mappers are created for a Hive query. Many thanks.
>>
>> Regards,
>>     Mohammad Tariq

Re: Hive mapper creation

Posted by Mohammad Tariq <do...@gmail.com>.
Thanks a lot for the valuable response, Bejoy. Actually, I wanted to
know whether it is possible to set the size of file splits, or the
criterion on which file splits are created (in turn controlling the
creation of mappers), for a Hive query. For example, if I want to take
'n' lines from a file as one split instead of taking each individual
row, I can use NLineInputFormat. Is it possible to do something
similar at Hive's level, or do I need to look into the source code?
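
For reference, a hedged sketch of what such an attempt could look like: the old-API org.apache.hadoop.mapred.lib.NLineInputFormat reads its lines-per-split count from mapred.line.input.format.linespermap, and Hive DDL lets a table declare a custom InputFormat. Whether Hive's split planning actually honours it depends on hive.input.format, so treat this as an experiment rather than a recipe; the table name is made up.

    -- Let the table's own InputFormat drive split planning.
    SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    -- 'n' lines per split for NLineInputFormat (old-API property name).
    SET mapred.line.input.format.linespermap=1000;

    -- Hypothetical table wired to NLineInputFormat.
    CREATE TABLE nline_demo (line STRING)
      STORED AS
        INPUTFORMAT 'org.apache.hadoop.mapred.lib.NLineInputFormat'
        OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';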

Regards,
    Mohammad Tariq


On Fri, Jun 29, 2012 at 12:37 AM, Bejoy KS <be...@yahoo.com> wrote:
> Hi Mohammed
>
> Splits belong to the MapReduce framework and are not specific to Hive. A split is the chunk of data processed by one mapper. Based on your InputFormat and the min and max split size properties, the MR framework decides which HDFS blocks a mapper should process (it can be just one block, or more if CombineFileInputFormat is used). Which blocks form a split is decided with data locality in mind. The number of mappers/map tasks created by a job equals the number of splits thus determined, i.e. one map task per split.
>
> Hope it is clear. Feel free to revert if you still have any queries.
>
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -----Original Message-----
> From: Mohammad Tariq <do...@gmail.com>
> Date: Fri, 29 Jun 2012 00:29:13
> To: <us...@hive.apache.org>; <be...@yahoo.com>
> Reply-To: user@hive.apache.org
> Subject: Re: Hive mapper creation
>
> Hello Nitin, Bejoy,
>
>        Thanks a lot for the quick response. Could you please tell me
> what the default criterion for split creation is? How are the splits
> for a Hive query created? (Pardon my ignorance.)
>
> Regards,
>     Mohammad Tariq
>
>
> On Fri, Jun 29, 2012 at 12:22 AM, Bejoy KS <be...@yahoo.com> wrote:
>> Hi Mohammed
>>
>> Internally, Hive does its processing with MapReduce. So, as in plain MapReduce, the splits are calculated at job submission time and a mapper is assigned per split. A mapper therefore processes a split, not a row.
>>
>> You can store data in various formats such as text, SequenceFile, RCFile, etc. There is no restriction to text files alone.
>>
>>
>> Regards
>> Bejoy KS
>>
>> Sent from handheld, please excuse typos.
>>
>> -----Original Message-----
>> From: Mohammad Tariq <do...@gmail.com>
>> Date: Fri, 29 Jun 2012 00:17:05
>> To: user<us...@hive.apache.org>
>> Reply-To: user@hive.apache.org
>> Subject: Hive mapper creation
>>
>> Hello list,
>>
>>         Since Hive tables are assumed to use a text input format, is
>> it right to assume that a mapper is created per row of a particular
>> table? Please correct me if my understanding is wrong. Also, please
>> let me know how mappers are created for a Hive query. Many thanks.
>>
>> Regards,
>>     Mohammad Tariq

Re: Hive mapper creation

Posted by Bejoy KS <be...@yahoo.com>.
Hi Mohammed

Splits belong to the MapReduce framework and are not specific to Hive. A split is the chunk of data processed by one mapper. Based on your InputFormat and the min and max split size properties, the MR framework decides which HDFS blocks a mapper should process (it can be just one block, or more if CombineFileInputFormat is used). Which blocks form a split is decided with data locality in mind. The number of mappers/map tasks created by a job equals the number of splits thus determined, i.e. one map task per split.
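
As a rough illustration of the above (a sketch; the formula is the one used by the newer FileInputFormat implementations, the property names are the Hadoop 1.x / Hive 0.x-era ones, and the values are only illustrative):

    -- Per-split size for a plain FileInputFormat is roughly:
    --   splitSize = max(minSplitSize, min(maxSplitSize, blockSize))
    -- Hive's CombineHiveInputFormat (typically the default) can additionally
    -- group several small files or blocks, respecting data locality:
    SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
    SET mapred.max.split.size=268435456;           -- cap on data per mapper
    SET mapred.min.split.size.per.node=134217728;  -- grouping thresholds used
    SET mapred.min.split.size.per.rack=134217728;  -- when combining splits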

Hope it is clear. Feel free to revert if you still have any queries.


Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Mohammad Tariq <do...@gmail.com>
Date: Fri, 29 Jun 2012 00:29:13 
To: <us...@hive.apache.org>; <be...@yahoo.com>
Reply-To: user@hive.apache.org
Subject: Re: Hive mapper creation

Hello Nitin, Bejoy,

        Thanks a lot for the quick response. Could you please tell me
what the default criterion for split creation is? How are the splits
for a Hive query created? (Pardon my ignorance.)

Regards,
    Mohammad Tariq


On Fri, Jun 29, 2012 at 12:22 AM, Bejoy KS <be...@yahoo.com> wrote:
> Hi Mohammed
>
> Internally, Hive does its processing with MapReduce. So, as in plain MapReduce, the splits are calculated at job submission time and a mapper is assigned per split. A mapper therefore processes a split, not a row.
>
> You can store data in various formats such as text, SequenceFile, RCFile, etc. There is no restriction to text files alone.
>
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -----Original Message-----
> From: Mohammad Tariq <do...@gmail.com>
> Date: Fri, 29 Jun 2012 00:17:05
> To: user<us...@hive.apache.org>
> Reply-To: user@hive.apache.org
> Subject: Hive mapper creation
>
> Hello list,
>
>         Since Hive tables are assumed to use a text input format, is
> it right to assume that a mapper is created per row of a particular
> table? Please correct me if my understanding is wrong. Also, please
> let me know how mappers are created for a Hive query. Many thanks.
>
> Regards,
>     Mohammad Tariq

Re: Hive mapper creation

Posted by Mohammad Tariq <do...@gmail.com>.
Hello Nitin, Bejoy,

        Thanks a lot for the quick response. Could you please tell me
what the default criterion for split creation is? How are the splits
for a Hive query created? (Pardon my ignorance.)

Regards,
    Mohammad Tariq


On Fri, Jun 29, 2012 at 12:22 AM, Bejoy KS <be...@yahoo.com> wrote:
> Hi Mohammed
>
> Internally, Hive does its processing with MapReduce. So, as in plain MapReduce, the splits are calculated at job submission time and a mapper is assigned per split. A mapper therefore processes a split, not a row.
>
> You can store data in various formats such as text, SequenceFile, RCFile, etc. There is no restriction to text files alone.
>
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -----Original Message-----
> From: Mohammad Tariq <do...@gmail.com>
> Date: Fri, 29 Jun 2012 00:17:05
> To: user<us...@hive.apache.org>
> Reply-To: user@hive.apache.org
> Subject: Hive mapper creation
>
> Hello list,
>
>         Since Hive tables are assumed to use a text input format, is
> it right to assume that a mapper is created per row of a particular
> table? Please correct me if my understanding is wrong. Also, please
> let me know how mappers are created for a Hive query. Many thanks.
>
> Regards,
>     Mohammad Tariq

Re: Hive mapper creation

Posted by Bejoy KS <be...@yahoo.com>.
Hi Mohammed

Internally, Hive does its processing with MapReduce. So, as in plain MapReduce, the splits are calculated at job submission time and a mapper is assigned per split. A mapper therefore processes a split, not a row.

You can store data in various formats such as text, SequenceFile, RCFile, etc. There is no restriction to text files alone.
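
For instance, the storage format is chosen in the table DDL (a sketch with made-up table names; STORED AS RCFILE needs Hive 0.6 or later):

    CREATE TABLE logs_text (line STRING) STORED AS TEXTFILE;
    CREATE TABLE logs_seq  (line STRING) STORED AS SEQUENCEFILE;
    CREATE TABLE logs_rc   (id INT, msg STRING) STORED AS RCFILE;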


Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Mohammad Tariq <do...@gmail.com>
Date: Fri, 29 Jun 2012 00:17:05 
To: user<us...@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: Hive mapper creation

Hello list,

         Since Hive tables are assumed to use a text input format, is
it right to assume that a mapper is created per row of a particular
table? Please correct me if my understanding is wrong. Also, please
let me know how mappers are created for a Hive query. Many thanks.

Regards,
    Mohammad Tariq

Re: Hive mapper creation

Posted by Nitin Pawar <ni...@gmail.com>.
Mappers are not created per row; instead, their number depends on your
query and your Hive configuration, e.g. whether you set a maximum input
(split) size. You can also set an upper bound on the number of mappers
to limit how many map tasks are launched.
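
In Hive terms those knobs usually look something like this (a hedged sketch with classic Hadoop 1.x property names; values are only illustrative):

    SET mapred.max.split.size=536870912;  -- larger max split => fewer, bigger splits => fewer mappers
    SET mapred.map.tasks=20;              -- treated as a hint by most InputFormats, not a hard limit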

On Fri, Jun 29, 2012 at 12:17 AM, Mohammad Tariq <do...@gmail.com> wrote:

> Hello list,
>
>         Since Hive tables are assumed to use a text input format, is
> it right to assume that a mapper is created per row of a particular
> table? Please correct me if my understanding is wrong. Also, please
> let me know how mappers are created for a Hive query. Many thanks.
>
> Regards,
>     Mohammad Tariq
>



-- 
Nitin Pawar