Posted to user@hive.apache.org by 闫昆 <ya...@gmail.com> on 2013/08/21 11:17:17 UTC

only one mapper

Hi all, when I run a Hive job it launches only one mapper, even though my file
spans 18 blocks (my block size is 128 MB and the data size is about 2 GB).
I use LZO compression: I created file.lzo and built the index file.lzo.index.
I am using Hive 0.10.0.

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Cannot run job locally: Input Size (= 2304560827) is larger than
hive.exec.mode.local.auto.inputbytes.max (= 134217728)
Starting Job = job_1377071515613_0003, Tracking URL =
http://hydra0001:8088/proxy/application_1377071515613_0003/
Kill Command = /opt/module/hadoop-2.0.0-cdh4.3.0/bin/hadoop job  -kill
job_1377071515613_0003
Hadoop job information for Stage-1: number of mappers: 1; number of
reducers: 0
2013-08-21 16:44:30,237 Stage-1 map = 0%,  reduce = 0%
2013-08-21 16:44:40,495 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 6.81
sec
2013-08-21 16:44:41,710 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 6.81
sec
2013-08-21 16:44:42,919 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 6.81
sec
2013-08-21 16:44:44,117 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 9.95
sec
2013-08-21 16:44:45,333 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 9.95
sec
2013-08-21 16:44:46,530 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 13.0
sec
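For context, the numbers in the post are internally consistent: if the 2,304,560,827-byte input were splittable, a 128 MB split size should yield about 18 splits, so a single mapper really does indicate the LZO file is being treated as one unsplittable blob. A quick sanity check (plain Python, not part of the original thread):

```python
import math

input_bytes = 2304560827          # "Input Size" from the job log above
split_bytes = 128 * 1024 * 1024   # 128 MB block / split size

# Number of map splits a splittable file of this size would produce.
expected_splits = math.ceil(input_bytes / split_bytes)
print(expected_splits)  # 18, matching the "18 block" figure in the post
```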

-- 

In the Hadoop world I am just a novice exploring the entire Hadoop
ecosystem; I hope one day I can contribute my own code.

YanBit
yankunhadoop@gmail.com

Re: only one mapper

Posted by pandees waran <pa...@gmail.com>.
Hi Edward,

Could you please explain this?

> Snappy + SequenceFile is a better option than LZO.

Thanks,
Pandeeswaran

—
Sent from Mailbox for iPad

On Wed, Aug 21, 2013 at 11:13 PM, Edward Capriolo <ed...@gmail.com>
wrote:

> LZO files are only splittable if you index them. Sequence files compressed
> with LZO are splittable without being indexed.
> Snappy + SequenceFile is a better option than LZO.

Re: only one mapper

Posted by Rajesh Balamohan <ra...@gmail.com>.
Good to hear that.




-- 
~Rajesh.B

Re: only one mapper

Posted by 闫昆 <ya...@gmail.com>.
Thanks all. I moved the LZO index to the Hive directory and it works fine now.
Thanks.






Re: only one mapper

Posted by Rajesh Balamohan <ra...@gmail.com>.
Create the LZO index after moving the file to the Hive directory (i.e. after
executing your LOAD DATA statement). The index file is needed only during job
execution, and if it is not present in the same directory as the data, the
large file will not be split.
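In shell terms the fix amounts to either of the following. This is a sketch, not a tested recipe: it needs a live cluster, the warehouse path is an assumption (check DESCRIBE FORMATTED on the table), and the hadoop-lzo jar path is a placeholder.

```shell
# Option 1: move the orphaned index next to the relocated data file.
hadoop fs -mv /data_split/data_rowkey.lzo.index /user/hive/warehouse/data_zh/

# Option 2: rebuild the index in place after LOAD DATA has moved the file.
hadoop jar /path/to/hadoop-lzo.jar \
  com.hadoop.compression.lzo.DistributedLzoIndexer \
  /user/hive/warehouse/data_zh/data_rowkey.lzo
```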




-- 
~Rajesh.B

Re: only one mapper

Posted by 闫昆 <ya...@gmail.com>.
In Hive I used SET mapreduce.input.fileinputformat.split.maxsize=134217728;
but it had no effect. I also found that when I use

LOAD DATA INPATH '/data_split/data_rowkey.lzo'

OVERWRITE INTO TABLE data_zh

the HDFS data is moved to the Hive directory (I had used CREATE EXTERNAL
TABLE), but the issue is that data_rowkey.lzo.index still sits in the HDFS
/data_split/ directory. The data moved to the Hive directory while the index
file stayed behind, so they are not in the same directory.






Re: only one mapper

Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
Hi

Try this setting in your Hive query:

SET mapreduce.input.fileinputformat.split.maxsize=<some bytes>;

If you set this value "low", the MR job will use this size to split the input LZO files and you will get multiple mappers (and make sure the input LZO files are indexed, i.e. .lzo.index files are created).

sanjay
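As a concrete illustration of the advice above (the 128 MB value and the query are assumptions for the numbers in this thread, not something Sanjay posted):

```sql
-- Cap split size at 128 MB so an *indexed* ~2 GB LZO file
-- yields roughly 18 mappers instead of 1.
SET mapreduce.input.fileinputformat.split.maxsize=134217728;

SELECT COUNT(*) FROM data_zh;
```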




Re: only one mapper

Posted by Edward Capriolo <ed...@gmail.com>.
LZO files are only splittable if you index them. Sequence files compressed
with LZO are splittable without being indexed.

Snappy + SequenceFile is a better option than LZO.
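A sketch of what this suggestion looks like in practice. The table names are hypothetical, and the property names are the pre-YARN-era ones current around Hive 0.10; treat this as an illustration, not a drop-in recipe:

```sql
-- Write a Snappy-compressed, block-level SequenceFile copy of a table.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;

CREATE TABLE data_zh_seq STORED AS SEQUENCEFILE
AS SELECT * FROM data_zh;
```

Block-compressed sequence files are splittable with no separate index file, which is the point of the recommendation.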



Re: only one mapper

Posted by Igor Tatarinov <ig...@decide.com>.
LZO files are combinable so check your max split setting.
http://mail-archives.apache.org/mod_mbox/hive-user/201107.mbox/%3C4E328964.7000202@gmail.com%3E

igor
decide.com


