Posted to user@hive.apache.org by Tim Sell <tr...@gmail.com> on 2010/02/23 20:00:25 UTC

Hive jobs only run with 1 map task

We just upgraded to Hadoop 0.20 (from Hadoop 0.18); impressively, our
same Hive package kept working against the new Hadoop setup.

Since the upgrade, though, every Hive job starts with only 1 map task,
even after setting it explicitly with e.g.: set mapred.map.tasks=32;
We recompiled our Hive setup against Hadoop 0.20 and still get the same issue.

Any suggestions for something obvious we might have missed?

~Tim.
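(For background — a general property of Hadoop MapReduce at the time, not something stated in this thread: mapred.map.tasks is only a hint; the actual number of map tasks is the number of input splits. A rough sanity check of what a healthy split count would look like here:)

```python
import math

def expected_mappers(input_bytes, split_bytes):
    """Rough mapper count: one map task per input split (a simplification
    that ignores per-file remainders and locality)."""
    return math.ceil(input_bytes / split_bytes)

# A 30 GB input with a 128 MB split size should produce roughly
# 240 map tasks, not 1.
print(expected_mappers(30 * 2**30, 128 * 2**20))  # -> 240
```

So a single mapper on a 30 GB input suggests the split computation itself is off, not the mapred.map.tasks hint.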

Re: Hive jobs only run with 1 map task

Posted by Tim Sell <tr...@gmail.com>.
It's fixed.
We never figured out what caused it, but upgrading to the latest
Cloudera version of Hive seems to have resolved it.

Thanks.


Re: Hive jobs only run with 1 map task

Posted by Tim Sell <tr...@gmail.com>.
Hi again,

mapred.min.split.size=0
dfs.block.size=134217728




RE: Hive jobs only run with 1 map task

Posted by Namit Jain <nj...@facebook.com>.
Can you check the parameters: mapred.min.split.size and dfs.block.size ?
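(A hedged aside on why these two parameters matter — my own summary of the classic FileInputFormat rule, not taken from this thread: the per-file split size is roughly max(minSize, min(maxSize, blockSize)), so an oversized minimum split or block size collapses a file into few splits.)

```python
def split_size(min_size, max_size, block_size):
    """Classic FileInputFormat rule: max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))

# With mapred.min.split.size=0 and a 128 MB block (134217728 bytes),
# the split size stays at one block, so a large file still yields many splits.
print(split_size(0, 2**63 - 1, 134217728))  # -> 134217728
```

If those values check out (as they do later in the thread), the single-mapper behavior points at the input format rather than the split-size settings.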


Re: Hive jobs only run with 1 map task

Posted by Tim Sell <tr...@gmail.com>.
It happens on a table that is a single 30 GB tab-separated file.
It also happens on tables that are split across hundreds of files.



RE: Hive jobs only run with 1 map task

Posted by Namit Jain <nj...@facebook.com>.
What is the size of the input data for the query ?

Since you are using CombineHiveInputFormat, multiple files can be read by a single mapper.



-namit
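(The combining behavior Namit describes can be illustrated with a toy model — my own simplification, not Hive's actual code: files are greedily packed into combined splits up to a size cap, so hundreds of small files can collapse into a handful of map tasks, and with no effective cap, into one.)

```python
def combine_splits(file_sizes, max_split_bytes):
    """Greedy packing of files into combined splits: a toy model of
    CombineFileInputFormat that ignores node/rack locality."""
    splits, current = [], 0
    for size in file_sizes:
        # Close the current split once adding this file would exceed the cap.
        if current and current + size > max_split_bytes:
            splits.append(current)
            current = 0
        current += size
    if current:
        splits.append(current)
    return splits

# 300 files of 10 MB each under a 256 MB cap -> 12 combined splits.
files = [10 * 2**20] * 300
print(len(combine_splits(files, 256 * 2**20)))  # -> 12

# With no effective cap, everything collapses into a single split.
print(len(combine_splits(files, float("inf"))))  # -> 1
```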


Re: Hive jobs only run with 1 map task

Posted by Tim Sell <tr...@gmail.com>.
In case it helps: looking at the job conf in the MapReduce logs, I noticed
mapred.input.format.class=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat



Re: Hive jobs only run with 1 map task

Posted by Tim Sell <tr...@gmail.com>.
Is hive.input.format set on the table? I'm not sure how to pull that
out again. I know the tables are stored as text, though, and I should
mention they do actually parse/process correctly.

Here are all the set parameters:

hive> set;
silent=off
javax.jdo.option.ConnectionUserName=hive
hive.exec.reducers.bytes.per.reducer=100000000
hive.mapred.local.mem=0
datanucleus.autoStartMechanismMode=checked
hive.metastore.connect.retries=5
datanucleus.validateColumns=false
hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore
datanucleus.autoCreateSchema=true
javax.jdo.option.ConnectionPassword=hive
datanucleus.validateConstraints=false
datancucleus.transactionIsolation=read-committed
datanucleus.validateTables=false
hive.map.aggr.hash.min.reduction=0.5
datanucleus.storeManagerType=rdbms
hive.exec.script.maxerrsize=100000
hive.merge.size.per.task=256000000
hive.test.mode.prefix=test_
hive.groupby.skewindata=false
hive.default.fileformat=TextFile
hive.script.auto.progress=false
hive.groupby.mapaggr.checkinterval=100000
hive.hwi.listen.port=9999
datanuclues.cache.level2=true
hive.hwi.war.file=${HIVE_HOME}/lib/hive-hwi.war
hive.merge.mapfiles=true
hive.exec.compress.output=false
datanuclues.cache.level2.type=SOFT
javax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver
hive.map.aggr=true
hive.join.emit.interval=1000
hive.metastore.warehouse.dir=hdfs://master1.hadoop.last.fm:8020/user/hive/warehouse
javax.jdo.PersistenceManagerFactoryClass=org.datanucleus.jdo.JDOPersistenceManagerFactory
hive.mapred.mode=nonstrict
hive.exec.scratchdir=/tmp/hive-${user.name}
javax.jdo.option.NonTransactionalRead=true
hive.metastore.local=true
hive.test.mode.samplefreq=32
hive.test.mode=false
javax.jdo.option.ConnectionURL=jdbc:mysql://10.101.1.35/hive?createDatabaseIfNotExist=true
javax.jdo.option.DetachAllOnCommit=true
hive.heartbeat.interval=1000
hive.map.aggr.hash.percentmemory=0.5
hive.exec.reducers.max=107
hive.hwi.listen.host=0.0.0.0
hive.exec.compress.intermediate=false
hive.optimize.cp=true
hive.optimize.ppd=true
hive.session.id=tims_201002231907
hive.merge.mapredfiles=false

~Tim.


RE: Hive jobs only run with 1 map task

Posted by Namit Jain <nj...@facebook.com>.
Can you check your input format?

Can you check the value of the parameter hive.input.format?

Can you send all the parameters?



Thanks,
-namit


