Posted to user@hive.apache.org by air <cn...@gmail.com> on 2011/08/09 10:19:01 UTC

Fwd: CDH3 U1 Hive Job-commit very slow

---------- Forwarded message ----------
From: air <cn...@gmail.com>
Date: 2011/8/9
Subject: CDH3 U1 Hive Job-commit very slow
To: CDH Users <cd...@cloudera.org>


when I submit a QL to Hive, it takes a very long time before the job is
actually submitted to the Hadoop cluster. What may cause this problem? Thank
you for your help.

hive> select count(1) from log_test where src='test' and ds='2011-08-04';
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
<------------------------ stays here for a long time..
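
For readers who land on this message: the reducer count Hive prints above is,
roughly, min(hive.exec.reducers.max, ceil(input bytes /
hive.exec.reducers.bytes.per.reducer)). A minimal sketch of that estimate (the
1 GB and 999 defaults are assumptions based on Hive 0.7-era defaults, not
taken from this thread):

```python
import math

def estimate_reducers(total_input_bytes,
                      bytes_per_reducer=10**9,   # assumed default: 1 GB
                      max_reducers=999):         # assumed default cap
    # One reducer per bytes_per_reducer of input, at least 1,
    # capped at max_reducers.
    if total_input_bytes <= 0:
        return 1
    return min(max_reducers, math.ceil(total_input_bytes / bytes_per_reducer))

# 5 GB of input with a 1 GB-per-reducer target -> 5 reducers
print(estimate_reducers(5 * 10**9))
```

Note that a global count(1) like the query above always compiles to a single
reducer regardless of this estimate.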

-- 
Knowledge Management.




-- 
Knowledge Management.

Re: CDH3 U1 Hive Job-commit very slow

Posted by air <cn...@gmail.com>.
I did some tests and found that it is not Hive's issue: when I submit a job
using hadoop jar directly it has the same problem, so I need to look for the
root cause in the Hadoop cluster!



-- 
Knowledge Management.

Re: CDH3 U1 Hive Job-commit very slow

Posted by air <cn...@gmail.com>.
hi Aggarwal, I am using the newest version (CDH3 Update 1, Hive 0.7). After
submitting several jobs using Hive, job submission becomes very slow (about
2-5 minutes). Below is some error information from hive.log (it seems the
metastore has some problem; I upgraded the metastore schema from 0.5 to 0.6
and then from 0.6 to 0.7 using Cloudera's upgrade scripts..)


2011-08-11 09:32:34,391 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.text" but it cannot be resolved.
2011-08-11 09:32:33,883 WARN  mapred.JobClient
(JobClient.java:copyAndConfigureFiles(649)) - Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
2011-08-11 09:32:34,810 ERROR metastore.HiveMetaStore
(HiveMetaStore.java:executeWithRetry(321)) - JDO datastore error. Retrying
metastore command after 1000 ms (attempt 1 of 1)
2011-08-11 09:32:35,033 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.core.resources" but it cannot be resolved.
2011-08-11 09:32:35,033 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.core.resources" but it cannot be resolved.
2011-08-11 09:32:35,036 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.core.runtime" but it cannot be resolved.
2011-08-11 09:32:35,036 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.core.runtime" but it cannot be resolved.
2011-08-11 09:32:35,036 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.text" but it cannot be resolved.
2011-08-11 09:32:35,036 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.text" but it cannot be resolved.
2011-08-11 09:32:35,922 ERROR parse.SemanticAnalyzer
(SemanticAnalyzer.java:getMetaData(918)) -
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table
model_userclass
        at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:838)
        at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:772)
        at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:782)
        at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6596)
        at
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
        at
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
        at
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:482)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: javax.jdo.JDODataStoreException: Exception thrown obtaining
schema column information from datastore
NestedThrowables:
com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Table
'metastore.DELETEME1313026355834' doesn't exist
        at
org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:313)
        at
org.datanucleus.ObjectManagerImpl.getExtent(ObjectManagerImpl.java:4154)
        at
org.datanucleus.store.rdbms.query.legacy.JDOQLQueryCompiler.compileCandidates(JDOQLQueryCompiler.java:411)
        at
org.datanucleus.store.rdbms.query.legacy.QueryCompiler.executionCompile(QueryCompiler.java:312)
        at
org.datanucleus.store.rdbms.query.legacy.JDOQLQueryCompiler.compile(JDOQLQueryCompiler.java:225)
        at
org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.compileInternal(JDOQLQuery.java:175)
        at org.datanucleus.store.query.Query.executeQuery(Query.java:1628)
        at
org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.executeQuery(JDOQLQuery.java:245)
        at
org.datanucleus.store.query.Query.executeWithArray(Query.java:1499)
        at org.datanucleus.jdo.JDOQuery.execute(JDOQuery.java:243)
        at
org.apache.hadoop.hive.metastore.ObjectStore.getMDatabase(ObjectStore.java:375)
        at
org.apache.hadoop.hive.metastore.ObjectStore.getDatabase(ObjectStore.java:394)
        at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:432)
        at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.access$200(HiveMetaStore.java:109)
        at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$5.run(HiveMetaStore.java:454)
        at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$5.run(HiveMetaStore.java:451)
        at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.executeWithRetry(HiveMetaStore.java:307)
        at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:451)
        at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:232)
        at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:197)
        at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:108)
        at
org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:1868)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:1878)
        at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:830)
        ... 14 more
Caused by: com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Table
'metastore.DELETEME1313026355834' doesn't exist
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:936)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2985)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1631)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1723)
        at com.mysql.jdbc.Connection.execSQL(Connection.java:3277)
        at com.mysql.jdbc.Connection.execSQL(Connection.java:3206)
        at com.mysql.jdbc.Statement.executeQuery(Statement.java:1232)
        at
com.mysql.jdbc.DatabaseMetaData$2.forEach(DatabaseMetaData.java:2390)
        at
com.mysql.jdbc.DatabaseMetaData$IterateBlock.doForAll(DatabaseMetaData.java:76)
        at
com.mysql.jdbc.DatabaseMetaData.getColumns(DatabaseMetaData.java:2264)
        at
org.apache.commons.dbcp.DelegatingDatabaseMetaData.getColumns(DelegatingDatabaseMetaData.java:218)
        at
org.datanucleus.store.rdbms.adapter.DatabaseAdapter.getColumns(DatabaseAdapter.java:1460)
        at
org.datanucleus.store.rdbms.schema.RDBMSSchemaHandler.refreshTableData(RDBMSSchemaHandler.java:924)
        at
org.datanucleus.store.rdbms.schema.RDBMSSchemaHandler.getRDBMSTableInfoForTable(RDBMSSchemaHandler.java:823)
        at
org.datanucleus.store.rdbms.schema.RDBMSSchemaHandler.getRDBMSTableInfoForTable(RDBMSSchemaHandler.java:772)
        at
org.datanucleus.store.rdbms.schema.RDBMSSchemaHandler.getSchemaData(RDBMSSchemaHandler.java:207)
        at
org.datanucleus.store.rdbms.RDBMSStoreManager.getColumnInfoForTable(RDBMSStoreManager.java:1699)
        at
org.datanucleus.store.rdbms.table.TableImpl.validateColumns(TableImpl.java:218)
        at
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:2702)
        at
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTablesAndValidate(RDBMSStoreManager.java:2503)
        at
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2148)

>



-- 
Knowledge Management.

RE: CDH3 U1 Hive Job-commit very slow

Posted by "Aggarwal, Vaibhav" <va...@amazon.com>.
How much time is the query startup taking?

In earlier versions of Hive (before HIVE-2299) the query startup process had an algorithm which took O(n^2) operations in the number of partitions.
This means ~100M operations before it would submit the MapReduce job.
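
To put that O(n^2) figure in context: the thread reports 10186 partitions, and
a quadratic pass over them is on the order of 100 million operations, which
matches the 100M number above:

```python
# A quadratic (O(n^2)) startup pass over n partitions costs roughly n*n steps.
n = 10186                # partition count reported in this thread
print(n * n)             # 103754596, i.e. roughly 1e8 (~100M) operations
```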


Re: CDH3 U1 Hive Job-commit very slow

Posted by air <cn...@gmail.com>.
there are only 10186 partitions in the metadata store (select count(1) from
PARTITIONS; in MySQL), so I think that is not the problem.




-- 
Knowledge Management.

RE: CDH3 U1 Hive Job-commit very slow

Posted by "Aggarwal, Vaibhav" <va...@amazon.com>.
Do you have a lot of partitions in your table?
The time taken to process the partitions before submitting the job is proportional to the number of partitions.

There is a patch I submitted recently as an attempt to alleviate this problem:

https://issues.apache.org/jira/browse/HIVE-2299

If that is not the case, I too would be interested in the root cause of the long query startup time.


RE: CDH3 U1 Hive Job-commit very slow

Posted by Steven Wong <sw...@netflix.com>.
You can tail the Hive log and see what it is doing at the time.
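
For anyone following that suggestion: plain "tail -f" on hive.log works (on
Hive 0.7 the CLI log location is set by hive.log.dir in hive-log4j.properties;
the /tmp path below is an assumption, check your installation). As a small
self-contained sketch, the "show the last N lines" step in Python:

```python
import collections

def tail(path, n=20):
    # Return the last n lines of a possibly large log file;
    # collections.deque with maxlen keeps memory bounded.
    with open(path, errors="replace") as f:
        return list(collections.deque(f, maxlen=n))

# Hypothetical location -- verify hive.log.dir in hive-log4j.properties:
# for line in tail("/tmp/hive/hive.log", 50):
#     print(line, end="")
```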

