Posted to user@hive.apache.org by Dileep Kumar <di...@gmail.com> on 2013/03/08 23:09:20 UTC

Hive query started map task being killed during execution

Hi All,

I am running a Hive query that inserts into a table.
From the symptoms, it looks like the problem has to do with some
settings, but I am not able to figure out which ones.

When I submit the query, it starts 2130 map tasks in the job. 150 of them
complete fine without any error, then the next batch of 75 gets killed, and
all of the tasks after that get killed as well.
When I submit a similar query based on a smaller table, it starts only
around 135 map tasks, runs to completion without any error, and does
the insert into the appropriate table.

I don't find any obvious error messages in any of the task logs apart from
this:


./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001636_0/syslog:2013-03-08 08:54:06,910 INFO orapache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08 08:41:06,060 INFO orapache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08 08:46:54,390 ERROR o.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher: Error during instantiating JDBC driver org.apache.derby.jdbc.EmbeddedDriver.
./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08 08:46:54,394 ERROR o.apache.hadoop.hive.ql.exec.FileSinkOperator: StatsPublishing error: cannot connect to database

Please suggest whether I need to set anything in Hive when I invoke this query.
The query that runs successfully has far fewer rows than the one that
fails.

Thanks,
DK

Re: Hive query started map task being killed during execution

Posted by Abdelrhman Shettia <as...@hortonworks.com>.

Hi Dileep, 

Have you tried setting the following values in Hive and running the query again? There is more information on why the query may fail at the following link:

https://cwiki.apache.org/Hive/statsdev.html


set hive.stats.autogather=false;
As well as:

set hive.stats.dbclass=jdbc:derby;
set hive.stats.dbconnectionstring="jdbc:derby:;databaseName=TempStatsStore;create=true";
set hive.stats.jdbcdriver="org.apache.derby.jdbc.EmbeddedDriver";
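
As a minimal sketch of how the first setting might be applied, assuming the stats publishing errors in the task logs are related to the failure (not confirmed in this thread): the setting is issued in the same Hive session, before the insert. The table names target_table and source_table are hypothetical placeholders, not the tables from the original query.

```sql
-- Skip automatic statistics gathering for this session, so the map tasks
-- do not try to publish stats through the embedded Derby JDBC driver.
set hive.stats.autogather=false;

-- Hypothetical insert, for illustration only; substitute the real query.
INSERT INTO TABLE target_table
SELECT * FROM source_table;
```

Since `set` only affects the current session, the setting has to be repeated (or placed in the script passed to Hive) each time the query is run.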

Hope this helps. 


 

 Abdelrahman Shettia
ashettia@hortonworks.com



Re: Hive query started map task being killed during execution

Posted by Dileep Kumar <di...@gmail.com>.

Thanks for your attention!
No, only one Hive process is running, and the thing that bothers me is that
the smaller query, which I invoke the same way, runs to completion. It is
using the embedded DB; if that is the problem, I can change it to an external
DB, but since my smaller query runs fine I thought this should be OK.



Re: Hive query started map task being killed during execution

Posted by Dean Wampler <de...@thinkbiganalytics.com>.

Do you have more than one hive process running? It looks like you're using
Derby, which only supports one process at a time. Also, you have to start
Hive from the same directory every time, where the metastore "database" is
written, unless you edit the JDBC connection property in the Hive config
file to point to a particular path. Here's what I use:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/Users/somedufus/hive/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>





-- 
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330