You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-user@hadoop.apache.org by Siddharth Tiwari <si...@live.com> on 2014/03/03 00:12:47 UTC

FW: Query hangs at 99.97 % for one reducer in Hive

Forwarding message to hadoop list as well for any help. Appreciate any help

*------------------------*

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.” 

"Maybe other people will try to limit me but I don't limit myself"


From: siddharth.tiwari@live.com
To: user@hive.apache.org
Subject: Query hangs at 99.97 % for one reducer in Hive
Date: Sun, 2 Mar 2014 23:09:25 +0000




Hi team,
following query hangs at 99.97% for one reducer, kindly help or point to what can be cause
drop table if exists sample.dpi_short_lt;create table sample.dpi_short_lt asselect                   b.msisdn,                a.area_erb,                a.longitude,                a.latitude,                               substring(b.msisdn,1,2) as country,                               substring(b.msisdn,3,2) as area_code,                               substring(b.start_time,1,4) as year,                               substring(b.start_time,6,2) as month,                               substring(b.start_time,9,2) as day,                               substring(b.start_time,12,2) as hour,                               cast(b.procedure_duration as double) as duracao_ms,                               cast(b.internet_latency as double) as int_internet_latency,                               cast(b.ran_latency as double) as int_ran_latency,                               cast(b.http_latency as double) as int_http_latency,                               (case when b.internet_latency='' then 1 else 0 end) as internet_latency_missing,                               (case when b.ran_latency='' then 1 else 0 end) as ran_latency_missing,                               (case when b.http_latency='' then 1 else 0 end) as http_latency_missing,                               (cast(b.mean_throughput_ul as int) * cast( procedure_duration as int) / 1000) as total_up_bytes,                               (cast(b.mean_throughput_dl as int) * cast(procedure_duration as int)  / 1000) as total_dl_bytes,                               cast(b.missing_packets_ul as int) as int_missing_packets_ul,                               cast(b.missing_packets_dl as int) as int_missing_packets_dlfrom sample.dpi_large bleft outer join sample.science_new aon b.cgi = regexp_replace(a.codigo_cgi_ecgi,'-','')where msisdn!='';
Hive was heuristically selecting 1000 reducers and it was hanging at 99.97 percent on one reduce task. I then changed the above values to 3GB per reducer and 500 reducers and started hitting this error.
java.lang.RuntimeException: Hive Runtime Error while closing operators: Unable to rename output from: hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812_8390586541316719852-1/_task_tmp.-ext-10001/_tmp.000003_0 to: hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812_8390586541316719852-1/_tmp.-ext-10001/000003_0
	at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:313)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:516)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812

I have 22 node cluster running cdh 4.3. Please try to locate what can be teh issue.

*------------------------*

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.” 

"Maybe other people will try to limit me but I don't limit myself"
 		 	   		   		 	   		  

Re: Query hangs at 99.97 % for one reducer in Hive

Posted by "Subramanian, Sanjay (HQP)" <sa...@roberthalf.com>.
First of , I want that cluster specs :-)

I am gong to propose a few theories without proof here


  *   From the info u have provided, there are no partitions….that might be an issue creating this problem
  *   Is it possible to partition the data based on say days (to start with) and run this Left Outer Join on one day of data
  *   This specific error is saying the JOIN if failing for that one row – tough to say from here which row is problematic
     *   Have u seen the MapReduce / YARN logs to see if they have more information
        *   If u r using YARN then on HDFS there is a kind of dir on HDFS that looks like this /var/log/hadoop-yarn/….and u will have applicationxxxXXXX directory and all logs for that job under this

In summary , if u have not already done it , u should run this query on a smaller subset and creating partitions by say date might be beneficial for this volume of data

Thanks
Warm Regards

Sanjay
linkedin:http://www.linkedin.com/in/subramaniansanjay

From: Siddharth Tiwari <si...@live.com>>
Reply-To: "user@hive.apache.org<ma...@hive.apache.org>" <us...@hive.apache.org>>
Date: Sunday, March 2, 2014 at 8:48 PM
To: hive user list <us...@hive.apache.org>>, "sanjaysubramanian@yahoo.com<ma...@yahoo.com>" <sa...@yahoo.com>>
Subject: RE: Query hangs at 99.97 % for one reducer in Hive

Hi Sanjay,
here is teh detail


Even 500 reducers sounds a high number but I don't know the deatils of your cluster. Can u
provide some details
How many nodes in cluster : 21 Nodes
Hive version : Hive-0.10.x
Which distribution (Hortonworks, Apache, CDH, Amazon): CDH 4.3
Node specs: Each node 64 cores ( with HT), 128 GB RAM, 3*10 TB disks
Partitions in the table: none
Number of records: around 17 Billion

Also,
I see following exception thrown by hanging task, I have no idea what does this mean


java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=1) {"key":{"joinkey0":""},"value":{"_col2":"92","_col11":"-60-01-21,00","_col12":"-03-07-04,00"},"alias":1}
        at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:270)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=1) {"key":{"joinkey0":""},"value":{"_col2":"92","_col11":"-60-01-21,00","_col12":"-03-07-04,00"},"alias":1}
        at org.apache.hadoop.hive.ql.exec.ExecRedu

*------------------------*
Cheers !!!
SiddharthTiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.”
"Maybe other people will try to limit me but I don't limit myself"


________________________________
From: siddharth.tiwari@live.com<ma...@live.com>
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: FW: Query hangs at 99.97 % for one reducer in Hive
Date: Sun, 2 Mar 2014 23:12:47 +0000

Forwarding message to hadoop list as well for any help. Appreciate any help


*------------------------*
Cheers !!!
SiddharthTiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.”
"Maybe other people will try to limit me but I don't limit myself"


________________________________
From: siddharth.tiwari@live.com<ma...@live.com>
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Query hangs at 99.97 % for one reducer in Hive
Date: Sun, 2 Mar 2014 23:09:25 +0000

Hi team,

following query hangs at 99.97% for one reducer, kindly help or point to what can be cause

drop table if exists sample.dpi_short_lt;
create table sample.dpi_short_lt as
select                   b.msisdn,
                a.area_erb,
                a.longitude,
                a.latitude,
                               substring(b.msisdn,1,2) as country,
                               substring(b.msisdn,3,2) as area_code,
                               substring(b.start_time,1,4) as year,
                               substring(b.start_time,6,2) as month,
                               substring(b.start_time,9,2) as day,
                               substring(b.start_time,12,2) as hour,
                               cast(b.procedure_duration as double) as duracao_ms,
                               cast(b.internet_latency as double) as int_internet_latency,
                               cast(b.ran_latency as double) as int_ran_latency,
                               cast(b.http_latency as double) as int_http_latency,
                               (case when b.internet_latency='' then 1 else 0 end) as internet_latency_missing,
                               (case when b.ran_latency='' then 1 else 0 end) as ran_latency_missing,
                               (case when b.http_latency='' then 1 else 0 end) as http_latency_missing,
                               (cast(b.mean_throughput_ul as int) * cast( procedure_duration as int) / 1000) as total_up_bytes,
                               (cast(b.mean_throughput_dl as int) * cast(procedure_duration as int)  / 1000) as total_dl_bytes,
                               cast(b.missing_packets_ul as int) as int_missing_packets_ul,
                               cast(b.missing_packets_dl as int) as int_missing_packets_dl
from sample.dpi_large b
left outer join sample.science_new a
on b.cgi = regexp_replace(a.codigo_cgi_ecgi,'-','')
where msisdn!='';

Hive was heuristically selecting 1000 reducers and it was hanging at 99.97 percent on one reduce task. I then changed the above values to 3GB per reducer and 500 reducers and started hitting this error.


java.lang.RuntimeException: Hive Runtime Error while closing operators: Unable to rename output from: hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812_8390586541316719852-1/_task_tmp.-ext-10001/_tmp.000003_0 to: hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812_8390586541316719852-1/_tmp.-ext-10001/000003_0
        at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:313)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:516)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812


I have 22 node cluster running cdh 4.3. Please try to locate what can be teh issue.


*------------------------*
Cheers !!!
SiddharthTiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.”
"Maybe other people will try to limit me but I don't limit myself"

RE: Query hangs at 99.97 % for one reducer in Hive

Posted by Siddharth Tiwari <si...@live.com>.
Hi Sanjay,here is teh detail
Even 500 reducers sounds a high number but I don't know the deatils of your cluster. Can u
provide some details
How many nodes in cluster : 21 Nodes
Hive version : Hive-0.10.x
Which distribution (Hortonworks, Apache, CDH, Amazon): CDH 4.3
Node specs: Each node 64 cores ( with HT), 128 GB RAM, 3*10 TB disks
Partitions in the table: none 
Number of records: around 17 Billion
Also,I see following exception thrown by hanging task, I have no idea what does this mean
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=1) {"key":{"joinkey0":""},"value":{"_col2":"92","_col11":"-60-01-21,00","_col12":"-03-07-04,00"},"alias":1}
	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:270)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=1) {"key":{"joinkey0":""},"value":{"_col2":"92","_col11":"-60-01-21,00","_col12":"-03-07-04,00"},"alias":1}
	at org.apache.hadoop.hive.ql.exec.ExecRedu
*------------------------*

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.” 

"Maybe other people will try to limit me but I don't limit myself"


From: siddharth.tiwari@live.com
To: user@hadoop.apache.org
Subject: FW: Query hangs at 99.97 % for one reducer in Hive
Date: Sun, 2 Mar 2014 23:12:47 +0000




Forwarding message to hadoop list as well for any help. Appreciate any help

*------------------------*

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.” 

"Maybe other people will try to limit me but I don't limit myself"


From: siddharth.tiwari@live.com
To: user@hive.apache.org
Subject: Query hangs at 99.97 % for one reducer in Hive
Date: Sun, 2 Mar 2014 23:09:25 +0000




Hi team,
following query hangs at 99.97% for one reducer, kindly help or point to what can be cause
drop table if exists sample.dpi_short_lt;create table sample.dpi_short_lt asselect                   b.msisdn,                a.area_erb,                a.longitude,                a.latitude,                               substring(b.msisdn,1,2) as country,                               substring(b.msisdn,3,2) as area_code,                               substring(b.start_time,1,4) as year,                               substring(b.start_time,6,2) as month,                               substring(b.start_time,9,2) as day,                               substring(b.start_time,12,2) as hour,                               cast(b.procedure_duration as double) as duracao_ms,                               cast(b.internet_latency as double) as int_internet_latency,                               cast(b.ran_latency as double) as int_ran_latency,                               cast(b.http_latency as double) as int_http_latency,                               (case when b.internet_latency='' then 1 else 0 end) as internet_latency_missing,                               (case when b.ran_latency='' then 1 else 0 end) as ran_latency_missing,                               (case when b.http_latency='' then 1 else 0 end) as http_latency_missing,                               (cast(b.mean_throughput_ul as int) * cast( procedure_duration as int) / 1000) as total_up_bytes,                               (cast(b.mean_throughput_dl as int) * cast(procedure_duration as int)  / 1000) as total_dl_bytes,                               cast(b.missing_packets_ul as int) as int_missing_packets_ul,                               cast(b.missing_packets_dl as int) as int_missing_packets_dlfrom sample.dpi_large bleft outer join sample.science_new aon b.cgi = regexp_replace(a.codigo_cgi_ecgi,'-','')where msisdn!='';
Hive was heuristically selecting 1000 reducers and it was hanging at 99.97 percent on one reduce task. I then changed the above values to 3GB per reducer and 500 reducers and started hitting this error.
java.lang.RuntimeException: Hive Runtime Error while closing operators: Unable to rename output from: hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812_8390586541316719852-1/_task_tmp.-ext-10001/_tmp.000003_0 to: hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812_8390586541316719852-1/_tmp.-ext-10001/000003_0
	at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:313)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:516)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812

I have 22 node cluster running cdh 4.3. Please try to locate what can be teh issue.

*------------------------*

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.” 

"Maybe other people will try to limit me but I don't limit myself"