Posted to user@hive.apache.org by "Avdeev V. M." <ls...@list.ru> on 2012/05/22 12:20:28 UTC

from-insert-select trouble

Hello!

I'm very new to the world of Hadoop and Hive, so I cannot solve a problem that I have encountered.

Hadoop has been deployed on a single node in pseudo-distributed mode.
I'm trying to copy data from one table to another. The source table was created by Sqoop; the destination table was created by this query:



create table if not exists rev0.operation_list (
	id bigint,
	id_paper bigint,
	
	lgot_code int,
	id_region int,
	id_tarif_type int,
	id_annulate int,
	id_from int,
	id_to int,
	id_train int,
	id_emitent int,
	id_carriage int,
	id_place int,
	id_ticket_type int,

	sell_date string,
	trip_date string,

	amount int,
	cash int,
	ticket_count int,
	price_tarif_place int,
	price_tarif_transfer int,
	km float,
	passengers int,
	pkm float)
PARTITIONED BY(id_sell_date string)
stored as RCFILE;



The source table contains about 23 000 000 rows. When I try to execute 



set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

from rev0.operation_list_temp
insert overwrite table rev0.operation_list PARTITION(id_sell_date)
select
	id,
	id_paper,
	lgot_code,
	id_region,
	id_tarif_type,
	id_annulate,
	id_from,
	id_to,
	id_train,
	id_emitent,
	id_carriage,
	id_place,
	id_ticket_type,

	sell_date,
	trip_date,

	amount,
	cash,
	ticket_count,
	price_tarif_place,
	price_tarif_transfer,
	km,
	passengers,
	pkm,

	to_date(sell_date) id_sell_date;



I see a strange progress report:



Hive history file=/tmp/user/hive_job_log_user_201205221419_1856534995.txt
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201205191141_0110, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201205191141_0110
Kill Command = /usr/lib/hadoop/bin/hadoop job  -Dmapred.job.tracker=localhost:8021 -kill job_201205191141_0110
2012-05-22 14:19:59,092 Stage-1 map = 0%,  reduce = 0%
2012-05-22 14:21:00,000 Stage-1 map = 0%,  reduce = 0%
2012-05-22 14:21:46,527 Stage-1 map = 13%,  reduce = 0%
2012-05-22 14:21:52,664 Stage-1 map = 41%,  reduce = 0%
2012-05-22 14:22:53,357 Stage-1 map = 41%,  reduce = 0%
2012-05-22 14:23:06,747 Stage-1 map = 63%,  reduce = 0%
2012-05-22 14:23:28,409 Stage-1 map = 75%,  reduce = 0%
2012-05-22 14:24:29,322 Stage-1 map = 75%,  reduce = 0%
2012-05-22 14:25:28,276 Stage-1 map = 88%,  reduce = 0%
2012-05-22 14:25:31,397 Stage-1 map = 50%,  reduce = 0% <-- my comment: 88% drops to 50%!
2012-05-22 14:26:32,332 Stage-1 map = 50%,  reduce = 0%
2012-05-22 14:27:02,701 Stage-1 map = 63%,  reduce = 0%
2012-05-22 14:28:03,314 Stage-1 map = 63%,  reduce = 0%
2012-05-22 14:28:21,919 Stage-1 map = 75%,  reduce = 0%
2012-05-22 14:29:22,023 Stage-1 map = 75%,  reduce = 0%
2012-05-22 14:30:22,081 Stage-1 map = 75%,  reduce = 0%
2012-05-22 14:30:32,182 Stage-1 map = 88%,  reduce = 0%
2012-05-22 14:30:34,227 Stage-1 map = 50%,  reduce = 0% <-- my comment: again!
2012-05-22 14:31:34,948 Stage-1 map = 50%,  reduce = 0%
2012-05-22 14:32:01,198 Stage-1 map = 63%,  reduce = 0%
2012-05-22 14:33:01,904 Stage-1 map = 63%,  reduce = 0%
2012-05-22 14:33:20,150 Stage-1 map = 75%,  reduce = 0%
2012-05-22 14:34:21,127 Stage-1 map = 75%,  reduce = 0%
2012-05-22 14:35:22,018 Stage-1 map = 75%,  reduce = 0%
2012-05-22 14:35:33,295 Stage-1 map = 88%,  reduce = 0%
2012-05-22 14:35:43,137 Stage-1 map = 50%,  reduce = 0% <-- my comment: and again!
2012-05-22 14:36:44,057 Stage-1 map = 50%,  reduce = 0%
2012-05-22 14:37:17,486 Stage-1 map = 63%,  reduce = 0%
2012-05-22 14:38:18,116 Stage-1 map = 63%,  reduce = 0%
2012-05-22 14:38:36,327 Stage-1 map = 75%,  reduce = 0%
2012-05-22 14:39:36,936 Stage-1 map = 75%,  reduce = 0%
2012-05-22 14:40:37,660 Stage-1 map = 75%,  reduce = 0%
2012-05-22 14:40:41,731 Stage-1 map = 88%,  reduce = 0%
2012-05-22 14:40:43,759 Stage-1 map = 50%,  reduce = 0% <-- my comment: last one!
2012-05-22 14:40:47,815 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201205191141_0110 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask



I cannot understand why the process completed with an error, or why the progress of the map phase is so strange.

I have found 2 workarounds:
1) Split the original query in two by adding 'WHERE to_date(sell_date) < to_date(border_date)' and 'WHERE to_date(sell_date) >= to_date(border_date)' (a sketch of this follows below). As a result each query handles about 11 500 000 rows, and the copying process completes without errors.
2) Alternatively, change 'stored as rcfile' to 'stored as sequencefile' and keep the query without a WHERE predicate. Again, the query completes without errors.
I have no idea what causes this behavior. Maybe I don't have enough knowledge to understand the differences between RCFile and SequenceFile, so can someone explain my mistakes to me?
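
A minimal sketch of workaround 1 (the date '2011-01-01' is only a placeholder for whatever border_date value splits the data roughly in half):

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- first half: rows sold before the border date
from rev0.operation_list_temp
insert overwrite table rev0.operation_list PARTITION(id_sell_date)
select
    id, id_paper,
    lgot_code, id_region, id_tarif_type, id_annulate, id_from, id_to,
    id_train, id_emitent, id_carriage, id_place, id_ticket_type,
    sell_date, trip_date,
    amount, cash, ticket_count, price_tarif_place, price_tarif_transfer,
    km, passengers, pkm,
    to_date(sell_date) id_sell_date
where to_date(sell_date) < to_date('2011-01-01');

-- second half: re-run the same statement with the predicate reversed:
-- where to_date(sell_date) >= to_date('2011-01-01');

Because the two date ranges produce disjoint id_sell_date partitions, the second INSERT OVERWRITE replaces only the partitions it writes and leaves the first half intact.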

PS. Sorry for my bad English. Natural languages are my Achilles' heel.

Vyacheslav

Re: Re[2]: from-insert-select trouble

Posted by Bejoy KS <be...@yahoo.com>.
Great, good catch. There isn't enough child heap available to process your data volume. If you have free memory available, just increase the child JVM heap (mapred.child.java.opts) and it may pass through as well.

Regards
Bejoy KS

Sent from handheld, please excuse typos.
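
A minimal sketch of that change from the Hive session itself, assuming Hadoop 1.x (MRv1) property names; the heap size is only an example and should be sized to the memory actually free on the node:

-- raise the per-task child JVM heap for jobs launched from this session
-- (example value; the same setting can also be made in mapred-site.xml)
set mapred.child.java.opts=-Xmx1024m;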



Re[2]: from-insert-select trouble

Posted by "Avdeev V. M." <ls...@list.ru>.
Found.

2012-05-22 17:52:47,117 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$Packet.<init>(DFSClient.java:2790)
 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:3733)
 at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
 at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:100)
 at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
 at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
 at java.io.DataOutputStream.write(DataOutputStream.java:90)
 at org.apache.hadoop.hive.ql.io.RCFile$ValueBuffer.write(RCFile.java:450)
 at org.apache.hadoop.hive.ql.io.RCFile$Writer.flushRecords(RCFile.java:867)
 at org.apache.hadoop.hive.ql.io.RCFile$Writer.close(RCFile.java:884)
 at org.apache.hadoop.hive.ql.io.RCFileOutputFormat$2.close(RCFileOutputFormat.java:147)
 at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.abortWriters(FileSinkOperator.java:196)
 at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:653)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
 at org.apache.hadoop.mapred.Child.main(Child.java:264)

I will experiment with a smaller data set!
Thank you Bejoy! 
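
Judging from the trace, the map task ran out of heap while flushing the column buffers of the RCFile writers it had open; with dynamic partitions, each partition value gets its own writer. Apart from a larger child heap, a common mitigation, sketched below only as an aside, is to add a reduce stage with DISTRIBUTE BY on the partition expression so that each task keeps writers open for fewer partitions at a time:

-- routing all rows of a given sell date to one reducer means each reduce
-- task only holds RCFile writers for its own partitions
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

from rev0.operation_list_temp
insert overwrite table rev0.operation_list PARTITION(id_sell_date)
select
    id, id_paper,
    lgot_code, id_region, id_tarif_type, id_annulate, id_from, id_to,
    id_train, id_emitent, id_carriage, id_place, id_ticket_type,
    sell_date, trip_date,
    amount, cash, ticket_count, price_tarif_place, price_tarif_transfer,
    km, passengers, pkm,
    to_date(sell_date) id_sell_date
distribute by to_date(sell_date);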



Re: from-insert-select trouble

Posted by Bejoy Ks <be...@yahoo.com>.
Hi Vyacheslav
       Can you post the error log from your failed MapReduce tasks? You can get the error logs from the JobTracker web UI by drilling down to the task level. Those errors will give you a better understanding of what could be going wrong here.

Regards
Bejoy

