You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Teddy Choi (JIRA)" <ji...@apache.org> on 2013/11/14 02:10:22 UTC

[jira] [Updated] (HIVE-5393) Hive Multi insert statement is not logging how many rows loaded for individual insert statements

     [ https://issues.apache.org/jira/browse/HIVE-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-5393:
-----------------------------

    Description: 
   When I am running multi insert statment in Hive to copy data from a single source table to a target table but into two different partitions, The log is not showing how many rows got loaded into two different partitions by the separate insert statements. It is only showing how many rows loaded for only insert statement not both of them.


Below is the query:

=============================================================
from WBUSINESS_STR
insert OVERWRITE table WBUSINESS_intraday_30mins PARTITION (data_year='2013',data_month='09',data_day='23',data_hourmin='0000',data_run_id='201309230001') 
select
year(DB_COMMIT_TIMESTAMP ) as year,
month(DB_COMMIT_TIMESTAMP) as month,
day(DB_COMMIT_TIMESTAMP) as day,
concat(hour(DB_COMMIT_TIMESTAMP),minute(DB_COMMIT_TIMESTAMP)) as hourmin,
'201309230001' as RUN_ID,
cast(DB_COMMIT_TIMESTAMP as timestamp),
DB_ACTION_TYPE,
cast( RBA as bigint),
cast( ID as bigint ),
cast( ACCOUNT_NUMBER as bigint ),
cast( TIME_CREATED as bigint ),
cast( FLAGS as bigint ),
NAME ,
NAME_UPPER ,
cast( ADDRESS_ID as bigint ),
cast( EIN as bigint ),
DUNS_NUMBER ,
cast( SIC as bigint ),
cast( TYPE as bigint ),
STATE_INC ,
cast( AVERAGE_TICKET as bigint ),
SERVICE_EMAIL ,
SERVICE_PHONE ,
cast(concat(substr(TIME_ROW_UPDATED,1,10),' ',substr(TIME_ROW_UPDATED,12)) as timestamp),
DISPUTE_EMAIL ,
DOING_BUSINESS_AS 
where 
cast(DB_COMMIT_TIMESTAMP as timestamp) >= cast('2013-09-23 00:00:00'  as timestamp)
and cast(DB_COMMIT_TIMESTAMP as timestamp) < cast('2013-09-23 00:30:00' as timestamp)
insert OVERWRITE table WBUSINESS_intraday_30mins PARTITION (data_year='2013',data_month='09',data_day='22',data_hourmin='2330',data_run_id='201309230001') 
select
year(DB_COMMIT_TIMESTAMP ) as year,
month(DB_COMMIT_TIMESTAMP) as month,
day(DB_COMMIT_TIMESTAMP) as day,
concat(hour(DB_COMMIT_TIMESTAMP),minute(DB_COMMIT_TIMESTAMP)) as hourmin,
'201309230001' as RUN_ID,
cast(DB_COMMIT_TIMESTAMP as timestamp),
DB_ACTION_TYPE,
cast( RBA as bigint),
cast( ID as bigint ),
cast( ACCOUNT_NUMBER as bigint ),
cast( TIME_CREATED as bigint ),
cast( FLAGS as bigint ),
NAME ,
NAME_UPPER ,
cast( ADDRESS_ID as bigint ),
cast( EIN as bigint ),
DUNS_NUMBER ,
cast( SIC as bigint ),
cast( TYPE as bigint ),
STATE_INC ,
cast( AVERAGE_TICKET as bigint ),
SERVICE_EMAIL ,
SERVICE_PHONE ,
cast(concat(substr(TIME_ROW_UPDATED,1,10),' ',substr(TIME_ROW_UPDATED,12)) as timestamp),
DISPUTE_EMAIL ,
DOING_BUSINESS_AS 
where 
cast(DB_COMMIT_TIMESTAMP as timestamp) >= cast('2013-09-22 23:30:00'  as timestamp)
and cast(DB_COMMIT_TIMESTAMP as timestamp) < cast('2013-09-23 00:00:00' as timestamp)
============================================

The Hive log says:

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201309261542_0042, Tracking URL = http://dmdevetllvs01:50030/jobdetails.jsp?jobid=job_201309261542_0042
Kill Command = /dmdev/data01/paypal/HADOOP/node1/hadoop-0.20.2-cdh3u4/bin/hadoop job  -Dmapred.job.tracker=dmdevetllvs01:9010 -kill job_201309261542_0042
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
2013-09-27 16:24:26,542 Stage-2 map = 0%,  reduce = 0%
2013-09-27 16:24:28,565 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 2.52 sec
2013-09-27 16:24:29,574 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 2.52 sec
MapReduce Total cumulative CPU time: 2 seconds 520 msec
Ended Job = job_201309261542_0042
Ended Job = 730112486, job is filtered out (removed at runtime).
Ended Job = -507348248, job is filtered out (removed at runtime).
Moving data to: hdfs://dmdevetllvs01.qa.paypal.com/tmp/hive-sanddas/hive_2013-09-27_16-24-19_784_1282210997007172719/-ext-10000
Moving data to: hdfs://dmdevetllvs01.qa.paypal.com/tmp/hive-sanddas/hive_2013-09-27_16-24-19_784_1282210997007172719/-ext-10002
Loading data to table default.wbusiness_intraday_30mins partition (data_year=2013, data_month=09, data_day=22, data_hourmin=2330, data_run_id=201309230001)
Deleted hdfs://dmdevetllvs01.qa.paypal.com/sys/pp_dm/sanddas/import/site/CONF/WBUSINESS/RCFILE/incremental/data_year=2013/data_month=09/data_day=22/data_hourmin=2330/data_run_id=201309230001
Partition default.wbusiness_intraday_30mins{data_year=2013, data_month=09, data_day=22, data_hourmin=2330, data_run_id=201309230001} stats: [num_files: 1, num_rows: 0, total_size: 12283, raw_data_size: 0]
Table default.wbusiness_intraday_30mins stats: [num_partitions: 7, num_files: 7, num_rows: 0, total_size: 62018, raw_data_size: 0]
Loading data to table default.wbusiness_intraday_30mins partition (data_year=2013, data_month=09, data_day=23, data_hourmin=0000, data_run_id=201309230001)
Deleted hdfs://dmdevetllvs01.qa.paypal.com/sys/pp_dm/sanddas/import/site/CONF/WBUSINESS/RCFILE/incremental/data_year=2013/data_month=09/data_day=23/data_hourmin=0000/data_run_id=201309230001
Partition default.wbusiness_intraday_30mins{data_year=2013, data_month=09, data_day=23, data_hourmin=0000, data_run_id=201309230001} stats: [num_files: 1, num_rows: 0, total_size: 1870, raw_data_size: 0]
Table default.wbusiness_intraday_30mins stats: [num_partitions: 7, num_files: 7, num_rows: 0, total_size: 62018, raw_data_size: 0]
141 Rows loaded to wbusiness_intraday_30mins
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 2.52 sec   HDFS Read: 4191 HDFS Write: 14153 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 520 msec
OK
Time taken: 10.771 seconds
====================================================================

It is only showing 141 rows loaded for a partition.
It is not showing another 21 rows loaded in another partition.

The source table has 161 rows.

Please let us know if this is normal.



  was:
When I am running multi insert statment in Hive to copy data from a single source table to a target table but into two different partitions, The log is not showing how many rows got loaded into two different partitions by the separate insert statements. It is only showing how many rows loaded for only insert statement not both of them.


Below is the query:

=============================================================
from WBUSINESS_STR
insert OVERWRITE table WBUSINESS_intraday_30mins PARTITION (data_year='2013',data_month='09',data_day='23',data_hourmin='0000',data_run_id='201309230001') 
select
year(DB_COMMIT_TIMESTAMP ) as year,
month(DB_COMMIT_TIMESTAMP) as month,
day(DB_COMMIT_TIMESTAMP) as day,
concat(hour(DB_COMMIT_TIMESTAMP),minute(DB_COMMIT_TIMESTAMP)) as hourmin,
'201309230001' as RUN_ID,
cast(DB_COMMIT_TIMESTAMP as timestamp),
DB_ACTION_TYPE,
cast( RBA as bigint),
cast( ID as bigint ),
cast( ACCOUNT_NUMBER as bigint ),
cast( TIME_CREATED as bigint ),
cast( FLAGS as bigint ),
NAME ,
NAME_UPPER ,
cast( ADDRESS_ID as bigint ),
cast( EIN as bigint ),
DUNS_NUMBER ,
cast( SIC as bigint ),
cast( TYPE as bigint ),
STATE_INC ,
cast( AVERAGE_TICKET as bigint ),
SERVICE_EMAIL ,
SERVICE_PHONE ,
cast(concat(substr(TIME_ROW_UPDATED,1,10),' ',substr(TIME_ROW_UPDATED,12)) as timestamp),
DISPUTE_EMAIL ,
DOING_BUSINESS_AS 
where 
cast(DB_COMMIT_TIMESTAMP as timestamp) >= cast('2013-09-23 00:00:00'  as timestamp)
and cast(DB_COMMIT_TIMESTAMP as timestamp) < cast('2013-09-23 00:30:00' as timestamp)
insert OVERWRITE table WBUSINESS_intraday_30mins PARTITION (data_year='2013',data_month='09',data_day='22',data_hourmin='2330',data_run_id='201309230001') 
select
year(DB_COMMIT_TIMESTAMP ) as year,
month(DB_COMMIT_TIMESTAMP) as month,
day(DB_COMMIT_TIMESTAMP) as day,
concat(hour(DB_COMMIT_TIMESTAMP),minute(DB_COMMIT_TIMESTAMP)) as hourmin,
'201309230001' as RUN_ID,
cast(DB_COMMIT_TIMESTAMP as timestamp),
DB_ACTION_TYPE,
cast( RBA as bigint),
cast( ID as bigint ),
cast( ACCOUNT_NUMBER as bigint ),
cast( TIME_CREATED as bigint ),
cast( FLAGS as bigint ),
NAME ,
NAME_UPPER ,
cast( ADDRESS_ID as bigint ),
cast( EIN as bigint ),
DUNS_NUMBER ,
cast( SIC as bigint ),
cast( TYPE as bigint ),
STATE_INC ,
cast( AVERAGE_TICKET as bigint ),
SERVICE_EMAIL ,
SERVICE_PHONE ,
cast(concat(substr(TIME_ROW_UPDATED,1,10),' ',substr(TIME_ROW_UPDATED,12)) as timestamp),
DISPUTE_EMAIL ,
DOING_BUSINESS_AS 
where 
cast(DB_COMMIT_TIMESTAMP as timestamp) >= cast('2013-09-22 23:30:00'  as timestamp)
and cast(DB_COMMIT_TIMESTAMP as timestamp) < cast('2013-09-23 00:00:00' as timestamp)
============================================

The Hive log says:

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201309261542_0042, Tracking URL = http://dmdevetllvs01:50030/jobdetails.jsp?jobid=job_201309261542_0042
Kill Command = /dmdev/data01/paypal/HADOOP/node1/hadoop-0.20.2-cdh3u4/bin/hadoop job  -Dmapred.job.tracker=dmdevetllvs01:9010 -kill job_201309261542_0042
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
2013-09-27 16:24:26,542 Stage-2 map = 0%,  reduce = 0%
2013-09-27 16:24:28,565 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 2.52 sec
2013-09-27 16:24:29,574 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 2.52 sec
MapReduce Total cumulative CPU time: 2 seconds 520 msec
Ended Job = job_201309261542_0042
Ended Job = 730112486, job is filtered out (removed at runtime).
Ended Job = -507348248, job is filtered out (removed at runtime).
Moving data to: hdfs://dmdevetllvs01.qa.paypal.com/tmp/hive-sanddas/hive_2013-09-27_16-24-19_784_1282210997007172719/-ext-10000
Moving data to: hdfs://dmdevetllvs01.qa.paypal.com/tmp/hive-sanddas/hive_2013-09-27_16-24-19_784_1282210997007172719/-ext-10002
Loading data to table default.wbusiness_intraday_30mins partition (data_year=2013, data_month=09, data_day=22, data_hourmin=2330, data_run_id=201309230001)
Deleted hdfs://dmdevetllvs01.qa.paypal.com/sys/pp_dm/sanddas/import/site/CONF/WBUSINESS/RCFILE/incremental/data_year=2013/data_month=09/data_day=22/data_hourmin=2330/data_run_id=201309230001
Partition default.wbusiness_intraday_30mins{data_year=2013, data_month=09, data_day=22, data_hourmin=2330, data_run_id=201309230001} stats: [num_files: 1, num_rows: 0, total_size: 12283, raw_data_size: 0]
Table default.wbusiness_intraday_30mins stats: [num_partitions: 7, num_files: 7, num_rows: 0, total_size: 62018, raw_data_size: 0]
Loading data to table default.wbusiness_intraday_30mins partition (data_year=2013, data_month=09, data_day=23, data_hourmin=0000, data_run_id=201309230001)
Deleted hdfs://dmdevetllvs01.qa.paypal.com/sys/pp_dm/sanddas/import/site/CONF/WBUSINESS/RCFILE/incremental/data_year=2013/data_month=09/data_day=23/data_hourmin=0000/data_run_id=201309230001
Partition default.wbusiness_intraday_30mins{data_year=2013, data_month=09, data_day=23, data_hourmin=0000, data_run_id=201309230001} stats: [num_files: 1, num_rows: 0, total_size: 1870, raw_data_size: 0]
Table default.wbusiness_intraday_30mins stats: [num_partitions: 7, num_files: 7, num_rows: 0, total_size: 62018, raw_data_size: 0]
141 Rows loaded to wbusiness_intraday_30mins
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 2.52 sec   HDFS Read: 4191 HDFS Write: 14153 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 520 msec
OK
Time taken: 10.771 seconds
====================================================================

It is only showing 141 rows loaded for a partition.
It is not showing another 21 rows loaded in another partition.

The source table has 161 rows.

Please let us know if this is normal.




> Hive Multi insert statement  is not logging how many rows loaded for individual insert statements
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-5393
>                 URL: https://issues.apache.org/jira/browse/HIVE-5393
>             Project: Hive
>          Issue Type: Bug
>          Components: Diagnosability
>    Affects Versions: 0.9.0
>         Environment: Hive 0.9.0
> Cloudera CDH3U4
>            Reporter: Sandip Das
>            Priority: Minor
>
>    When I am running multi insert statment in Hive to copy data from a single source table to a target table but into two different partitions, The log is not showing how many rows got loaded into two different partitions by the separate insert statements. It is only showing how many rows loaded for only insert statement not both of them.
> Below is the query:
> =============================================================
> from WBUSINESS_STR
> insert OVERWRITE table WBUSINESS_intraday_30mins PARTITION (data_year='2013',data_month='09',data_day='23',data_hourmin='0000',data_run_id='201309230001') 
> select
> year(DB_COMMIT_TIMESTAMP ) as year,
> month(DB_COMMIT_TIMESTAMP) as month,
> day(DB_COMMIT_TIMESTAMP) as day,
> concat(hour(DB_COMMIT_TIMESTAMP),minute(DB_COMMIT_TIMESTAMP)) as hourmin,
> '201309230001' as RUN_ID,
> cast(DB_COMMIT_TIMESTAMP as timestamp),
> DB_ACTION_TYPE,
> cast( RBA as bigint),
> cast( ID as bigint ),
> cast( ACCOUNT_NUMBER as bigint ),
> cast( TIME_CREATED as bigint ),
> cast( FLAGS as bigint ),
> NAME ,
> NAME_UPPER ,
> cast( ADDRESS_ID as bigint ),
> cast( EIN as bigint ),
> DUNS_NUMBER ,
> cast( SIC as bigint ),
> cast( TYPE as bigint ),
> STATE_INC ,
> cast( AVERAGE_TICKET as bigint ),
> SERVICE_EMAIL ,
> SERVICE_PHONE ,
> cast(concat(substr(TIME_ROW_UPDATED,1,10),' ',substr(TIME_ROW_UPDATED,12)) as timestamp),
> DISPUTE_EMAIL ,
> DOING_BUSINESS_AS 
> where 
> cast(DB_COMMIT_TIMESTAMP as timestamp) >= cast('2013-09-23 00:00:00'  as timestamp)
> and cast(DB_COMMIT_TIMESTAMP as timestamp) < cast('2013-09-23 00:30:00' as timestamp)
> insert OVERWRITE table WBUSINESS_intraday_30mins PARTITION (data_year='2013',data_month='09',data_day='22',data_hourmin='2330',data_run_id='201309230001') 
> select
> year(DB_COMMIT_TIMESTAMP ) as year,
> month(DB_COMMIT_TIMESTAMP) as month,
> day(DB_COMMIT_TIMESTAMP) as day,
> concat(hour(DB_COMMIT_TIMESTAMP),minute(DB_COMMIT_TIMESTAMP)) as hourmin,
> '201309230001' as RUN_ID,
> cast(DB_COMMIT_TIMESTAMP as timestamp),
> DB_ACTION_TYPE,
> cast( RBA as bigint),
> cast( ID as bigint ),
> cast( ACCOUNT_NUMBER as bigint ),
> cast( TIME_CREATED as bigint ),
> cast( FLAGS as bigint ),
> NAME ,
> NAME_UPPER ,
> cast( ADDRESS_ID as bigint ),
> cast( EIN as bigint ),
> DUNS_NUMBER ,
> cast( SIC as bigint ),
> cast( TYPE as bigint ),
> STATE_INC ,
> cast( AVERAGE_TICKET as bigint ),
> SERVICE_EMAIL ,
> SERVICE_PHONE ,
> cast(concat(substr(TIME_ROW_UPDATED,1,10),' ',substr(TIME_ROW_UPDATED,12)) as timestamp),
> DISPUTE_EMAIL ,
> DOING_BUSINESS_AS 
> where 
> cast(DB_COMMIT_TIMESTAMP as timestamp) >= cast('2013-09-22 23:30:00'  as timestamp)
> and cast(DB_COMMIT_TIMESTAMP as timestamp) < cast('2013-09-23 00:00:00' as timestamp)
> ============================================
> The Hive log says:
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_201309261542_0042, Tracking URL = http://dmdevetllvs01:50030/jobdetails.jsp?jobid=job_201309261542_0042
> Kill Command = /dmdev/data01/paypal/HADOOP/node1/hadoop-0.20.2-cdh3u4/bin/hadoop job  -Dmapred.job.tracker=dmdevetllvs01:9010 -kill job_201309261542_0042
> Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
> 2013-09-27 16:24:26,542 Stage-2 map = 0%,  reduce = 0%
> 2013-09-27 16:24:28,565 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 2.52 sec
> 2013-09-27 16:24:29,574 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 2.52 sec
> MapReduce Total cumulative CPU time: 2 seconds 520 msec
> Ended Job = job_201309261542_0042
> Ended Job = 730112486, job is filtered out (removed at runtime).
> Ended Job = -507348248, job is filtered out (removed at runtime).
> Moving data to: hdfs://dmdevetllvs01.qa.paypal.com/tmp/hive-sanddas/hive_2013-09-27_16-24-19_784_1282210997007172719/-ext-10000
> Moving data to: hdfs://dmdevetllvs01.qa.paypal.com/tmp/hive-sanddas/hive_2013-09-27_16-24-19_784_1282210997007172719/-ext-10002
> Loading data to table default.wbusiness_intraday_30mins partition (data_year=2013, data_month=09, data_day=22, data_hourmin=2330, data_run_id=201309230001)
> Deleted hdfs://dmdevetllvs01.qa.paypal.com/sys/pp_dm/sanddas/import/site/CONF/WBUSINESS/RCFILE/incremental/data_year=2013/data_month=09/data_day=22/data_hourmin=2330/data_run_id=201309230001
> Partition default.wbusiness_intraday_30mins{data_year=2013, data_month=09, data_day=22, data_hourmin=2330, data_run_id=201309230001} stats: [num_files: 1, num_rows: 0, total_size: 12283, raw_data_size: 0]
> Table default.wbusiness_intraday_30mins stats: [num_partitions: 7, num_files: 7, num_rows: 0, total_size: 62018, raw_data_size: 0]
> Loading data to table default.wbusiness_intraday_30mins partition (data_year=2013, data_month=09, data_day=23, data_hourmin=0000, data_run_id=201309230001)
> Deleted hdfs://dmdevetllvs01.qa.paypal.com/sys/pp_dm/sanddas/import/site/CONF/WBUSINESS/RCFILE/incremental/data_year=2013/data_month=09/data_day=23/data_hourmin=0000/data_run_id=201309230001
> Partition default.wbusiness_intraday_30mins{data_year=2013, data_month=09, data_day=23, data_hourmin=0000, data_run_id=201309230001} stats: [num_files: 1, num_rows: 0, total_size: 1870, raw_data_size: 0]
> Table default.wbusiness_intraday_30mins stats: [num_partitions: 7, num_files: 7, num_rows: 0, total_size: 62018, raw_data_size: 0]
> 141 Rows loaded to wbusiness_intraday_30mins
> MapReduce Jobs Launched: 
> Job 0: Map: 1   Cumulative CPU: 2.52 sec   HDFS Read: 4191 HDFS Write: 14153 SUCCESS
> Total MapReduce CPU Time Spent: 2 seconds 520 msec
> OK
> Time taken: 10.771 seconds
> ====================================================================
> It is only showing 141 rows loaded for a partition.
> It is not showing another 21 rows loaded in another partition.
> The source table has 161 rows.
> Please let us know if this is normal.



--
This message was sent by Atlassian JIRA
(v6.1#6144)