Posted to dev@hive.apache.org by "Aditya Kishore (JIRA)" <ji...@apache.org> on 2013/02/01 01:45:14 UTC

[jira] [Updated] (HIVE-3935) New line character in output when sequence file is used for storage and table is empty

     [ https://issues.apache.org/jira/browse/HIVE-3935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aditya Kishore updated HIVE-3935:
---------------------------------

    Attachment: HIVE-3935-0.9.patch

Attaching a proposed patch: if the partition descriptor is empty, the table's InputFormat class is used as the InputFormat class for the split.
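
The fallback can be illustrated with a minimal sketch. This is not the attached patch and does not use Hive's own classes: TableInfo, PartitionInfo and resolveInputFormat are hypothetical stand-ins, and "empty" is assumed here to mean a missing descriptor or one with no InputFormat class name set.

```java
public class InputFormatFallback {

    /** Hypothetical stand-in for a table descriptor (not a Hive class). */
    static final class TableInfo {
        final String inputFormatClassName;

        TableInfo(String inputFormatClassName) {
            this.inputFormatClassName = inputFormatClassName;
        }
    }

    /** Hypothetical stand-in for a partition descriptor (not a Hive class). */
    static final class PartitionInfo {
        // Assumption: null or "" models an "empty" partition descriptor.
        final String inputFormatClassName;

        PartitionInfo(String inputFormatClassName) {
            this.inputFormatClassName = inputFormatClassName;
        }
    }

    /**
     * Returns the InputFormat class name to use for a split: the partition's
     * own value when present, otherwise the table's value.
     */
    static String resolveInputFormat(PartitionInfo part, TableInfo table) {
        boolean partitionIsEmpty = part == null
                || part.inputFormatClassName == null
                || part.inputFormatClassName.isEmpty();
        return partitionIsEmpty ? table.inputFormatClassName : part.inputFormatClassName;
    }

    public static void main(String[] args) {
        TableInfo table = new TableInfo("org.apache.hadoop.mapred.SequenceFileInputFormat");

        // Empty table: no partition descriptor, so the table's format is used.
        System.out.println(resolveInputFormat(null, table));

        // A partition that declares its own format keeps it.
        PartitionInfo part = new PartitionInfo("org.apache.hadoop.mapred.TextInputFormat");
        System.out.println(resolveInputFormat(part, table));
    }
}
```

For a table like the one in the report, which has no partitions yet, the split would then be read with the table's sequence file format instead of a default.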
                
> New line character in output when sequence file is used for storage and table is empty
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-3935
>                 URL: https://issues.apache.org/jira/browse/HIVE-3935
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.9.0, 0.10.0
>         Environment: Centos 6.3
>            Reporter: Doodle gum
>         Attachments: HIVE-3935-0.9.patch
>
>
> When a "select distinct" command is issued on an empty table that uses sequence file storage, an extra newline (0x0a) appears in the result set even though the table has no data. This output is inconsistent with the result of the same command on Hive 0.7.1 and can cause workflows to fail because of a wrong record count.
> Execution on Hive 0.9 and 0.10:
> hive> create table hoge2(col1 string,col2 string) partitioned by (p_part string) stored as sequencefile;
> hive> describe hoge2;
> OK
> col1    string
> col2    string
> p_part  string
> Time taken: 0.24 seconds
> hive> select distinct p_part from hoge2;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks not specified. Estimated from input data size: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_201301230112_0001, Tracking URL = http://testcluster2-1:50030/jobdetails.jsp?jobid=job_201301230112_0001
> Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=maprfs:/// -kill job_201301230112_0001
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
> 2013-01-23 02:50:16,843 Stage-1 map = 0%,  reduce = 0%
> 2013-01-23 02:50:26,897 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:27,905 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:28,911 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:29,919 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:30,925 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:31,933 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:32,939 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
> 2013-01-23 02:50:33,945 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.8 sec
> MapReduce Total cumulative CPU time: 1 seconds 800 msec
> Ended Job = job_201301230112_0001
> MapReduce Jobs Launched:
> Job 0: Map: 1  Reduce: 1   Cumulative CPU: 1.8 sec   MAPRFS Read: 327 MAPRFS Write: 71 SUCCESS
> Total MapReduce CPU Time Spent: 1 seconds 800 msec
> OK
> Time taken: 21.94 seconds
> Result on Hive 0.7.1:
> hive> select count(distinct p_part) from hoge3;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_201210261659_0019, Tracking URL = http://testcluster1-1:50030/jobdetails.jsp?jobid=job_201210261659_0019
> Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=maprfs:/// -kill job_201210261659_0019
> 2013-01-23 21:42:01,787 Stage-1 map = 0%,  reduce = 0%
> 2013-01-23 21:42:07,815 Stage-1 map = 100%,  reduce = 0%
> 2013-01-23 21:42:12,835 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_201210261659_0019
> OK
> 0
> Time taken: 16.637 seconds
> The underlying Hadoop version is 1.0.3 for Hive 0.9 and 0.20.203 for Hive 0.7.
