Posted to issues@hive.apache.org by "Pengcheng Xiong (JIRA)" <ji...@apache.org> on 2016/05/27 06:12:12 UTC
[jira] [Comment Edited] (HIVE-13837) current_timestamp() output format is different in some cases
[ https://issues.apache.org/jira/browse/HIVE-13837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303595#comment-15303595 ]
Pengcheng Xiong edited comment on HIVE-13837 at 5/27/16 6:11 AM:
-----------------------------------------------------------------
Minor change from ":" to "." according to the Oracle timestamp standard. Resubmitting the patch.
was (Author: pxiong):
minor change from ":" to "." according to Oracle timestamp standard.
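For context, the ":" to "." change in the patch concerns the separator placed before the fractional seconds: Oracle-style timestamps use a dot (e.g. 2016-05-27 06:11:00.123), not a colon. A minimal sketch of the two forms, using hypothetical pattern strings rather than Hive's actual formatting code:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class SeparatorDemo {
    public static void main(String[] args) {
        // 123_000_000 nanoseconds = 123 milliseconds
        LocalDateTime t = LocalDateTime.of(2016, 5, 27, 6, 11, 0, 123_000_000);

        // Colon before the millisecond field (the form being corrected)
        DateTimeFormatter colon = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss:SSS");
        // Dot before the millisecond field, matching the Oracle timestamp style
        DateTimeFormatter dot = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

        System.out.println(colon.format(t)); // 2016-05-27 06:11:00:123
        System.out.println(dot.format(t));   // 2016-05-27 06:11:00.123
    }
}
```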
> current_timestamp() output format is different in some cases
> ------------------------------------------------------------
>
> Key: HIVE-13837
> URL: https://issues.apache.org/jira/browse/HIVE-13837
> Project: Hive
> Issue Type: Bug
> Reporter: Pengcheng Xiong
> Assignee: Pengcheng Xiong
> Attachments: HIVE-13837.01.patch, HIVE-13837.02.patch
>
>
> As [~jdere] reports:
> {code}
> The current_timestamp() UDF returns results in different formats in some cases.
> select current_timestamp() returns a result with decimal precision:
> {noformat}
> hive> select current_timestamp();
> OK
> 2016-04-14 18:26:58.875
> Time taken: 0.077 seconds, Fetched: 1 row(s)
> {noformat}
> But the output format differs for select current_timestamp() from all100k union select current_timestamp() from over100k limit 5;
> {noformat}
> hive> select current_timestamp() from all100k union select current_timestamp() from over100k limit 5;
> Query ID = hrt_qa_20160414182956_c4ed48f2-9913-4b3b-8f09-668ebf55b3e3
> Total jobs = 1
> Launching Job 1 out of 1
> Tez session was closed. Reopening...
> Session re-established.
> Status: Running (Executing on YARN cluster with App id application_1460611908643_0624)
> ----------------------------------------------------------------------------------------------
> VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
> ----------------------------------------------------------------------------------------------
> Map 1 .......... llap SUCCEEDED 1 1 0 0 0 0
> Map 4 .......... llap SUCCEEDED 1 1 0 0 0 0
> Reducer 3 ...... llap SUCCEEDED 1 1 0 0 0 0
> ----------------------------------------------------------------------------------------------
> VERTICES: 03/03 [==========================>>] 100% ELAPSED TIME: 0.92 s
> ----------------------------------------------------------------------------------------------
> OK
> 2016-04-14 18:29:56
> Time taken: 10.558 seconds, Fetched: 1 row(s)
> {noformat}
> explain plan for select current_timestamp();
> {noformat}
> hive> explain extended select current_timestamp();
> OK
> ABSTRACT SYNTAX TREE:
>
> TOK_QUERY
> TOK_INSERT
> TOK_DESTINATION
> TOK_DIR
> TOK_TMP_FILE
> TOK_SELECT
> TOK_SELEXPR
> TOK_FUNCTION
> current_timestamp
> STAGE DEPENDENCIES:
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-0
> Fetch Operator
> limit: -1
> Processor Tree:
> TableScan
> alias: _dummy_table
> Row Limit Per Split: 1
> GatherStats: false
> Select Operator
> expressions: 2016-04-14 18:30:57.206 (type: timestamp)
> outputColumnNames: _col0
> ListSink
> Time taken: 0.062 seconds, Fetched: 30 row(s)
> {noformat}
> explain plan for select current_timestamp() from all100k union select current_timestamp() from over100k limit 5;
> {noformat}
> hive> explain extended select current_timestamp() from all100k union select current_timestamp() from over100k limit 5;
> OK
> ABSTRACT SYNTAX TREE:
>
> TOK_QUERY
> TOK_FROM
> TOK_SUBQUERY
> TOK_QUERY
> TOK_FROM
> TOK_SUBQUERY
> TOK_UNIONALL
> TOK_QUERY
> TOK_FROM
> TOK_TABREF
> TOK_TABNAME
> all100k
> TOK_INSERT
> TOK_DESTINATION
> TOK_DIR
> TOK_TMP_FILE
> TOK_SELECT
> TOK_SELEXPR
> TOK_FUNCTION
> current_timestamp
> TOK_QUERY
> TOK_FROM
> TOK_TABREF
> TOK_TABNAME
> over100k
> TOK_INSERT
> TOK_DESTINATION
> TOK_DIR
> TOK_TMP_FILE
> TOK_SELECT
> TOK_SELEXPR
> TOK_FUNCTION
> current_timestamp
> _u1
> TOK_INSERT
> TOK_DESTINATION
> TOK_DIR
> TOK_TMP_FILE
> TOK_SELECTDI
> TOK_SELEXPR
> TOK_ALLCOLREF
> _u2
> TOK_INSERT
> TOK_DESTINATION
> TOK_DIR
> TOK_TMP_FILE
> TOK_SELECT
> TOK_SELEXPR
> TOK_ALLCOLREF
> TOK_LIMIT
> 5
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 depends on stages: Stage-1
> STAGE PLANS:
> Stage: Stage-1
> Tez
> DagId: hrt_qa_20160414183119_ec8e109e-8975-4799-a142-4a2289f85910:7
> Edges:
> Map 1 <- Union 2 (CONTAINS)
> Map 4 <- Union 2 (CONTAINS)
> Reducer 3 <- Union 2 (SIMPLE_EDGE)
> DagName:
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: all100k
> Statistics: Num rows: 100000 Data size: 15801336 Basic stats: COMPLETE Column stats: COMPLETE
> GatherStats: false
> Select Operator
> Statistics: Num rows: 100000 Data size: 4000000 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: 2016-04-14 18:31:19.0 (type: timestamp)
> outputColumnNames: _col0
> Statistics: Num rows: 200000 Data size: 8000000 Basic stats: COMPLETE Column stats: COMPLETE
> Group By Operator
> keys: _col0 (type: timestamp)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: timestamp)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: timestamp)
> Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats: COMPLETE
> tag: -1
> TopN: 5
> TopN Hash Memory Usage: 0.04
> auto parallelism: true
> Execution mode: llap
> LLAP IO: no inputs
> Path -> Alias:
> hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k [all100k]
> Path -> Partition:
> hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k
> Partition
> base file name: all100k
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> properties:
> COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"t":"true","si":"true","i":"true","b":"true","f":"true","d":"true","s":"true","dc":"true","bo":"true","v":"true","c":"true","ts":"true"}}
> EXTERNAL TRUE
> bucket_count -1
> columns t,si,i,b,f,d,s,dc,bo,v,c,ts,dt
> columns.comments
> columns.types tinyint:smallint:int:bigint:float:double:string:decimal(38,18):boolean:varchar(25):char(25):timestamp:date
> field.delim |
> file.inputformat org.apache.hadoop.mapred.TextInputFormat
> file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> location hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k
> name default.all100k
> numFiles 1
> numRows 100000
> rawDataSize 15801336
> serialization.ddl struct all100k { byte t, i16 si, i32 i, i64 b, float f, double d, string s, decimal(38,18) dc, bool bo, varchar(25) v, char(25) c, timestamp ts, date dt}
> serialization.format |
> serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> totalSize 15901336
> transient_lastDdlTime 1460612683
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> properties:
> COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"t":"true","si":"true","i":"true","b":"true","f":"true","d":"true","s":"true","dc":"true","bo":"true","v":"true","c":"true","ts":"true"}}
> EXTERNAL TRUE
> bucket_count -1
> columns t,si,i,b,f,d,s,dc,bo,v,c,ts,dt
> columns.comments
> columns.types tinyint:smallint:int:bigint:float:double:string:decimal(38,18):boolean:varchar(25):char(25):timestamp:date
> field.delim |
> file.inputformat org.apache.hadoop.mapred.TextInputFormat
> file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> location hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k
> name default.all100k
> numFiles 1
> numRows 100000
> rawDataSize 15801336
> serialization.ddl struct all100k { byte t, i16 si, i32 i, i64 b, float f, double d, string s, decimal(38,18) dc, bool bo, varchar(25) v, char(25) c, timestamp ts, date dt}
> serialization.format |
> serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> totalSize 15901336
> transient_lastDdlTime 1460612683
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> name: default.all100k
> name: default.all100k
> Truncated Path -> Alias:
> hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k [all100k]
> Map 4
> Map Operator Tree:
> TableScan
> alias: over100k
> Statistics: Num rows: 100000 Data size: 6631229 Basic stats: COMPLETE Column stats: COMPLETE
> GatherStats: false
> Select Operator
> Statistics: Num rows: 100000 Data size: 4000000 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: 2016-04-14 18:31:19.0 (type: timestamp)
> outputColumnNames: _col0
> Statistics: Num rows: 200000 Data size: 8000000 Basic stats: COMPLETE Column stats: COMPLETE
> Group By Operator
> keys: _col0 (type: timestamp)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: timestamp)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: timestamp)
> Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats: COMPLETE
> tag: -1
> TopN: 5
> TopN Hash Memory Usage: 0.04
> auto parallelism: true
> Execution mode: llap
> LLAP IO: no inputs
> Path -> Alias:
> hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k [over100k]
> Path -> Partition:
> hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k
> Partition
> base file name: over100k
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> properties:
> COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"t":"true","si":"true","i":"true","b":"true","f":"true","d":"true","bo":"true","s":"true","bin":"true"}}
> EXTERNAL TRUE
> bucket_count -1
> columns t,si,i,b,f,d,bo,s,bin
> columns.comments
> columns.types tinyint:smallint:int:bigint:float:double:boolean:string:binary
> field.delim :
> file.inputformat org.apache.hadoop.mapred.TextInputFormat
> file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> location hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k
> name default.over100k
> numFiles 1
> numRows 100000
> rawDataSize 6631229
> serialization.ddl struct over100k { byte t, i16 si, i32 i, i64 b, float f, double d, bool bo, string s, binary bin}
> serialization.format :
> serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> totalSize 6731229
> transient_lastDdlTime 1460612798
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> properties:
> COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"t":"true","si":"true","i":"true","b":"true","f":"true","d":"true","bo":"true","s":"true","bin":"true"}}
> EXTERNAL TRUE
> bucket_count -1
> columns t,si,i,b,f,d,bo,s,bin
> columns.comments
> columns.types tinyint:smallint:int:bigint:float:double:boolean:string:binary
> field.delim :
> file.inputformat org.apache.hadoop.mapred.TextInputFormat
> file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> location hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k
> name default.over100k
> numFiles 1
> numRows 100000
> rawDataSize 6631229
> serialization.ddl struct over100k { byte t, i16 si, i32 i, i64 b, float f, double d, bool bo, string s, binary bin}
> serialization.format :
> serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> totalSize 6731229
> transient_lastDdlTime 1460612798
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> name: default.over100k
> name: default.over100k
> Truncated Path -> Alias:
> hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k [over100k]
> Reducer 3
> Execution mode: vectorized, llap
> Needs Tagging: false
> Reduce Operator Tree:
> Group By Operator
> keys: KEY._col0 (type: timestamp)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats: COMPLETE
> Limit
> Number of rows: 5
> Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats: COMPLETE
> File Output Operator
> compressed: false
> GlobalTableId: 0
> directory: hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/tmp/hive/hrt_qa/ec0773d7-0ac2-45c7-b9cb-568bbed2c49c/hive_2016-04-14_18-31-19_532_3480081382837900888-1/-mr-10001/.hive-staging_hive_2016-04-14_18-31-19_532_3480081382837900888-1/-ext-10002
> NumFilesPerFileSink: 1
> Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats: COMPLETE
> Stats Publishing Key Prefix: hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/tmp/hive/hrt_qa/ec0773d7-0ac2-45c7-b9cb-568bbed2c49c/hive_2016-04-14_18-31-19_532_3480081382837900888-1/-mr-10001/.hive-staging_hive_2016-04-14_18-31-19_532_3480081382837900888-1/-ext-10002/
> table:
> input format: org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> properties:
> columns _col0
> columns.types timestamp
> escape.delim \
> hive.serialization.extend.additional.nesting.levels true
> serialization.escape.crlf true
> serialization.format 1
> serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> TotalFiles: 1
> GatherStats: false
> MultiFileSpray: false
> Union 2
> Vertex: Union 2
> Stage: Stage-0
> Fetch Operator
> limit: 5
> Processor Tree:
> ListSink
> Time taken: 0.301 seconds, Fetched: 284 row(s)
> {noformat}
> Both of these queries returned timestamps in the YYYY-MM-DD HH:MM:SS.fff format in past releases.
> {code}
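The discrepancy reported above is consistent with plain java.sql.Timestamp.toString(), which always prints at least one fractional digit (".0" for whole seconds), while formatting the same value through a pattern without a millisecond field drops the fraction entirely. This is an illustration of the underlying Java formatting behavior only, not a claim about Hive's actual code path:

```java
import java.sql.Timestamp;
import java.text.SimpleDateFormat;

public class TimestampFormatDemo {
    public static void main(String[] args) {
        // Timestamp.toString() keeps the millisecond precision it was given,
        // and emits at least ".0" even for a whole-second value.
        Timestamp withMillis = Timestamp.valueOf("2016-04-14 18:26:58.875");
        Timestamp wholeSecond = Timestamp.valueOf("2016-04-14 18:29:56");
        System.out.println(withMillis);   // 2016-04-14 18:26:58.875
        System.out.println(wholeSecond);  // 2016-04-14 18:29:56.0

        // A pattern with no millisecond field drops the fraction, matching
        // the second query's "2016-04-14 18:29:56" output.
        SimpleDateFormat noFraction = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        System.out.println(noFraction.format(wholeSecond)); // 2016-04-14 18:29:56
    }
}
```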
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)