Posted to dev@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2013/12/07 10:54:36 UTC
[jira] [Commented] (HIVE-5979) Failure in cast to timestamps.
[ https://issues.apache.org/jira/browse/HIVE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842163#comment-13842163 ]
Gopal V commented on HIVE-5979:
-------------------------------
(Pasted from an email)
The nanosecond SQL timestamp handling in Java (java.sql.Timestamp) is horribly broken for usability.
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java#L52
Read my comments there on how it handles negative timestamps and sub-second timings.
Because of the way integer division works in Java, you can end up with rounding towards zero - this causes hell with the restriction that the value passed to setNanos() must never be negative (it has to be between 0 and 999999999).
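To make the rounding problem concrete, here is a minimal sketch. The class and method names (NanoSplit, split, and so on) are made up for illustration and are not taken from the Hive code; the input is assumed to be a signed count of nanoseconds since the epoch.

```java
final class NanoSplit {
    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    /** Naive seconds part: '/' truncates toward zero. */
    static long truncatedSeconds(long epochNanos) {
        return epochNanos / NANOS_PER_SECOND;
    }

    /** Naive nanos part: '%' takes the sign of the dividend, so it can be negative. */
    static long truncatedNanos(long epochNanos) {
        return epochNanos % NANOS_PER_SECOND;
    }

    /** Returns {seconds, nanos} with nanos always in [0, 999999999]. */
    static long[] split(long epochNanos) {
        long seconds = epochNanos / NANOS_PER_SECOND;
        long nanos = epochNanos % NANOS_PER_SECOND;
        if (nanos < 0) {
            // Borrow one second so the fractional part becomes non-negative.
            nanos += NANOS_PER_SECOND;
            seconds -= 1;
        }
        return new long[] {seconds, nanos};
    }

    public static void main(String[] args) {
        long t = -1_500_000_000L;                // 1.5 s before the epoch
        System.out.println(truncatedSeconds(t)); // -1
        System.out.println(truncatedNanos(t));   // -500000000, which setNanos() would reject
        long[] parts = split(t);
        System.out.println(parts[0] + " s + " + parts[1] + " ns"); // -2 s + 500000000 ns
    }
}
```

The manual borrow keeps the sketch usable on older JDKs; it is only one way to get a floored split.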
On top of that, it always uses 1 long and 1 integer to store the time (Unix-epoch seconds + nanos). The millisecond fraction is stored in the nanos field, so setNanos() always overwrites the millisecond fraction of the time, which is why getNanos() is added to it.
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/sql/Timestamp.java#Timestamp.setTime%28long%29
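A small sketch of the overwrite behaviour, using only java.sql.Timestamp from the JDK. The class name SetNanosDemo and its helper methods are hypothetical and exist only for this illustration.

```java
import java.sql.Timestamp;

final class SetNanosDemo {
    /** Builds a Timestamp from epoch millis, calls setNanos(), and returns getTime(). */
    static long millisAfterSetNanos(long epochMillis, int nanos) {
        Timestamp ts = new Timestamp(epochMillis);
        ts.setNanos(nanos);
        return ts.getTime();
    }

    /** Returns true if setNanos() rejects a negative value. */
    static boolean rejectsNegativeNanos() {
        try {
            new Timestamp(0L).setNanos(-1);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        Timestamp ts = new Timestamp(1_500L);   // 1.5 s after the epoch
        System.out.println(ts.getNanos());      // 500000000: the half second lives in nanos
        ts.setNanos(123);                       // replaces the whole sub-second part
        System.out.println(ts.getTime());       // 1000: the 500 ms are gone
        System.out.println(rejectsNegativeNanos());
    }
}
```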
That makes sense, until you realize that a negative millisecond time is stored as one second earlier (-1 second) plus a positive nanosecond value.
So when you mix that with the negative modulo in Java, you end up with a fairly ugly kludge which needs to take care of several edge cases related to the java.sql.Timestamp implementation.
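As a rough sketch of how those edge cases can be sidestepped (this is not the fix applied in Hive, and fromEpochNanos is a hypothetical helper), the floored division and modulo helpers in java.lang.Math can be used. That assumes Java 8 or later, which was not yet available when this was written.

```java
import java.sql.Timestamp;

final class EpochNanos {
    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    /** Hypothetical helper: builds a Timestamp from signed nanoseconds since the epoch. */
    static Timestamp fromEpochNanos(long epochNanos) {
        // floorDiv/floorMod round toward negative infinity, so nanos is never negative.
        long seconds = Math.floorDiv(epochNanos, NANOS_PER_SECOND);
        int nanos = (int) Math.floorMod(epochNanos, NANOS_PER_SECOND);
        Timestamp ts = new Timestamp(seconds * 1000L); // whole seconds only
        ts.setNanos(nanos);                            // the full sub-second part
        return ts;
    }

    public static void main(String[] args) {
        // One millisecond before the epoch: whole second -1 plus 999 ms of nanos.
        Timestamp justBefore = new Timestamp(-1L);
        System.out.println(justBefore.getNanos()); // 999000000

        Timestamp ts = fromEpochNanos(-1_500_000_000L); // 1.5 s before the epoch
        System.out.println(ts.getTime());  // -1500
        System.out.println(ts.getNanos()); // 500000000
    }
}
```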
> Failure in cast to timestamps.
> ------------------------------
>
> Key: HIVE-5979
> URL: https://issues.apache.org/jira/browse/HIVE-5979
> Project: Hive
> Issue Type: Sub-task
> Reporter: Jitendra Nath Pandey
> Assignee: Jitendra Nath Pandey
>
> Query ran:
> {code}
> select cast(t as timestamp), cast(si as timestamp),
> cast(i as timestamp), cast(b as timestamp),
> cast(f as string), cast(d as timestamp),
> cast(bo as timestamp), cast(b * 0 as timestamp),
> cast(ts as timestamp), cast(s as timestamp),
> cast(substr(s, 1, 1) as timestamp)
> from Table1;
> {code}
> Running this query with hive.vectorized.execution.enabled=true fails with the following exception:
> {noformat}
> 13/12/05 07:56:36 ERROR tez.TezJobMonitor: Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1386227234886_0482_1_00, diagnostics=[Task failed, taskId=task_1386227234886_0482_1_00_000000, diagnostics=[AttemptID:attempt_1386227234886_0482_1_00_000000_0 Info:Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
> at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:205)
> at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:171)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:112)
> at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:201)
> at org.apache.hadoop.mapred.YarnTezDagChild$4.run(YarnTezDagChild.java:484)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:474)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
> at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
> at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:193)
> ... 8 more
> Caused by: java.lang.IllegalArgumentException: nanos > 999999999 or < 0
> at java.sql.Timestamp.setNanos(Timestamp.java:383)
> at org.apache.hadoop.hive.ql.exec.vector.TimestampUtils.assignTimeInNanoSec(TimestampUtils.java:27)
> at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$1.writeValue(VectorExpressionWriterFactory.java:412)
> at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:162)
> at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:152)
> at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.processOp(VectorFileSinkOperator.java:85)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:129)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:93)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)
> ... 9 more
> {noformat}
> Full log is attached.
> Schema for the table is as follows:
> {code}
> hive> desc Table1;
> OK
> t tinyint from deserializer
> si smallint from deserializer
> i int from deserializer
> b bigint from deserializer
> f float from deserializer
> d double from deserializer
> bo boolean from deserializer
> s string from deserializer
> s2 string from deserializer
> ts timestamp from deserializer
> Time taken: 0.521 seconds, Fetched: 10 row(s)
> {code}