Posted to dev@hive.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2016/11/07 06:26:58 UTC
[jira] [Created] (HIVE-15138) String + Integer gets converted to UDFToDouble causing number format exceptions
Rajesh Balamohan created HIVE-15138:
---------------------------------------
Summary: String + Integer gets converted to UDFToDouble causing number format exceptions
Key: HIVE-15138
URL: https://issues.apache.org/jira/browse/HIVE-15138
Project: Hive
Issue Type: Improvement
Reporter: Rajesh Balamohan
Priority: Minor
TPCDS Query 72 has {{"d3.d_date > d1.d_date + 5"}}, where d_date contains string data like {{2002-02-03, 2001-11-07}}. When running this query, the compiler wraps d_date in UDFToDouble, which causes a large number of
{{NumberFormatExceptions}} while trying to convert the strings to double. An example stack trace is given below; filling in the stack trace for every row can be a significant perf hit, depending on the amount of data.
{noformat}
"TezTaskRunner" #41340 daemon prio=5 os_prio=0 tid=0x00007f7914745000 nid=0x9725 runnable [0x00007f787ee4a000]
java.lang.Thread.State: RUNNABLE
at java.lang.Throwable.fillInStackTrace(Native Method)
at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
- locked <0x00007f804b125ab0> (a java.lang.NumberFormatException)
at java.lang.Throwable.<init>(Throwable.java:265)
at java.lang.Exception.<init>(Exception.java:66)
at java.lang.RuntimeException.<init>(RuntimeException.java:62)
at java.lang.IllegalArgumentException.<init>(IllegalArgumentException.java:52)
at java.lang.NumberFormatException.<init>(NumberFormatException.java:55)
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at org.apache.hadoop.hive.ql.udf.UDFToDouble.evaluate(UDFToDouble.java:172)
at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:967)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:194)
at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:194)
at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:150)
at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:121)
at org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FilterDoubleColGreaterDoubleColumn.evaluate(FilterDoubleColGreaterDoubleColumn.java:51)
at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:110)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:144)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:600)
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:386)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:600)
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:386)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
{noformat}
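The failure in the stack trace above can be reproduced outside Hive with a minimal sketch. The class and method names here ({{DateStringToDoubleSketch}}, {{toDoubleOrNull}}) are hypothetical, and returning null on failure is an assumption made for illustration rather than a copy of the UDFToDouble source. The only point is that {{Double.parseDouble}} throws for date-formatted text, and every throw pays for {{fillInStackTrace}}.

```java
// Hypothetical stand-alone sketch; not Hive code.
final class DateStringToDoubleSketch {

    // Tries to parse the text as a double. Date-like strings such as
    // "2002-02-03" are not valid floating-point literals, so parseDouble
    // throws NumberFormatException, which is constructed (stack trace
    // included) on every failing call.
    static Double toDoubleOrNull(String text) {
        try {
            return Double.parseDouble(text);
        } catch (NumberFormatException e) {
            // Assumption for this sketch: signal failure with null.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(toDoubleOrNull("2002-02-03")); // not numeric, exception path
        System.out.println(toDoubleOrNull("5"));          // numeric, normal path
    }
}
```

With one such call per row, the exception path runs once for every d_date value that is not numeric.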
A simple query to reproduce this issue is given below. It would be helpful if Hive emitted explicit WARN messages so that end users can add explicit casts to avoid such situations.
{noformat}
Latest Hive (master): (Check UDFToDouble for d_date field)
====================
hive> explain select distinct d_date + 5 from date_dim limit 10;
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
DagId: rbalamohan_20161107005816_1cc412bf-c19c-45c4-b468-236e4fc8ae09:8
Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
DagName:
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: date_dim
Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: (UDFToDouble(d_date) + 5.0) (type: double)
outputColumnNames: _col0
Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: NONE
Group By Operator
keys: _col0 (type: double)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: double)
sort order: +
Map-reduce partition columns: _col0 (type: double)
Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: NONE
TopN Hash Memory Usage: 0.04
Execution mode: vectorized, llap
LLAP IO: all inputs
Reducer 2
Execution mode: vectorized, llap
Reduce Operator Tree:
Group By Operator
keys: KEY._col0 (type: double)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 36524 Data size: 41016452 Basic stats: COMPLETE Column stats: NONE
Limit
Number of rows: 10
Statistics: Num rows: 10 Data size: 11230 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 10 Data size: 11230 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: 10
Processor Tree:
ListSink
{noformat}
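For contrast with the {{UDFToDouble(d_date) + 5.0}} expression in the plan above, the following sketch shows what typed date arithmetic on the same values looks like. It assumes the query means "d_date plus 5 days", which is an interpretation and not something stated in this report. The class and method names ({{DateArithmeticSketch}}, {{addDays}}) are hypothetical, and the code uses only {{java.time}}, not any Hive API.

```java
import java.time.LocalDate;

// Hypothetical stand-alone sketch; not Hive code.
final class DateArithmeticSketch {

    // Parses an ISO-8601 date string (yyyy-MM-dd) and adds whole days.
    // The value is treated as a date from the start, so no string-to-double
    // conversion is attempted.
    static LocalDate addDays(String isoDate, int days) {
        return LocalDate.parse(isoDate).plusDays(days);
    }

    public static void main(String[] args) {
        System.out.println(addDays("2002-02-03", 5)); // prints 2002-02-08
    }
}
```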
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)