Posted to issues@hive.apache.org by "Laszlo Bodor (JIRA)" <ji...@apache.org> on 2019/03/15 09:34:00 UTC

[jira] [Commented] (HIVE-20808) Queries with map() constructor are slow with vectorization

    [ https://issues.apache.org/jira/browse/HIVE-20808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793489#comment-16793489 ] 

Laszlo Bodor commented on HIVE-20808:
-------------------------------------

[~barrm]: as far as I understand, the VectorUDFAdaptor frames in the trace suggest that the child expression of VectorUDFMapIndexBaseScalar is not actually vectorized and is evaluated row by row through the adaptor. That could be one of the reasons why the query is slower than expected, especially if the expression is used heavily.
Could you please provide a reproduction scenario (table schema, query)?
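
For illustration only, a minimal sketch of the kind of check that could show whether the adaptor is involved. The table name map_repro, its columns, and the query shape are hypothetical placeholders, not taken from the report; the actual schema and query still need to come from the reporter. The idea is to run the suspect query under EXPLAIN VECTORIZATION and look for VectorUDFAdaptor in the select expressions of the plan:
{code:sql}
-- Hypothetical table; names and types are assumptions for illustration.
CREATE TABLE map_repro (id INT, amount DECIMAL(18,2)) STORED AS ORC;

SET hive.vectorized.execution.enabled=true;

-- Hypothetical query: a map() constructor indexed by a constant key.
-- In the plan output, check whether the select expressions are wrapped
-- in VectorUDFAdaptor and whether the vectorization summary reports
-- that the UDF adaptor is used.
EXPLAIN VECTORIZATION DETAIL
SELECT map('amount', amount)['amount']
FROM map_repro
WHERE id > 0;
{code}
If the plan shows the adaptor around the map() call, comparing run times with hive.vectorized.execution.enabled set to false for the same query would help confirm whether the adaptor path is the bottleneck.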

> Queries with map() constructor are slow with vectorization
> ----------------------------------------------------------
>
>                 Key: HIVE-20808
>                 URL: https://issues.apache.org/jira/browse/HIVE-20808
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 3.0.0
>            Reporter: Matthew Barr
>            Priority: Major
>
> Queries that use the map() constructor with vectorization enabled appear to be slowed down by the vector UDF adaptor (VectorUDFAdaptor).
> Corresponding jstack for slow task:
> {code:java}
> "TezChild" #23 daemon prio=5 os_prio=0 tid=0x00007f1e44f1b080 nid=0x9419 runnable [0x00007f1e28137000] 
> java.lang.Thread.State: RUNNABLE 
> at org.apache.hadoop.hive.ql.exec.vector.ColumnVector.ensureSize(ColumnVector.java:232) 
> at org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector.ensureSize(DecimalColumnVector.java:208) 
> at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:587) 
> at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:350) 
> at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:205) 
> at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:146) 
> at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:271) 
> at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFMapIndexBaseScalar.evaluate(VectorUDFMapIndexBaseScalar.java:57) 
> at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146) 
> at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965) 
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938) 
> at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:136) 
> at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965) 
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938) 
> at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125) 
> at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:812) 
> at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:845) 
> at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92) 
> at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76) 
> at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419) 
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) 
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) 
> at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) 
> at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) 
> at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:422) 
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) 
> at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) 
> at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) 
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) 
> at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108) 
> at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41) 
> at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) 
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
> at java.lang.Thread.run(Thread.java:748)
> {code}


