Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/06/07 12:39:00 UTC

[jira] [Work logged] (HIVE-25193) Vectorized Query Execution: ClassCastException when use nvl() function which default_value is decimal type

     [ https://issues.apache.org/jira/browse/HIVE-25193?focusedWorklogId=607828&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-607828 ]

ASF GitHub Bot logged work on HIVE-25193:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Jun/21 12:38
            Start Date: 07/Jun/21 12:38
    Worklog Time Spent: 10m 
      Work Description: FoolishWall opened a new pull request #2358:
URL: https://github.com/apache/hive/pull/2358


   
   ### What changes were proposed in this pull request?
   Changed the evaluate() function in ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorCoalesce.java.
   
   ### Why are the changes needed?
   With hive.vectorized.execution.enabled = true, calling nvl() with a decimal default value fails with the following error:
   
   `Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:504) at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorCoalesce.evaluate(VectorCoalesce.java:124) at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:271) at org.apache.hadoop.hive.ql.exec.vector.expressions.CastStringToDouble.evaluate(CastStringToDouble.java:83) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146) ... 28 more`
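
The stack trace points at an unchecked downcast inside setElement(). The following is a minimal sketch of that failure mode, not Hive code: `SketchColumnVector`, `SketchBytesVector`, `SketchDecimalVector` and `CoalesceCastSketch` are hypothetical stand-ins invented for illustration, and the real Hive classes carry far more state.

```java
import java.math.BigDecimal;

// Hypothetical stand-ins for illustration only; these are NOT the real Hive classes.
abstract class SketchColumnVector {
  abstract void setElement(int outIndex, int inIndex, SketchColumnVector input);
}

final class SketchBytesVector extends SketchColumnVector {
  final String[] values = new String[4];

  @Override
  void setElement(int outIndex, int inIndex, SketchColumnVector input) {
    // Unchecked downcast: assumes the input has the same concrete type as the output.
    SketchBytesVector in = (SketchBytesVector) input;
    values[outIndex] = in.values[inIndex];
  }
}

final class SketchDecimalVector extends SketchColumnVector {
  final BigDecimal[] values = new BigDecimal[4];

  @Override
  void setElement(int outIndex, int inIndex, SketchColumnVector input) {
    SketchDecimalVector in = (SketchDecimalVector) input;
    values[outIndex] = in.values[inIndex];
  }
}

final class CoalesceCastSketch {
  /** Returns true if copying one element from input into output throws ClassCastException. */
  static boolean copyFails(SketchColumnVector output, SketchColumnVector input) {
    try {
      output.setElement(0, 0, input);
      return false;
    } catch (ClassCastException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    SketchBytesVector output = new SketchBytesVector();       // string output column
    SketchBytesVector jsonValue = new SketchBytesVector();    // string child
    jsonValue.values[0] = "1.25";
    SketchDecimalVector constant = new SketchDecimalVector(); // decimal constant child
    constant.values[0] = new BigDecimal("0.88");

    System.out.println(copyFails(output, jsonValue)); // prints false: types match
    System.out.println(copyFails(output, constant));  // prints true: decimal into bytes
  }
}
```

In this sketch the copy only succeeds when the input vector has the same concrete type as the output vector, which mirrors the string output column receiving a decimal constant in the failing expression.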
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Precommit tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 607828)
    Remaining Estimate: 0h
            Time Spent: 10m

> Vectorized Query Execution: ClassCastException when use nvl() function which default_value is decimal type
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-25193
>                 URL: https://issues.apache.org/jira/browse/HIVE-25193
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 4.0.0
>            Reporter: qiang.bi
>            Assignee: qiang.bi
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Problem statement:
> {code:java}
> set hive.vectorized.execution.enabled = true;
> select nvl(get_json_object(attr_json,'$.correctedPrice'),0.88) corrected_price
> from dw_mdm_sync_asset;
> {code}
>  The error log:
> {code:java}
> Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:504)
>   at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorCoalesce.evaluate(VectorCoalesce.java:124)
>   at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:271)
>   at org.apache.hadoop.hive.ql.exec.vector.expressions.CastStringToDouble.evaluate(CastStringToDouble.java:83)
>   at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
>   ... 28 more
> {code}
>  The problem HiveQL:
> {code:java}
> nvl(get_json_object(attr_json,'$.correctedPrice'),0.88) corrected_price
> {code}
>  The problem expression:
> {code:java}
> CastStringToDouble(col 39:string)(children: VectorCoalesce(columns [37, 38])(children: VectorUDFAdaptor(get_json_object(_col14, '$.correctedPrice')) -> 37:string, ConstantVectorExpression(val 0.88) -> 38:decimal(2,2)) -> 39:string) -> 40:double
> {code}
>  The problem code:
> {code:java}
> public class VectorCoalesce extends VectorExpression {  
>   ...   
>   @Override
>   public void evaluate(VectorizedRowBatch batch) throws HiveException {
>     if (childExpressions != null) {
>       super.evaluateChildren(batch);
>     }
>     int[] sel = batch.selected;
>     int n = batch.size;
>     ColumnVector outputColVector = batch.cols[outputColumnNum];
>     boolean[] outputIsNull = outputColVector.isNull;
>     if (n <= 0) {
>       // Nothing to do
>       return;
>     }
>     if (unassignedBatchIndices == null || n > unassignedBatchIndices.length) {
>       // (Re)allocate larger to be a multiple of 1024 (DEFAULT_SIZE).
>       final int roundUpSize =
>           ((n + VectorizedRowBatch.DEFAULT_SIZE - 1) / VectorizedRowBatch.DEFAULT_SIZE)
>               * VectorizedRowBatch.DEFAULT_SIZE;
>       unassignedBatchIndices = new int[roundUpSize];
>     }
>     // We do not need to do a column reset since we are carefully changing the output.
>     outputColVector.isRepeating = false;
>     // CONSIDER: Should be do this for all vector expressions that can
>     //           work on BytesColumnVector output columns???
>     outputColVector.init();
>     final int columnCount = inputColumns.length;
>     /*
>      * Process the input columns to find a non-NULL value for each row.
>      *
>      * We track the unassigned batchIndex of the rows that have not received
>      * a non-NULL value yet.  Similar to a selected array.
>      */
>     boolean isAllUnassigned = true;
>     int unassignedColumnCount = 0;
>     for (int k = 0; k < inputColumns.length; k++) {
>       ColumnVector cv = batch.cols[inputColumns[k]];
>       if (cv.isRepeating) {
>         if (cv.noNulls || !cv.isNull[0]) {
>           /*
>            * With a repeating value we can finish all remaining rows.
>            */
>           if (isAllUnassigned) {
>             // No other columns provided non-NULL values.  We can return repeated output.
>             outputIsNull[0] = false;
>             outputColVector.setElement(0, 0, cv);
>             outputColVector.isRepeating = true;
>             return;
>           } else {
>             // Some rows have already been assigned values. Assign the remaining.
>             // We cannot use copySelected method here.
>             for (int i = 0; i < unassignedColumnCount; i++) {
>               final int batchIndex = unassignedBatchIndices[i];
>               outputIsNull[batchIndex] = false;
>               // Our input is repeating (i.e. inputColNumber = 0).
>               outputColVector.setElement(batchIndex, 0, cv);
>             }
>             return;
>           }
>         } else {
>           // Repeated NULLs -- skip this input column.
>         }
>       } else {
>         /*
>          * Non-repeating input column. Use any non-NULL values for unassigned rows.
>          */
>         if (isAllUnassigned) {
>           /*
>            * No other columns provided non-NULL values.  We *may* be able to finish all rows
>            * with this input column...
>            */
>           if (cv.noNulls) {
>             // Since no NULLs, we can provide values for all rows.
>             if (batch.selectedInUse) {
>               for (int i = 0; i < n; i++) {
>                 final int batchIndex = sel[i];
>                 outputIsNull[batchIndex] = false;
>                 outputColVector.setElement(batchIndex, batchIndex, cv);
>               }
>             } else {
>               Arrays.fill(outputIsNull, 0, n, false);
>               for (int batchIndex = 0; batchIndex < n; batchIndex++) {
>                 outputColVector.setElement(batchIndex, batchIndex, cv);
>               }
>             }
>             return;
>           } else {
>             // We might not be able to assign all rows because of input NULLs.  Start tracking any
>             // unassigned rows.
>             boolean[] inputIsNull = cv.isNull;
>             if (batch.selectedInUse) {
>               for (int i = 0; i < n; i++) {
>                 final int batchIndex = sel[i];
>                 if (!inputIsNull[batchIndex]) {
>                   outputIsNull[batchIndex] = false;
>                   outputColVector.setElement(batchIndex, batchIndex, cv);
>                 } else {
>                   unassignedBatchIndices[unassignedColumnCount++] = batchIndex;
>                 }
>               }
>             } else {
>               for (int batchIndex = 0; batchIndex < n; batchIndex++) {
>                 if (!inputIsNull[batchIndex]) {
>                   outputIsNull[batchIndex] = false;
>                   outputColVector.setElement(batchIndex, batchIndex, cv);
>                 } else {
>                   unassignedBatchIndices[unassignedColumnCount++] = batchIndex;
>                 }
>               }
>             }
>             if (unassignedColumnCount == 0) {
>               return;
>             }
>             isAllUnassigned = false;
>           }
>         } else {
>           /*
>            * We previously assigned *some* rows with non-NULL values. The batch indices of
>            * the unassigned row were tracked.
>            */
>           if (cv.noNulls) {
>             // Assign all remaining rows.
>             for (int i = 0; i < unassignedColumnCount; i++) {
>               final int batchIndex = unassignedBatchIndices[i];
>               outputIsNull[batchIndex] = false;
>               outputColVector.setElement(batchIndex, batchIndex, cv);
>             }
>             return;
>           } else {
>             // Use any non-NULL values found; remember the remaining unassigned.
>             boolean[] inputIsNull = cv.isNull;
>             int newUnassignedColumnCount = 0;
>             for (int i = 0; i < unassignedColumnCount; i++) {
>               final int batchIndex = unassignedBatchIndices[i];
>               if (!inputIsNull[batchIndex]) {
>                 outputIsNull[batchIndex] = false;
>                 outputColVector.setElement(batchIndex, batchIndex, cv);
>               } else {
>                 unassignedBatchIndices[newUnassignedColumnCount++] = batchIndex;
>               }
>             }
>             if (newUnassignedColumnCount == 0) {
>               return;
>             }
>             unassignedColumnCount = newUnassignedColumnCount;
>           }
>         }
>       }
>     }
>     // NULL out the remaining columns.
>     outputColVector.noNulls = false;
>     if (isAllUnassigned) {
>       outputIsNull[0] = true;
>       outputColVector.isRepeating = true;
>     } else {
>       for (int i = 0; i < unassignedColumnCount; i++) {
>         final int batchIndex = unassignedBatchIndices[i];
>         outputIsNull[batchIndex] = true;
>       }
>     }
>   }
>   ...
> }
> {code}
> In the code above, outputColVector is a BytesColumnVector, but one of the input column vectors is a DecimalColumnVector, so setElement() cannot cast it and throws the ClassCastException.
>  As a workaround, wrap 0.88 in single quotes so that the default value is a string literal. For example: 
> {code:java}
> nvl(get_json_object(attr_json,'$.correctedPrice'), '0.88') corrected_price
> {code}
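
With the quoted default, both children of the coalesce are strings, so the expression reduces to picking the first non-NULL value per row. The sketch below illustrates that per-row selection, including the tracking of unassigned row indices described in the quoted evaluate() comments. It uses plain arrays under simplifying assumptions: `CoalesceSketch` and `coalesce` are hypothetical names, and the isRepeating, noNulls and selected-array fast paths of the real method are left out.

```java
import java.util.Arrays;

final class CoalesceSketch {
  /**
   * Returns, for each row, the first non-null value among the input columns
   * (in column order), or null if every column is null for that row.
   */
  static String[] coalesce(String[][] columns, int rowCount) {
    String[] out = new String[rowCount];
    // Row indices that have not yet received a non-null value.
    int[] unassigned = new int[rowCount];
    int unassignedCount = rowCount;
    for (int i = 0; i < rowCount; i++) {
      unassigned[i] = i;
    }

    for (String[] column : columns) {
      int stillUnassigned = 0;
      for (int i = 0; i < unassignedCount; i++) {
        int row = unassigned[i];
        if (column[row] != null) {
          out[row] = column[row];
        } else {
          // Compact in place; safe because stillUnassigned never exceeds i.
          unassigned[stillUnassigned++] = row;
        }
      }
      unassignedCount = stillUnassigned;
      if (unassignedCount == 0) {
        break; // every row has a value, later columns are not needed
      }
    }
    return out;
  }

  public static void main(String[] args) {
    String[] jsonPrice = {"1.25", null, "3.10"};   // e.g. get_json_object results
    String[] fallback = {"0.88", "0.88", "0.88"};  // quoted default as a string
    System.out.println(Arrays.toString(coalesce(new String[][] {jsonPrice, fallback}, 3)));
    // prints [1.25, 0.88, 3.10]
  }
}
```

Because both inputs share one element type here, no cast between vector types is involved, which is why quoting the default avoids the exception.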



--
This message was sent by Atlassian Jira
(v8.3.4#803005)