You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by ajantha-bhat <gi...@git.apache.org> on 2018/07/25 14:16:40 UTC

[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

GitHub user ajantha-bhat opened a pull request:

    https://github.com/apache/carbondata/pull/2559

    [CARBONDATA-2606][Complex DataType Enhancements]Fix Null result if projection column have null primitive column and struct 

    Problem:
    In case if the actual value of the primitive data type is null, by PR#2489, we are moving all the null values to the end of the collected row without considering the data type.
    
    Solution:
    Place null in the end of output iff the null value is of complex primitive column.
    
    Be sure to do all of the following checklist to help us incorporate 
    your contribution quickly and easily:
    
     - [ ] Any interfaces changed? NA
     
     - [ ] Any backward compatibility impacted?NA
     
     - [ ] Document update required?NA
    
     - [ ] Testing done
         updated UT 
    
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ajantha-bhat/carbondata master_doc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2559.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2559
    
----
commit 7af27b66fb1491f5ac7f9bf155723289d39ad7b0
Author: ajantha-bhat <aj...@...>
Date:   2018-07-25T13:51:02Z

    [CARBONDATA-2606][Complex DataType Enhancements] Fix Null result if projection column have null primitive column and struct

----


---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2559
  
    retest sdv please


---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2559
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6074/



---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2559
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6366/



---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2559
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7650/



---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2559
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7509/



---

[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2559#discussion_r205975710
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
    @@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) {
           }
           fillMeasureData(scannedResult, row);
           if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) {
    --- End diff --
    
    This check is not good in terms of performance,  every time generating a string from map and check contains is heavy operation. please simplify it. And also I don't think we require to check it for every row.


---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2559
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7604/



---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2559
  
    retest this please


---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2559
  
    @kumarvishal09 : PR is ready please review


---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2559
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6384/



---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2559
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6008/



---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2559
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7597/



---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

Posted by brijoobopanna <gi...@git.apache.org>.
Github user brijoobopanna commented on the issue:

    https://github.com/apache/carbondata/pull/2559
  
    retest this please



---

[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2559#discussion_r206019255
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
    @@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) {
           }
           fillMeasureData(scannedResult, row);
           if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) {
    -        // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the
    -        // projection list,then object array will contain a,null,d as result, because for a.b,
    -        // a will be filled and for a.c null will be placed.
    -        // Instead place null in the end of object array and send a,d,null as result.
    -        int count = 0;
    -        for (int j = 0; j < row.length; j++) {
    -          if (row[j] != null) row[count++] = row[j];
    +        boolean[] isComplexChildColumn = new boolean[queryDimensions.length + queryMeasures.length];
    +        for (ProjectionDimension dimension : queryDimensions) {
    +          if (null != dimension.getDimension().getComplexParentDimension()) {
    +            isComplexChildColumn[dimension.getOrdinal()] = true;
    +          }
             }
    -        while (count < row.length) row[count++] = null;
    +        shiftNullForStruct(row, isComplexChildColumn);
           }
           listBasedResult.add(row);
           rowCounter++;
         }
         return listBasedResult;
       }
     
    +  /**
    +   * shift the complex column null to the end
    +   *
    +   * @param row
    +   * @param isComplexChildColumn
    +   */
    +  private void shiftNullForStruct(Object[] row, boolean[] isComplexChildColumn) {
    +    int count = 0;
    +    int dataTypeCount = 0;
    +    // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the
    +    // projection list,then object array will contain a,null,d as result, because for a.b,
    +    // a will be filled and for a.c null will be placed.
    +    // Instead place null in the end of object array and send a,d,null as result.
    +    for (int j = 0; j < row.length; j++) {
    +      if (null == row[j] && !isComplexChildColumn[j]) {
    +        // if it is a primitive column, don't shift the null to the end.
    +        row[count++] = null;
    +        isComplexChildColumn[dataTypeCount++] = false;
    +      } else if (null != row[j]) {
    +        row[count++] = row[j];
    +        isComplexChildColumn[dataTypeCount++] = isComplexChildColumn[j];
    +      }
    +    }
    +    // fill the skipped content
    +    while (count < row.length) row[count++] = null;
    +    while (dataTypeCount < isComplexChildColumn.length) {
    +      isComplexChildColumn[dataTypeCount++] = true;
    --- End diff --
    
    No need. removed


---

[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2559


---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

Posted by brijoobopanna <gi...@git.apache.org>.
Github user brijoobopanna commented on the issue:

    https://github.com/apache/carbondata/pull/2559
  
    retest sdv please



---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2559
  
    @ravipesala : please review this


---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2559
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6059/



---

[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2559#discussion_r206019282
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
    @@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) {
           }
           fillMeasureData(scannedResult, row);
           if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) {
    -        // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the
    -        // projection list,then object array will contain a,null,d as result, because for a.b,
    -        // a will be filled and for a.c null will be placed.
    -        // Instead place null in the end of object array and send a,d,null as result.
    -        int count = 0;
    -        for (int j = 0; j < row.length; j++) {
    -          if (row[j] != null) row[count++] = row[j];
    +        boolean[] isComplexChildColumn = new boolean[queryDimensions.length + queryMeasures.length];
    --- End diff --
    
    one time is enough


---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2559
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6263/



---

[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2559#discussion_r206019351
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
    @@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) {
           }
           fillMeasureData(scannedResult, row);
           if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) {
    --- End diff --
    
    yes, avoided string conversion.


---

[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2559#discussion_r205975748
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
    @@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) {
           }
           fillMeasureData(scannedResult, row);
           if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) {
    -        // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the
    -        // projection list,then object array will contain a,null,d as result, because for a.b,
    -        // a will be filled and for a.c null will be placed.
    -        // Instead place null in the end of object array and send a,d,null as result.
    -        int count = 0;
    -        for (int j = 0; j < row.length; j++) {
    -          if (row[j] != null) row[count++] = row[j];
    +        boolean[] isComplexChildColumn = new boolean[queryDimensions.length + queryMeasures.length];
    --- End diff --
    
    Does it really need to create this array for each row? I feel it is one time array fill is enough


---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2559
  
    LGTM


---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

Posted by brijoobopanna <gi...@git.apache.org>.
Github user brijoobopanna commented on the issue:

    https://github.com/apache/carbondata/pull/2559
  
    retest this please



---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2559
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6394/



---

[GitHub] carbondata issue #2559: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2559
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7642/



---

[GitHub] carbondata pull request #2559: [CARBONDATA-2606][Complex DataType Enhancemen...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2559#discussion_r205975814
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
    @@ -141,22 +141,50 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) {
           }
           fillMeasureData(scannedResult, row);
           if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) {
    -        // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the
    -        // projection list,then object array will contain a,null,d as result, because for a.b,
    -        // a will be filled and for a.c null will be placed.
    -        // Instead place null in the end of object array and send a,d,null as result.
    -        int count = 0;
    -        for (int j = 0; j < row.length; j++) {
    -          if (row[j] != null) row[count++] = row[j];
    +        boolean[] isComplexChildColumn = new boolean[queryDimensions.length + queryMeasures.length];
    +        for (ProjectionDimension dimension : queryDimensions) {
    +          if (null != dimension.getDimension().getComplexParentDimension()) {
    +            isComplexChildColumn[dimension.getOrdinal()] = true;
    +          }
             }
    -        while (count < row.length) row[count++] = null;
    +        shiftNullForStruct(row, isComplexChildColumn);
           }
           listBasedResult.add(row);
           rowCounter++;
         }
         return listBasedResult;
       }
     
    +  /**
    +   * shift the complex column null to the end
    +   *
    +   * @param row
    +   * @param isComplexChildColumn
    +   */
    +  private void shiftNullForStruct(Object[] row, boolean[] isComplexChildColumn) {
    +    int count = 0;
    +    int dataTypeCount = 0;
    +    // If a : <b,c> and d : <e,f> are two struct and if a.b,a.c,d.e is given in the
    +    // projection list,then object array will contain a,null,d as result, because for a.b,
    +    // a will be filled and for a.c null will be placed.
    +    // Instead place null in the end of object array and send a,d,null as result.
    +    for (int j = 0; j < row.length; j++) {
    +      if (null == row[j] && !isComplexChildColumn[j]) {
    +        // if it is a primitive column, don't shift the null to the end.
    +        row[count++] = null;
    +        isComplexChildColumn[dataTypeCount++] = false;
    +      } else if (null != row[j]) {
    +        row[count++] = row[j];
    +        isComplexChildColumn[dataTypeCount++] = isComplexChildColumn[j];
    +      }
    +    }
    +    // fill the skipped content
    +    while (count < row.length) row[count++] = null;
    +    while (dataTypeCount < isComplexChildColumn.length) {
    +      isComplexChildColumn[dataTypeCount++] = true;
    --- End diff --
    
    Why this filling of array is required here? are you using this array in caller.


---