You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by ppadma <gi...@git.apache.org> on 2018/03/16 04:54:54 UTC

[GitHub] drill pull request #1171: DRILL-6231: Fix memory allocation for repeated lis...

GitHub user ppadma opened a pull request:

    https://github.com/apache/drill/pull/1171

    DRILL-6231: Fix memory allocation for repeated list vector

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ppadma/drill DRILL-6231

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1171.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1171
    
----
commit 114a526240a272bd626862a02d1acd06b9c8b4cf
Author: Padma Penumarthy <pp...@...>
Date:   2018-03-16T04:50:54Z

    DRILL-6231: Fix memory allocation for repeated list vector

----


---

[GitHub] drill issue #1171: DRILL-6231: Fix memory allocation for repeated list vecto...

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on the issue:

    https://github.com/apache/drill/pull/1171
  
    @paul-rogers I added the null check. Also, added a test case for 3D array i.e. repeated repeated list.


---

[GitHub] drill issue #1171: DRILL-6231: Fix memory allocation for repeated list vecto...

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on the issue:

    https://github.com/apache/drill/pull/1171
  
    @paul-rogers please review.


---

[GitHub] drill pull request #1171: DRILL-6231: Fix memory allocation for repeated lis...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/1171


---

[GitHub] drill pull request #1171: DRILL-6231: Fix memory allocation for repeated lis...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1171#discussion_r175244575
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -395,11 +395,24 @@ private void allocateMap(AbstractMapVector map, int recordCount) {
           }
         }
     
    +    private void allocateRepeatedList(RepeatedListVector vector, int recordCount) {
    +      vector.allocateOffsetsNew(recordCount);
    +      recordCount *= getCardinality();
    +      ColumnSize child = children.get(vector.getField().getName());
    +      child.allocateVector(vector.getDataVector(), recordCount);
    --- End diff --
    
    One interesting feature of this vector is that the child can be null during reading for some time. That is, in JSON, we may see that the field is `foo: [[]]`, but don't know the inner type yet. So, for safety, allocate the inner vector only if `vector.getDataVector()` is non-null.
    
    Also note that a repeated list can be of any dimension. So, the inner vector can be another repeated list of lesser dimension. The code here handles that case. But, does the sizer itself handle nested repeated lists? Do we have a unit test for a 2D and 3D list?
    
    Never had to do these before because only JSON can produce such structures and we don't seem to exercise most operators with complex JSON structures. We probably should.


---