Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/01 15:42:19 UTC

[GitHub] [arrow] awildturtok edited a comment on pull request #7214: ARROW-8842: [Java] fix ListVector's setValueCount to set inner vector's value count correctly

awildturtok edited a comment on pull request #7214:
URL: https://github.com/apache/arrow/pull/7214#issuecomment-852158288


   Ahoi,
   
   we are experiencing issues that we feel are related to this PR, or to the workaround we use for it. We are basically trying to do the following:
   
   ```{java}
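   import java.util.List;
   import org.apache.arrow.vector.VarCharVector;
   import org.apache.arrow.vector.complex.ListVector;
   import org.apache.arrow.vector.util.Text;

   private static void listVectorFiller(ListVector vector, int rowNumber, List<String> values) {
       // values is a vertical list

       // startNewValue returns the offset in the inner vector where this row's elements begin
       int start = vector.startNewValue(rowNumber);
       // Assumes the inner vector holds strings; FieldVector itself has no setSafe(int, Text)
       final VarCharVector innerVector = (VarCharVector) vector.getDataVector();

       for (int i = 0; i < values.size(); i++) {
           String value = values.get(i);
           innerVector.setSafe(start + i, new Text(value));
       }

       // Workaround for https://issues.apache.org/jira/browse/ARROW-8842
       int valueCount = innerVector.getValueCount();
       innerVector.setValueCount(valueCount + values.size()); // i.e. grow the innerVector by the inner values

       vector.endValue(rowNumber, values.size());
   }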
   ```
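   
   For reference, the offset bookkeeping that `startNewValue`/`endValue` perform in the snippet above can be modelled in plain Java without Arrow. This is only a sketch under assumptions: `ListOffsetsSketch` and its methods are hypothetical names, not part of the Arrow API, and the model ignores validity bits and buffer allocation. It shows that the number of inner values implied by the rows written so far is simply the last offset.
   
   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;

   /** Toy model of list-vector bookkeeping. Hypothetical class, NOT the Arrow API. */
   final class ListOffsetsSketch {
       // offsets.get(i) and offsets.get(i + 1) delimit row i's slice of the inner values
       private final List<Integer> offsets = new ArrayList<>();
       private final List<String> inner = new ArrayList<>();

       ListOffsetsSketch() {
           offsets.add(0); // an empty list vector has the single offset 0
       }

       /** Appends one row and returns the start offset of its elements. */
       int appendRow(List<String> values) {
           int start = offsets.get(offsets.size() - 1); // role of startNewValue
           inner.addAll(values);                        // role of the setSafe loop
           offsets.add(start + values.size());          // role of endValue
           return start;
       }

       /** Inner value count implied by the offsets: the last offset. */
       int innerValueCount() {
           return offsets.get(offsets.size() - 1);
       }

       /** Number of rows written so far. */
       int rowCount() {
           return offsets.size() - 1;
       }

       /** Elements of row i, read back through the offsets. */
       List<String> row(int i) {
           return inner.subList(offsets.get(i), offsets.get(i + 1));
       }

       public static void main(String[] args) {
           ListOffsetsSketch sketch = new ListOffsetsSketch();
           sketch.appendRow(Arrays.asList("a", "b"));
           sketch.appendRow(new ArrayList<>());      // an empty row adds no inner values
           sketch.appendRow(Arrays.asList("c"));
           System.out.println(sketch.rowCount());        // prints 3
           System.out.println(sketch.innerValueCount()); // prints 3
           System.out.println(sketch.row(2));            // prints [c]
       }
   }
   ```
   
   In this model the inner count never has to be tracked separately, because it can always be derived from the offsets.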
   
   We are generating an Arrow file that is 34MB when rendered as CSV, but comes in at 1GB as arrs/arrf and also takes quite long to generate. Since our code contains a crude workaround for the issue this PR addresses, we wanted to check whether this is the proper way of filling a ListVector.
   
   I've attached a flamegraph of the generation: [arrow-download.svg.zip](https://github.com/apache/arrow/files/6576795/arrow-download.svg.zip)
   
   After reading the file and writing it out again using Python, it is only 11MB.

