Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/08/15 12:54:01 UTC

[GitHub] [arrow] subbarao opened a new issue #7968: VectorSchemaRoot reuse when using dictionary encoded field

subbarao opened a new issue #7968:
URL: https://github.com/apache/arrow/issues/7968


   From the code example given in https://enpiar.com/arrow-site/docs/java/ipc.html:
   ```
   DictionaryProvider.MapDictionaryProvider provider = new DictionaryProvider.MapDictionaryProvider();
   // create dictionary and provider
   final VarCharVector dictVector = new VarCharVector("dict", allocator);
   dictVector.allocateNewSafe();
   dictVector.setSafe(0, "aa".getBytes());
   dictVector.setSafe(1, "bb".getBytes());
   dictVector.setSafe(2, "cc".getBytes());
   dictVector.setValueCount(3);
   
   Dictionary dictionary =
       new Dictionary(dictVector, new DictionaryEncoding(1L, false, /*indexType=*/null));
   provider.put(dictionary);
   
   // create vector and encode it
   final VarCharVector vector = new VarCharVector("vector", allocator);
   vector.allocateNewSafe();
   vector.setSafe(0, "bb".getBytes());
   vector.setSafe(1, "bb".getBytes());
   vector.setSafe(2, "cc".getBytes());
   vector.setSafe(3, "aa".getBytes());
   vector.setValueCount(4);
   
   // get the encoded vector
   IntVector encodedVector = (IntVector) DictionaryEncoder.encode(vector, dictionary);
   
   // create VectorSchemaRoot
   List<Field> fields = Arrays.asList(encodedVector.getField());
   List<FieldVector> vectors = Arrays.asList(encodedVector);
   VectorSchemaRoot root = new VectorSchemaRoot(fields, vectors);
   ```
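   This makes the VectorSchemaRoot effectively immutable when I want to write in batches.
   The ```DictionaryEncoder.encode``` API always returns a new vector; it does not operate on an existing vector, so the vector already held by the root cannot be refilled for the next batch.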
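
   The behaviour being asked for can be sketched without any library. The block below is plain Java and is not Arrow API: `ReusableEncoder` and `encodeInto` are hypothetical names, and an `int[]` stands in for the index vector held by the root. It only shows the shape of the request: the caller owns the output buffer, and each batch overwrites it instead of receiving a freshly allocated vector.

   ```java
   import java.util.Arrays;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   // Hypothetical illustration only: plain Java, not part of the Arrow API.
   class ReusableEncoder {
       // Maps each dictionary value to its position in the dictionary.
       private final Map<String, Integer> indexOf = new HashMap<>();

       ReusableEncoder(List<String> dictionary) {
           for (int i = 0; i < dictionary.size(); i++) {
               indexOf.put(dictionary.get(i), i);
           }
       }

       // Writes the dictionary index of each value into the caller-owned
       // buffer `out` and returns the number of rows written.
       int encodeInto(List<String> values, int[] out) {
           if (values.size() > out.length) {
               throw new IllegalArgumentException("output buffer too small");
           }
           for (int i = 0; i < values.size(); i++) {
               Integer idx = indexOf.get(values.get(i));
               if (idx == null) {
                   throw new IllegalArgumentException(
                       "value not in dictionary: " + values.get(i));
               }
               out[i] = idx;
           }
           return values.size();
       }

       public static void main(String[] args) {
           ReusableEncoder encoder =
               new ReusableEncoder(Arrays.asList("aa", "bb", "cc"));
           int[] indices = new int[4]; // allocated once, reused for every batch

           int rows = encoder.encodeInto(Arrays.asList("bb", "bb", "cc", "aa"), indices);
           // prints "4 [1, 1, 2, 0]"
           System.out.println(rows + " " + Arrays.toString(Arrays.copyOf(indices, rows)));

           rows = encoder.encodeInto(Arrays.asList("cc", "aa"), indices);
           // prints "2 [2, 0]"
           System.out.println(rows + " " + Arrays.toString(Arrays.copyOf(indices, rows)));
       }
   }
   ```

   The returned row count plays the role that a value count or row count would play on a real vector: slots past it still hold data from the previous batch and must be ignored.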
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm closed issue #7968: VectorSchemaRoot reuse when using dictionary encoded field

Posted by GitBox <gi...@apache.org>.
wesm closed issue #7968:
URL: https://github.com/apache/arrow/issues/7968


   





[GitHub] [arrow] wesm commented on issue #7968: VectorSchemaRoot reuse when using dictionary encoded field

Posted by GitBox <gi...@apache.org>.
wesm commented on issue #7968:
URL: https://github.com/apache/arrow/issues/7968#issuecomment-674571250


   Can you direct this question to the dev@ or user@ mailing list (or open a JIRA if you think you have found a bug or missing feature)?

