You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/05/20 09:06:39 UTC

[GitHub] [arrow] tianchen92 opened a new pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

tianchen92 opened a new pull request #7231:
URL: https://github.com/apache/arrow/pull/7231


   Related to [ARROW-6839](https://issues.apache.org/jira/browse/ARROW-6839).
   
   This is the java implementation of format change in ARROW-6836.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tianchen92 commented on a change in pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

Posted by GitBox <gi...@apache.org>.
tianchen92 commented on a change in pull request #7231:
URL: https://github.com/apache/arrow/pull/7231#discussion_r429044815



##########
File path: java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowReaderWriter.java
##########
@@ -751,4 +753,52 @@ public void testChannelReadFullyEos() throws IOException {
       assertEquals(10, arrBuf.getInt(0));
     }
   }
+
+  @Test
+  public void testCustomMetaData() throws IOException {
+    DictionaryProvider.MapDictionaryProvider provider = new DictionaryProvider.MapDictionaryProvider();
+    provider.put(dictionary1);
+
+    VarCharVector vector1 = newVarCharVector("varchar1", allocator);
+    vector1.allocateNewSafe();
+    vector1.set(0, "foo".getBytes(StandardCharsets.UTF_8));
+    vector1.set(1, "bar".getBytes(StandardCharsets.UTF_8));
+    vector1.set(3, "baz".getBytes(StandardCharsets.UTF_8));
+    vector1.set(4, "bar".getBytes(StandardCharsets.UTF_8));
+    vector1.set(5, "baz".getBytes(StandardCharsets.UTF_8));
+    vector1.setValueCount(6);
+    FieldVector encodedVector1 = (FieldVector) DictionaryEncoder.encode(vector1, dictionary1);

Review comment:
       >is dictionary encoding really necessary for this test?
   
   Not necessary, removed, thanks.
   
   >Can you use the VectorPopulator is any data is actually needed?
   
   Hmm, always forget this, fixed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #7231:
URL: https://github.com/apache/arrow/pull/7231#discussion_r429028165



##########
File path: java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowReaderWriter.java
##########
@@ -751,4 +753,52 @@ public void testChannelReadFullyEos() throws IOException {
       assertEquals(10, arrBuf.getInt(0));
     }
   }
+
+  @Test
+  public void testCustomMetaData() throws IOException {
+    DictionaryProvider.MapDictionaryProvider provider = new DictionaryProvider.MapDictionaryProvider();
+    provider.put(dictionary1);
+
+    VarCharVector vector1 = newVarCharVector("varchar1", allocator);
+    vector1.allocateNewSafe();
+    vector1.set(0, "foo".getBytes(StandardCharsets.UTF_8));
+    vector1.set(1, "bar".getBytes(StandardCharsets.UTF_8));
+    vector1.set(3, "baz".getBytes(StandardCharsets.UTF_8));
+    vector1.set(4, "bar".getBytes(StandardCharsets.UTF_8));
+    vector1.set(5, "baz".getBytes(StandardCharsets.UTF_8));
+    vector1.setValueCount(6);
+    FieldVector encodedVector1 = (FieldVector) DictionaryEncoder.encode(vector1, dictionary1);

Review comment:
       is dictionary encoding really necessary for this test?  Can you use the VectorPopulator is any data is actually needed?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #7231:
URL: https://github.com/apache/arrow/pull/7231#issuecomment-644556734


   @tianchen92 did you use the merge tool to merge this (if you haven't please read https://cwiki.apache.org/confluence/display/ARROW/Guide+for+Committers+and+Project+Maintainers)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tianchen92 commented on pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

Posted by GitBox <gi...@apache.org>.
tianchen92 commented on pull request #7231:
URL: https://github.com/apache/arrow/pull/7231#issuecomment-644557502


   > @tianchen92 did you use the merge tool to merge this (if you haven't please read https://cwiki.apache.org/confluence/display/ARROW/Guide+for+Committers+and+Project+Maintainers)
   
   Hmm, I directly merged this by click "squash and merge" button, should we revert this?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tianchen92 commented on a change in pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

Posted by GitBox <gi...@apache.org>.
tianchen92 commented on a change in pull request #7231:
URL: https://github.com/apache/arrow/pull/7231#discussion_r433729111



##########
File path: java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowReaderWriter.java
##########
@@ -751,4 +754,43 @@ public void testChannelReadFullyEos() throws IOException {
       assertEquals(10, arrBuf.getInt(0));
     }
   }
+
+  @Test
+  public void testCustomMetaData() throws IOException {
+
+    VarCharVector vector = newVarCharVector("varchar1", allocator);
+    vector.allocateNewSafe();
+    ValueVectorDataPopulator.setVector(vector, "foo", "bar", "baz");

Review comment:
       ok, no special reason, made it look like a normal file data. Removed now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7231:
URL: https://github.com/apache/arrow/pull/7231#issuecomment-644734838


   The commit looks OK but @tianchen92 please use the merge tool not the GitHub UI to merge patches. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #7231:
URL: https://github.com/apache/arrow/pull/7231#issuecomment-644543412


   No please go ahead and merge.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tianchen92 commented on a change in pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

Posted by GitBox <gi...@apache.org>.
tianchen92 commented on a change in pull request #7231:
URL: https://github.com/apache/arrow/pull/7231#discussion_r433729367



##########
File path: java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowReaderWriter.java
##########
@@ -751,4 +754,43 @@ public void testChannelReadFullyEos() throws IOException {
       assertEquals(10, arrBuf.getInt(0));
     }
   }
+
+  @Test
+  public void testCustomMetaData() throws IOException {
+
+    VarCharVector vector = newVarCharVector("varchar1", allocator);
+    vector.allocateNewSafe();
+    ValueVectorDataPopulator.setVector(vector, "foo", "bar", "baz");
+
+    List<Field> fields = Arrays.asList(vector.getField());
+    List<FieldVector> vectors = Collections2.asImmutableList(vector);
+    Map<String, String> metadata = new HashMap<>();
+    metadata.put("key1", "value1");
+    metadata.put("key2", "value2");
+    try (VectorSchemaRoot root = new VectorSchemaRoot(fields, vectors, vector.getValueCount());
+        ByteArrayOutputStream out = new ByteArrayOutputStream();
+        ArrowFileWriter writer = new ArrowFileWriter(root, null, newChannel(out), metadata);) {
+
+      writer.start();
+      writer.writeBatch();

Review comment:
       also removed, thanks




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tianchen92 commented on pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

Posted by GitBox <gi...@apache.org>.
tianchen92 commented on pull request #7231:
URL: https://github.com/apache/arrow/pull/7231#issuecomment-643892709


   Thanks @rymurr for the review.
   @emkornfield Do you have other comments? otherwise I'll merge this in several days later. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7231:
URL: https://github.com/apache/arrow/pull/7231#issuecomment-631351345


   https://issues.apache.org/jira/browse/ARROW-6839


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tianchen92 commented on a change in pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

Posted by GitBox <gi...@apache.org>.
tianchen92 commented on a change in pull request #7231:
URL: https://github.com/apache/arrow/pull/7231#discussion_r433727117



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowFileReader.java
##########
@@ -112,6 +113,16 @@ public void initialize() throws IOException {
     }
   }
 
+  /**
+   * Get custom metadata.
+   */
+  public Map<String, String> getMetaData() {
+    if (footer != null) {
+      return footer.getMetaData();
+    }
+    return null;

Review comment:
       ok, fixed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tianchen92 commented on pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

Posted by GitBox <gi...@apache.org>.
tianchen92 commented on pull request #7231:
URL: https://github.com/apache/arrow/pull/7231#issuecomment-644741925


   > The commit looks OK but @tianchen92 please use the merge tool not the GitHub UI to merge patches.
   
   I see, sorry for the incorrect merge operation :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tianchen92 commented on pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

Posted by GitBox <gi...@apache.org>.
tianchen92 commented on pull request #7231:
URL: https://github.com/apache/arrow/pull/7231#issuecomment-631361815


   @wesm @jacques-n @emkornfield Please help take a look when you have time, thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #7231:
URL: https://github.com/apache/arrow/pull/7231#discussion_r433645139



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/ArrowFooter.java
##########
@@ -96,17 +126,40 @@ public Schema getSchema() {
     return recordBatches;
   }
 
+  public Map<String, String> getMetaData() {
+    return metaData;
+  }
+
   @Override
   public int writeTo(FlatBufferBuilder builder) {
     int schemaIndex = schema.getSchema(builder);
     Footer.startDictionariesVector(builder, dictionaries.size());
     int dicsOffset = writeAllStructsToVector(builder, dictionaries);
     Footer.startRecordBatchesVector(builder, recordBatches.size());
     int rbsOffset = writeAllStructsToVector(builder, recordBatches);
+
+    int metaDataOffset = 0;
+    if (metaData != null) {
+      int[] metadataOffsets = new int[metaData.size()];

Review comment:
       i'm suprised this code doesn't exist else where for converted meta between string,string map

##########
File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowFileReader.java
##########
@@ -112,6 +113,16 @@ public void initialize() throws IOException {
     }
   }
 
+  /**
+   * Get custom metadata.
+   */
+  public Map<String, String> getMetaData() {
+    if (footer != null) {
+      return footer.getMetaData();
+    }
+    return null;

Review comment:
       is this consistent with other methods, I think returning an empty map might be better?

##########
File path: java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowReaderWriter.java
##########
@@ -751,4 +754,43 @@ public void testChannelReadFullyEos() throws IOException {
       assertEquals(10, arrBuf.getInt(0));
     }
   }
+
+  @Test
+  public void testCustomMetaData() throws IOException {
+
+    VarCharVector vector = newVarCharVector("varchar1", allocator);
+    vector.allocateNewSafe();
+    ValueVectorDataPopulator.setVector(vector, "foo", "bar", "baz");

Review comment:
       are any values needed?  If so i imagine just one is needed?

##########
File path: java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowReaderWriter.java
##########
@@ -751,4 +754,43 @@ public void testChannelReadFullyEos() throws IOException {
       assertEquals(10, arrBuf.getInt(0));
     }
   }
+
+  @Test
+  public void testCustomMetaData() throws IOException {
+
+    VarCharVector vector = newVarCharVector("varchar1", allocator);
+    vector.allocateNewSafe();
+    ValueVectorDataPopulator.setVector(vector, "foo", "bar", "baz");
+
+    List<Field> fields = Arrays.asList(vector.getField());
+    List<FieldVector> vectors = Collections2.asImmutableList(vector);
+    Map<String, String> metadata = new HashMap<>();
+    metadata.put("key1", "value1");
+    metadata.put("key2", "value2");
+    try (VectorSchemaRoot root = new VectorSchemaRoot(fields, vectors, vector.getValueCount());
+        ByteArrayOutputStream out = new ByteArrayOutputStream();
+        ArrowFileWriter writer = new ArrowFileWriter(root, null, newChannel(out), metadata);) {
+
+      writer.start();
+      writer.writeBatch();

Review comment:
       does a batch even need to be written?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tianchen92 commented on a change in pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

Posted by GitBox <gi...@apache.org>.
tianchen92 commented on a change in pull request #7231:
URL: https://github.com/apache/arrow/pull/7231#discussion_r433727922



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/ArrowFooter.java
##########
@@ -96,17 +126,40 @@ public Schema getSchema() {
     return recordBatches;
   }
 
+  public Map<String, String> getMetaData() {
+    return metaData;
+  }
+
   @Override
   public int writeTo(FlatBufferBuilder builder) {
     int schemaIndex = schema.getSchema(builder);
     Footer.startDictionariesVector(builder, dictionaries.size());
     int dicsOffset = writeAllStructsToVector(builder, dictionaries);
     Footer.startRecordBatchesVector(builder, recordBatches.size());
     int rbsOffset = writeAllStructsToVector(builder, recordBatches);
+
+    int metaDataOffset = 0;
+    if (metaData != null) {
+      int[] metadataOffsets = new int[metaData.size()];

Review comment:
       Actually similar logic exist in Schema for writing metadata, I extracted a static write method in FBSerializables for reuse purpose.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7231:
URL: https://github.com/apache/arrow/pull/7231#issuecomment-632422466


   cc @lidavidm also


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tianchen92 merged pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

Posted by GitBox <gi...@apache.org>.
tianchen92 merged pull request #7231:
URL: https://github.com/apache/arrow/pull/7231


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #7231:
URL: https://github.com/apache/arrow/pull/7231#issuecomment-644564824


   Not familiar enough with the implications @kszucs @kou what are the correct steps here?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org