
Posted to issues@arrow.apache.org by "freakyzoidberg (via GitHub)" <gi...@apache.org> on 2023/09/23 14:22:16 UTC

[GitHub] [arrow] freakyzoidberg opened a new issue, #37841: [Java] Dictionary decoding not using the compression factory from the ArrowReader

freakyzoidberg opened a new issue, #37841:
URL: https://github.com/apache/arrow/issues/37841

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   I am trying to decode, in Java, records generated in Go (simple types plus dictionaries) with ZSTD compression.
   
   This works fine for the simple types, but I get this error when decoding dictionaries:
   
   ```
   java.lang.IllegalArgumentException: Please add arrow-compression module to use CommonsCompressionFactory for ZSTD
   	at org.apache.arrow.vector.compression.NoCompressionCodec$Factory.createCodec(NoCompressionCodec.java:69)
   	at org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:82)
   	at org.apache.arrow.vector.ipc.ArrowReader.load(ArrowReader.java:256)
   	at org.apache.arrow.vector.ipc.ArrowReader.loadDictionary(ArrowReader.java:247)
   	at org.apache.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:167)
   ```
   
   
   The Go part is essentially
   
   ```go
   dtyp := &arrow.DictionaryType{
   	IndexType: arrow.PrimitiveTypes.Int8,
   	ValueType: arrow.BinaryTypes.LargeString,
   }
   bldrDictString := arrowarray.NewDictionaryBuilder(memory.DefaultAllocator, dtyp)
   defer bldrDictString.Release()
   
   bldrDictString.(*arrowarray.BinaryDictionaryBuilder).AppendString("foo")
   
   columnTypes := make([]arrow.Field, 0, 1)
   columnArrays := make([]arrow.Array, 0, 1)
   
   columnArrays = append(columnArrays, bldrDictString.NewArray())
   columnTypes = append(columnTypes, arrow.Field{Name: k.key, Type: dtyp, Nullable: nulls.Any()})
   
   schema := arrow.NewSchema(columnTypes, nil)
   rec := arrowarray.NewRecord(schema, columnArrays, int64(size))
   
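   var buf bytes.Buffer
   writer := ipc.NewWriter(&buf, ipc.WithSchema(schema), ipc.WithZstd())
   // Check each error on its own so a failed Write is not overwritten by Close.
   // (Assumes the enclosing function returns an error.)
   if err := writer.Write(rec); err != nil {
   	return err
   }
   if err := writer.Close(); err != nil {
   	return err
   }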
   ```
   
   
   And the Java side
   
   ```java
   import org.apache.arrow.compression.CommonsCompressionFactory;
   
   
   try (ArrowStreamReader reader =
            new ArrowStreamReader(
                new ByteArrayInputStream(format.getArrow().toByteArray()),
                bufferAllocator,
                CommonsCompressionFactory.INSTANCE)) {
     reader.loadNextBatch();
     ...
   } catch (IOException e) {
     throw new RuntimeException(e);
   }
   ```
   
   
   I can get it to stop throwing by making the VectorLoader used for dictionary loading use the compression factory defined in the reader (it currently defaults to the NoCompressionCodec factory).
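   
   The behavior described above can be modeled with a small self-contained Java sketch. It deliberately does not use the Arrow API: `CodecFactory`, `Loader`, `Reader`, and the `propagateToDictionaries` flag are hypothetical stand-ins, assumed here only to illustrate how a loader that falls back to a default factory can reject a dictionary batch even though the reader was given a ZSTD-capable factory.
   
   ```java
   import java.util.Objects;
   
   /** Toy model of the reported behavior. None of these types are real Arrow classes. */
   public class DictionaryCodecSketch {
   
     /** Hypothetical stand-in for a compression codec factory. */
     interface CodecFactory {
       String createCodec(String codecName);
     }
   
     /** Default factory: accepts only uncompressed data and rejects everything else. */
     static final CodecFactory NO_COMPRESSION = name -> {
       if (name.equals("NONE")) {
         return "none";
       }
       throw new IllegalArgumentException("no codec available for " + name);
     };
   
     /** Factory that also understands ZSTD (stand-in for the one passed to the reader). */
     static final CodecFactory WITH_ZSTD =
         name -> name.equals("ZSTD") ? "zstd" : NO_COMPRESSION.createCodec(name);
   
     /** Hypothetical loader: uses the default factory unless one is supplied. */
     static final class Loader {
       private final CodecFactory factory;
   
       Loader() {
         this(NO_COMPRESSION);
       }
   
       Loader(CodecFactory factory) {
         this.factory = Objects.requireNonNull(factory);
       }
   
       String load(String batchCodec) {
         return factory.createCodec(batchCodec);
       }
     }
   
     /** Hypothetical reader: the flag switches between the reported and the patched behavior. */
     static final class Reader {
       private final CodecFactory factory;
       private final boolean propagateToDictionaries;
   
       Reader(CodecFactory factory, boolean propagateToDictionaries) {
         this.factory = Objects.requireNonNull(factory);
         this.propagateToDictionaries = propagateToDictionaries;
       }
   
       /** Record batches always go through a loader built with the reader's factory. */
       String loadRecordBatch(String batchCodec) {
         return new Loader(factory).load(batchCodec);
       }
   
       /** Dictionary batches use the reader's factory only when the flag is set. */
       String loadDictionary(String batchCodec) {
         Loader loader = propagateToDictionaries ? new Loader(factory) : new Loader();
         return loader.load(batchCodec);
       }
     }
   
     public static void main(String[] args) {
       Reader reported = new Reader(WITH_ZSTD, false);
       System.out.println("record batch codec: " + reported.loadRecordBatch("ZSTD"));
       try {
         reported.loadDictionary("ZSTD");
       } catch (IllegalArgumentException e) {
         System.out.println("dictionary load failed: " + e.getMessage());
       }
   
       Reader patched = new Reader(WITH_ZSTD, true);
       System.out.println("dictionary codec: " + patched.loadDictionary("ZSTD"));
     }
   }
   ```
   
   In this model the only difference between the two readers is whether the dictionary path builds its loader with the reader's factory, which mirrors the one-line nature of the change linked below.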
   
   See this [change](https://github.com/freakyzoidberg/arrow/commit/f945d2ddee9c332661d3d97084c2aedb56f7fcf5). Note that I was not able to reproduce the failure with the Java Arrow tests.
   I may be doing something wrong. I also wonder whether the Go and Java writers compress dictionaries the same way; a difference there could explain why the Java test does not fail.
   
   Anyhow, unless I am doing something wrong, this looks like a bug.
   
   Thanks!
   
   
   
   ### Component(s)
   
   Java


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] lidavidm commented on issue #37841: [Java] Dictionary decoding not using the compression factory from the ArrowReader

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on issue #37841:
URL: https://github.com/apache/arrow/issues/37841#issuecomment-1732418676

   CC @davisusanibar @vibhatha 



Re: [I] [Java] Dictionary decoding not using the compression factory from the ArrowReader [arrow]

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm closed issue #37841: [Java] Dictionary decoding not using the compression factory from the ArrowReader 
URL: https://github.com/apache/arrow/issues/37841



[GitHub] [arrow] vibhatha commented on issue #37841: [Java] Dictionary decoding not using the compression factory from the ArrowReader

Posted by "vibhatha (via GitHub)" <gi...@apache.org>.

vibhatha commented on issue #37841:
URL: https://github.com/apache/arrow/issues/37841#issuecomment-1732459818

   @lidavidm I will take a look.

