Posted to issues@arrow.apache.org by "freakyzoidberg (via GitHub)" <gi...@apache.org> on 2023/09/23 14:22:16 UTC
[GitHub] [arrow] freakyzoidberg opened a new issue, #37841: [Java] Dictionary decoding not using the compression factory from the ArrowReader
freakyzoidberg opened a new issue, #37841:
URL: https://github.com/apache/arrow/issues/37841
### Describe the bug, including details regarding any error messages, version, and platform.
I am trying to decode, in Java, records generated in Go (simple types plus dictionaries) with ZSTD compression.
This works fine for the simple types, but decoding dictionaries fails with this error:
```
java.lang.IllegalArgumentException: Please add arrow-compression module to use CommonsCompressionFactory for ZSTD
at org.apache.arrow.vector.compression.NoCompressionCodec$Factory.createCodec(NoCompressionCodec.java:69)
at org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:82)
at org.apache.arrow.vector.ipc.ArrowReader.load(ArrowReader.java:256)
at org.apache.arrow.vector.ipc.ArrowReader.loadDictionary(ArrowReader.java:247)
at org.apache.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:167)
```
The Go part is essentially:
```go
// Dictionary-encoded column: int8 indices into large-string values.
dtyp := &arrow.DictionaryType{
	IndexType: arrow.PrimitiveTypes.Int8,
	ValueType: arrow.BinaryTypes.LargeString,
}
bldrDictString := arrowarray.NewDictionaryBuilder(memory.DefaultAllocator, dtyp)
defer bldrDictString.Release()
bldrDictString.(*arrowarray.BinaryDictionaryBuilder).AppendString("foo")

// k.key, nulls.Any() and size come from surrounding code that is not shown here.
columnTypes := make([]arrow.Field, 0, 1)
columnArrays := make([]arrow.Array, 0, 1)
columnArrays = append(columnArrays, bldrDictString.NewArray())
columnTypes = append(columnTypes, arrow.Field{Name: k.key, Type: dtyp, Nullable: nulls.Any()})
schema := arrow.NewSchema(columnTypes, nil)
rec := arrowarray.NewRecord(schema, columnArrays, int64(size))

// Write the record as a ZSTD-compressed IPC stream.
// Error handling is elided; err should be checked after each call.
var buf bytes.Buffer
writer := ipc.NewWriter(&buf, ipc.WithSchema(schema), ipc.WithZstd())
err := writer.Write(rec)
err = writer.Close()
```
And the Java side:
```java
import org.apache.arrow.compression.CommonsCompressionFactory;

try (ArrowStreamReader reader =
    new ArrowStreamReader(
        new ByteArrayInputStream(format.getArrow().toByteArray()),
        bufferAllocator,
        CommonsCompressionFactory.INSTANCE)) {
  reader.loadNextBatch();
  ...
} catch (IOException e) {
  throw new RuntimeException(e);
}
```
I can get it to stop throwing by making the VectorLoader used when loading the dictionary use the compression factory defined in the reader (it currently defaults to NoCompression).
See this [change](https://github.com/freakyzoidberg/arrow/commit/f945d2ddee9c332661d3d97084c2aedb56f7fcf5). Note that I was not able to make it fail using the Java Arrow tests.
I may be doing something wrong. I am also wondering whether dictionaries are compressed the same way by the Go and Java writers, which could explain why the Java test does not fail.
Anyhow, unless I am doing something wrong, this looks like a bug.
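To make the suspected failure mode concrete, here is a minimal, library-free sketch. Every name in it (`CodecFactory`, `Loader`, `Reader`, the toy "ZSTD" codec that just strips a prefix) is a hypothetical stand-in and not an Arrow class; it only models the idea that the record batch path uses the reader's factory while the dictionary path builds a loader with the no-compression default.

```java
import java.util.function.Function;

// Hypothetical stand-ins for illustration only; these are NOT Arrow classes.
class DictionaryCodecSketch {

    /** Maps a codec name from a message header to a decompression function. */
    interface CodecFactory {
        Function<String, String> createCodec(String codecName);
    }

    /** Default factory: accepts uncompressed data only and rejects everything else. */
    static final CodecFactory NO_COMPRESSION = name -> {
        if (name.equals("NONE")) {
            return Function.identity();
        }
        throw new IllegalArgumentException("No codec available for " + name);
    };

    /** Factory that also knows a toy "ZSTD" codec; stripping a prefix stands in for decompression. */
    static final CodecFactory WITH_ZSTD = name -> {
        if (name.equals("ZSTD")) {
            return body -> body.substring("zstd:".length());
        }
        return NO_COMPRESSION.createCodec(name);
    };

    /** Decodes one buffer with whichever factory it was constructed with. */
    static final class Loader {
        private final CodecFactory factory;

        Loader() {
            this(NO_COMPRESSION); // the default that causes trouble below
        }

        Loader(CodecFactory factory) {
            this.factory = factory;
        }

        String load(String codecName, String body) {
            return factory.createCodec(codecName).apply(body);
        }
    }

    /** Reader configured with a factory, as in the Java snippet above. */
    static final class Reader {
        private final CodecFactory factory;

        Reader(CodecFactory factory) {
            this.factory = factory;
        }

        // Record batches are decoded with the reader's own factory.
        String loadRecordBatch(String codecName, String body) {
            return new Loader(factory).load(codecName, body);
        }

        // Models the reported behaviour: the dictionary path ignores the reader's factory.
        String loadDictionaryBuggy(String codecName, String body) {
            return new Loader().load(codecName, body);
        }

        // Models the proposed change: the dictionary path reuses the reader's factory.
        String loadDictionaryFixed(String codecName, String body) {
            return new Loader(factory).load(codecName, body);
        }
    }

    public static void main(String[] args) {
        Reader reader = new Reader(WITH_ZSTD);
        System.out.println(reader.loadRecordBatch("ZSTD", "zstd:values"));
        System.out.println(reader.loadDictionaryFixed("ZSTD", "zstd:foo"));
        try {
            reader.loadDictionaryBuggy("ZSTD", "zstd:foo");
        } catch (IllegalArgumentException e) {
            System.out.println("buggy dictionary path failed: " + e.getMessage());
        }
    }
}
```

In this toy model the fix amounts to passing the reader's factory into the loader on the dictionary path. Whether the real change has exactly that shape is best checked against the linked commit.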
Thanks!
### Component(s)
Java
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] lidavidm commented on issue #37841: [Java] Dictionary decoding not using the compression factory from the ArrowReader
Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #37841:
URL: https://github.com/apache/arrow/issues/37841#issuecomment-1732418676
CC @davisusanibar @vibhatha
Re: [I] [Java] Dictionary decoding not using the compression factory from the ArrowReader [arrow]
Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm closed issue #37841: [Java] Dictionary decoding not using the compression factory from the ArrowReader
URL: https://github.com/apache/arrow/issues/37841
[GitHub] [arrow] vibhatha commented on issue #37841: [Java] Dictionary decoding not using the compression factory from the ArrowReader
Posted by "vibhatha (via GitHub)" <gi...@apache.org>.
vibhatha commented on issue #37841:
URL: https://github.com/apache/arrow/issues/37841#issuecomment-1732459818
@lidavidm I will take a look.