Posted to issues@arrow.apache.org by "Boris V.Kuznetsov (JIRA)" <ji...@apache.org> on 2019/08/05 09:05:00 UTC
[jira] [Updated] (ARROW-6133) Schema Missing Exception in ArrowStreamReader
[ https://issues.apache.org/jira/browse/ARROW-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Boris V.Kuznetsov updated ARROW-6133:
-------------------------------------
Description:
Hello
My colleague and I are trying to pass Arrow data through Kafka. He uses PyArrow; I use the Java API from Scala.
Here's the Transmitter code:
```python
import pyarrow as pa


def record_batch_to_bytes(df):
    batch = pa.RecordBatch.from_pandas(df)
    ser_ = pa.serialize(batch)
    return bytes(ser_.to_buffer())
```
My colleague is able to read this stream with the Python API:
```python
def bytes_to_batch_record(bytes_):
    batch = pa.deserialize(bytes_)
    print(batch.schema)
```
On the receiver side, I use the following with the Java API (called from Scala):
```scala
def deserialize(din: Chunk[BArr]): Chunk[ArrowStreamReader] =
  for {
    arr <- din
    stream = new ByteArrayInputStream(arr)
  } yield new ArrowStreamReader(stream, allocator)

reader = deserialize(arr)
schema = reader.map(r => r.getVectorSchemaRoot.getSchema)
empty = reader.map(r => r.loadNextBatch)
```
Both the `schema` and `empty` lines (lines 2 and 3 of the last snippet) fail with this exception:
Fiber failed.
An unchecked error was produced.
java.io.IOException: Unexpected end of input. Missing schema.
    at org.apache.arrow.vector.ipc.ArrowStreamReader.readSchema(ArrowStreamReader.java:135)
    at org.apache.arrow.vector.ipc.ArrowReader.initialize(ArrowReader.java:178)
    at org.apache.arrow.vector.ipc.ArrowReader.ensureInitialized(ArrowReader.java:169)
    at org.apache.arrow.vector.ipc.ArrowReader.getVectorSchemaRoot(ArrowReader.java:62)
    at nettest.ArrowSpec.$anonfun$testConsumeArrow$7(Arrow.scala:96)
    at zio.Chunk$Arr.map(Chunk.scala:722)
The full Scala code is here: https://github.com/Clover-Group/zio-tsp/blob/46e34c7c060bf4061067922077bbe05ea4b9f301/src/test/scala/Arrow.scala#L95
How do I resolve this? We are both using Arrow 0.14.1, and my colleague has no issues reading the data with the PyArrow API.
Thank you!
> Schema Missing Exception in ArrowStreamReader
> ---------------------------------------------
>
> Key: ARROW-6133
> URL: https://issues.apache.org/jira/browse/ARROW-6133
> Project: Apache Arrow
> Issue Type: Bug
> Components: Java
> Affects Versions: 0.14.1
> Reporter: Boris V.Kuznetsov
> Priority: Major