Posted to jira@arrow.apache.org by "Benoit Cantin (Jira)" <ji...@apache.org> on 2022/07/19 13:31:00 UTC

[jira] [Updated] (ARROW-17123) [JS] Unable to open reader on .arrow file after fetch: Uncaught (in promise) Error: Expected to read 1329865020 metadata bytes, but only read 1123.

     [ https://issues.apache.org/jira/browse/ARROW-17123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benoit Cantin updated ARROW-17123:
----------------------------------
    Description: 
I created a file in the raw Arrow IPC format with the script given in the PyArrow cookbook here: [https://arrow.apache.org/cookbook/py/io.html#saving-arrow-arrays-to-disk]

 

In a Node.js application, this file can be read as follows:

 
{code:java}
import * as fs from "fs";
import { RecordBatchReader } from "apache-arrow";

// filePath points to the .arrow file produced by the cookbook script.
const r = await RecordBatchReader.from(fs.createReadStream(filePath));
await r.open();

for (let i = 0; i < r.numRecordBatches; i++) {
    const rb = await r.readRecordBatch(i);
    if (rb !== null) {
        console.log(rb.numRows);
    }
} {code}
However, this method loads the whole file into memory (is that a bug?), which does not scale.
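As an aside, a possibly more stream-friendly variant would iterate the reader directly via its async-iterator interface instead of indexing with readRecordBatch(i); whether this actually avoids buffering the whole file is an assumption I have not verified:
{code:java}
import * as fs from "fs";
import { RecordBatchReader } from "apache-arrow";

// Sketch (untested): consume record batches as they are decoded,
// rather than random-access reading by index.
const reader = await RecordBatchReader.from(fs.createReadStream(filePath));
for await (const batch of reader) {
    console.log(batch.numRows);
}
{code}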

 

To solve this scalability issue, I tried to load the data with fetch as described in the [README.md|https://github.com/apache/arrow/tree/master/js#load-data-with-fetch]. Both:

 
{code:java}
import { tableFromIPC } from "apache-arrow";

const table = await tableFromIPC(fetch(filePath));
console.table([...table]);{code}
and

{code:java}
const r = await RecordBatchReader.from(await fetch(filePath));
await r.open(); {code}
fail with the error:

Uncaught (in promise) Error: Expected to read 1329865020 metadata bytes, but only read 1123.
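For what it's worth, the reported byte count decodes to printable ASCII: 1329865020 is 0x4F44213C, whose little-endian bytes are "<", "!", "D", "O". If the reader interprets the first 4 bytes of the body as a little-endian int32 metadata length, that is exactly the value an HTML response beginning with "<!DOCTYPE ..." would produce, so the fetch may be returning an error page rather than the .arrow file. A quick check of the arithmetic (my own sketch, not taken from the reader's internals):
{code:java}
// "<!DO" read as a little-endian int32 yields the reported length.
const buf = Buffer.from("<!DO", "ascii");
console.log(buf.readInt32LE(0)); // 1329865020 (0x4F44213C)
{code}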

 

> [JS] Unable to open reader on .arrow file after fetch: Uncaught (in promise) Error: Expected to read 1329865020 metadata bytes, but only read 1123.
> ---------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-17123
>                 URL: https://issues.apache.org/jira/browse/ARROW-17123
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: JavaScript
>    Affects Versions: 8.0.1
>            Reporter: Benoit Cantin
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)