Posted to issues@arrow.apache.org by "Denis Gursky (Jira)" <ji...@apache.org> on 2022/10/12 16:06:00 UTC

[jira] [Created] (ARROW-18007) [JS] Values returned as undefined when arrow file bigger than 2gb

Denis Gursky created ARROW-18007:
------------------------------------

             Summary: [JS] Values returned as undefined when arrow file bigger than 2gb
                 Key: ARROW-18007
                 URL: https://issues.apache.org/jira/browse/ARROW-18007
             Project: Apache Arrow
          Issue Type: Bug
          Components: JavaScript
            Reporter: Denis Gursky


Steps to reproduce:

1. Generate an Arrow file larger than 2 GB
{code:python}
import pyarrow as pa

nums1 = [42]
nums2 = [42.42]
mil = 1000000

for n in range(1, 140 * mil):
  nums1.append(n)
  nums2.append(1 / n)

arr1 = pa.array(nums1)
arr2 = pa.array(nums2)

schema = pa.schema([
  pa.field('nums1', arr1.type),
  pa.field('nums2', arr2.type),
])

with pa.OSFile('arraydata.arrow', 'wb') as sink:
  with pa.ipc.new_file(sink, schema=schema) as writer:
    batch = pa.record_batch([arr1, arr2], schema=schema)
    writer.write(batch)
{code}
2. Read the file with the {{apache-arrow}} JavaScript package
{code:javascript}
const fs = require("fs");
const { tableFromIPC, RecordBatchReader } = require("apache-arrow");

const filePath = "./arraydata.arrow";

const stream = fs.createReadStream(filePath);
const reader = RecordBatchReader.from(stream);

(async function () {
  const table = await tableFromIPC(reader);

  console.log("numRows", table.numRows);
  console.log("first row", table.get(0).toArray());
})();
{code}
The code above prints:
{code}
numRows 140000000
first row [ undefined, undefined ]
{code}
{{numRows}} is correct, but every value in the row is returned as {{undefined}}.
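
For reference, here is a rough estimate of how much data the single record batch from step 1 holds. It assumes that {{pa.array}} infers int64 for the Python integers and float64 for the Python floats (8 bytes per value), and it ignores validity bitmaps, padding, and file metadata. It only shows that the batch body exceeds the largest signed 32-bit integer; whether a 32-bit limit is the actual cause of the problem is not established here.

```python
# Back-of-the-envelope size of the record batch written in step 1.
# Assumption: pa.array infers int64 for the Python ints and float64 (double)
# for the Python floats, so every value occupies 8 bytes.
ROWS = 140_000_000        # 1 initial value + 139,999,999 values appended in the loop
BYTES_PER_VALUE = 8       # int64 and float64 are both 8 bytes wide
INT32_MAX = 2**31 - 1     # largest signed 32-bit integer (2,147,483,647)

column_bytes = ROWS * BYTES_PER_VALUE
batch_bytes = 2 * column_bytes  # two columns; bitmaps, padding, metadata ignored

print("bytes per column:", column_bytes)
print("bytes in batch:", batch_bytes)
print("single column exceeds int32 max:", column_bytes > INT32_MAX)
print("whole batch exceeds int32 max:", batch_bytes > INT32_MAX)
```

Under these assumptions each column is about 1.12 GB, below the 32-bit limit on its own, while the two columns together (about 2.24 GB) are above it.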


