Posted to issues@arrow.apache.org by "David Saslawsky (Jira)" <ji...@apache.org> on 2020/10/31 09:25:00 UTC

[jira] [Created] (ARROW-10450) Table.fromStruct() silently truncates vectors to the first chunk

David Saslawsky created ARROW-10450:
---------------------------------------

             Summary: Table.fromStruct() silently truncates vectors to the first chunk
                 Key: ARROW-10450
                 URL: https://issues.apache.org/jira/browse/ARROW-10450
             Project: Apache Arrow
          Issue Type: Bug
          Components: JavaScript
    Affects Versions: 2.0.0
            Reporter: David Saslawsky


Table.fromStruct() only uses the first chunk from the input vector.
{code:javascript}
import { Bool, Field, Int32, Struct, Table, Vector } from "apache-arrow";

const myStruct = new Struct([
  Field.new({ name: "over", type: new Int32() }),
  Field.new({ name: "out", type: new Bool() })
]);

const data = [];
for (let i = 0; i < 1500; i++) {
  data.push({ over: i, out: i % 2 === 0 });
}

// create a vector with two chunks
const victor = Vector.from({
  type: myStruct,
  /*highWaterMark: Infinity,*/
  values: data
});
console.log(victor.length);  // 1500 

const table = Table.fromStruct(victor);
console.log(table.length);   // 1000

{code}
The workaround is to set highWaterMark to Infinity in the Vector.from() options (commented out in the snippet above).
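
Because the truncation is silent, a length check after the conversion makes it visible. The following is a minimal sketch, assuming only that both objects expose a numeric `length` (as `victor` and `table` do above). `assertSameLength` is a hypothetical helper, not part of apache-arrow, and the two stand-in objects only mimic the lengths reported above.

```javascript
// Hypothetical helper (not an apache-arrow API): fail loudly when a
// conversion drops rows instead of letting the truncation pass silently.
function assertSameLength(source, result, label = "conversion") {
  if (source.length !== result.length) {
    throw new Error(
      `${label} changed the row count: ${source.length} -> ${result.length}`
    );
  }
  return result;
}

// Stand-ins with the lengths reported above; in real code these would be
// the input vector and the table returned by Table.fromStruct().
const fakeVector = { length: 1500 };
const fakeTable = { length: 1000 };

try {
  assertSameLength(fakeVector, fakeTable, "Table.fromStruct()");
} catch (err) {
  console.log(err.message);
}
```

In real code the call would wrap the conversion itself, for example `assertSameLength(victor, Table.fromStruct(victor))`, so a mismatch surfaces at the point where rows are lost.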

 

Table.new() works as expected:
{code:javascript}
const int32Array = new Int32Array(1500);
for (let i = 0; i < 1500; i++) {
  int32Array[i] = i;
}

const intVector = Vector.from({ type: new Int32(), values: int32Array });
console.log(intVector.length);  // 1500

const intTable = Table.new({ intColumn: intVector });
console.log(intTable.length);   // 1500

{code}
 

The cause seems to be in Chunked.data(), but I don't understand the code well enough to propose a fix.
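
To illustrate the suspected failure mode, here is a toy model. It is not the apache-arrow source: `ToyChunked` and its accessors are hypothetical names, and the chunk size of 1000 is taken only from the lengths printed above. The sketch assumes a chunked container whose `data` accessor exposes just the first chunk, so any consumer that reads `data` instead of walking every chunk sees 1000 of the 1500 rows.

```javascript
// Toy model only: ToyChunked is hypothetical and does not reproduce
// the apache-arrow implementation.
class ToyChunked {
  constructor(chunks) {
    this.chunks = chunks; // array of arrays, one inner array per chunk
  }

  // Total number of rows across all chunks.
  get length() {
    return this.chunks.reduce((total, chunk) => total + chunk.length, 0);
  }

  // Suspected bug pattern: only the first chunk is exposed.
  get data() {
    return this.chunks.length > 0 ? this.chunks[0] : [];
  }

  // Chunk-aware alternative: concatenate every chunk.
  get allData() {
    return this.chunks.flat();
  }
}

// Split 1500 rows into chunks of at most 1000, mirroring the report.
const rows = Array.from({ length: 1500 }, (_, i) => i);
const chunked = new ToyChunked([rows.slice(0, 1000), rows.slice(1000)]);

console.log(chunked.length);         // 1500
console.log(chunked.data.length);    // 1000
console.log(chunked.allData.length); // 1500
```

If the real cause matches this pattern, a fix would presumably need the struct-to-table conversion to consume every chunk rather than a single `data` value, but that remains an assumption until confirmed against the library code.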



--
This message was sent by Atlassian Jira
(v8.3.4#803005)