Posted to jira@arrow.apache.org by "geert-jan brits (Jira)" <ji...@apache.org> on 2021/12/13 20:46:00 UTC
[jira] [Updated] (ARROW-15086) [JS] Incorrect Table concat after serialize
[ https://issues.apache.org/jira/browse/ARROW-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
geert-jan brits updated ARROW-15086:
------------------------------------
Description:
After reading an Arrow table from disk and concatenating another Arrow table to it, the resulting table loses its last concatenated chunk once it is serialized to disk and read back.
Steps to reproduce below:
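{code:javascript}
// Assumes Apache Arrow JS 6.0.1; createSomeVectors() is the reporter's own helper (not shown)
import { Table } from 'apache-arrow'
// CORRECT
// Create tables T1 + T2. Each contains 1 vector with 1 value
const T1 = Table.new(createSomeVectors(), ['number'])
const T2 = Table.new(createSomeVectors(), ['number'])
// Combine these tables
const combined = T1.concat(T2)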
// Serialize and read back this combination (mimic reading from disk)
const combinedAfterSerialization = Table.from([combined.serialize()])
// Print the count (works correctly)
console.log(T1.count(), T2.count(), combined.count(), combinedAfterSerialization.count())
// Result (as expected)= 2, 2, 2, 2
// INCORRECT
// Serialize T1 and read back. (mimic reading from disk)
const T1SerializedAndBack = Table.from([T1.serialize()])
// Combine just read T1SerializedAndBack with T2
const combined2 = T1SerializedAndBack.concat(T2)
// Serialize and read back this combination (mimic reading from disk)
const combinedAfterSerialization2 = Table.from([combined2.serialize()])
// Print the count (works incorrectly)
console.log(T1SerializedAndBack.count(), T2.count(), combined2.count(), combinedAfterSerialization2.count())
// Result (NOT as expected) = 2, 2, 2, 1 <=!!
{code}
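The comparison in the reproduction (a table's row count before vs. after a serialize/read-back round trip) can be factored into a small regression check. This is a minimal sketch, assuming the table operations are passed in as callbacks so the check needs no dependencies. `checkRoundTripCount`, `mockOps` and `lossyOps` are hypothetical names, and the mocks (arrays of chunks, JSON serialization) only stand in for Arrow tables. `lossyOps` merely simulates the reported symptom and says nothing about the actual cause.

```javascript
// Hypothetical helper (not part of apache-arrow): compares a table's row count
// before and after a serialize/read-back round trip. The table operations are
// injected as callbacks, so the check itself has no dependencies.
function checkRoundTripCount(table, ops) {
  const before = ops.count(table);
  const after = ops.count(ops.read(ops.write(table)));
  return { before, after, ok: before === after };
}

// Mock operations for illustration only: a "table" is an array of chunks
// (each chunk an array of rows) and serialization is plain JSON.
const mockOps = {
  concat: (x, y) => [...x, ...y],
  write: (t) => JSON.stringify(t),
  read: (s) => JSON.parse(s),
  count: (t) => t.reduce((n, chunk) => n + chunk.length, 0),
};

// Simulates the reported symptom (last chunk missing after read-back).
// This is NOT a claim about the actual cause of ARROW-15086.
const lossyOps = { ...mockOps, read: (s) => JSON.parse(s).slice(0, -1) };

const combined = mockOps.concat([[1, 2]], [[3, 4]]);
console.log(checkRoundTripCount(combined, mockOps).ok);  // true
console.log(checkRoundTripCount(combined, lossyOps).ok); // false
```

With Arrow JS 6.x the callbacks would mirror the calls in the reproduction, e.g. `write: t => t.serialize()`, `read: bytes => Table.from([bytes])` and `count: t => t.count()`, with the check applied to `combined2`.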
> [JS] Incorrect Table concat after serialize
> -------------------------------------------
>
> Key: ARROW-15086
> URL: https://issues.apache.org/jira/browse/ARROW-15086
> Project: Apache Arrow
> Issue Type: Bug
> Components: JavaScript
> Affects Versions: 6.0.1
> Reporter: geert-jan brits
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)