Posted to jira@arrow.apache.org by "geert-jan brits (Jira)" <ji...@apache.org> on 2021/12/13 20:46:00 UTC

[jira] [Updated] (ARROW-15086) [JS] Incorrect Table concat after serialize

     [ https://issues.apache.org/jira/browse/ARROW-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

geert-jan brits updated ARROW-15086:
------------------------------------
    Description: 
After reading an Arrow table from disk and concatenating another Arrow table onto it, the concatenated table loses its last concatenated chunk after being serialized to disk.

 

Steps to reproduce below: 
{code:java}
// CORRECT
// Create tables T1 + T2. Each contains 1 vector with 1 value
    
const T1 = Table.new(createSomeVectors(), ['number'])    
const T2 = Table.new(createSomeVectors(), ['number'])

// Combine these tables
const combined = T1.concat(T2)
// Serialize and read back this combination (mimic reading from disk)    

const combinedAfterSerialization = Table.from([combined.serialize()])

// Print the count (works correctly)    
console.log(T1.count(), T2.count(), combined.count(), combinedAfterSerialization.count()) 
// Result (as expected)= 2, 2, 2, 2 


// INCORRECT        
// Serialize T1 and read back. (mimic reading from disk)    
const T1SerializedAndBack = Table.from([T1.serialize()])

// Combine just read T1SerializedAndBack with T2    
const combined2 = T1SerializedAndBack.concat(T2)

// Serialize and read back this combination (mimic reading from disk)    
const combinedAfterSerialization2 = Table.from([combined2.serialize()])

// Print the count (works incorrectly)
console.log(T1SerializedAndBack.count(), T2.count(), combined2.count(), combinedAfterSerialization2.count()) 
// Result (NOT as expected)= 2, 2, 2, 1 <=!! {code}
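
The property the reproduction above checks is that a serialize-and-read-back round trip should not change the row count. A minimal sketch of that check follows. The names `assertRoundTripPreservesCount`, `makeFakeTable`, `lossless`, and `dropsLast` are hypothetical and are not part of apache-arrow; the fake table only stands in for a real `Table` so the sketch runs without the library.

```javascript
// Hypothetical helper (not an apache-arrow API): verifies that passing a
// table-like object through `roundTrip` leaves its row count unchanged.
function assertRoundTripPreservesCount(table, roundTrip) {
  const before = table.count();
  const after = roundTrip(table).count();
  if (before !== after) {
    throw new Error(`row count changed after round trip: ${before} -> ${after}`);
  }
  return after;
}

// Stand-in for a real Table: only exposes count(), which is all the check needs.
function makeFakeTable(rows) {
  return { rows, count: () => rows.length };
}

// Two illustrative round trips: one lossless, one that drops the last row
// (mimicking the symptom described in this issue).
const lossless = (t) => makeFakeTable([...t.rows]);
const dropsLast = (t) => makeFakeTable(t.rows.slice(0, -1));
```

With the tables from the report, the same check would be called as `assertRoundTripPreservesCount(combined2, (t) => Table.from([t.serialize()]))`, which would be expected to throw given the counts reported above.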
 

  was:
After reading an Arrow table from disk and concatenating another Arrow table onto it, the concatenated table loses its last concatenated chunk after being serialized to disk.

 

Steps to reproduce below: 
{code:java}
/////////////////
// CORRECT
// Create tables T1 + T2. Each contain 1 vector with 1 value
const T1 = Table.new(createSomeVectors(), ['number'])
const T2 = Table.new(createSomeVectors(), ['number'])
// Combine these tables
const combined = T1.concat(T2)
// Serialize and read back this combination (mimic reading from disk)
const combinedAfterSerialization = Table.from([combined.serialize()])
// Print the count (works correctly)
console.log(T1.count(), T2.count(), combined.count(), combinedAfterSerialization.count())
// Result (as expected)= 2, 2, 2, 2
/////////////////
// INCORRECT
// Serialize T1 and read back. (mimic reading from disk)
const T1SerializedAndBack = Table.from([T1.serialize()])
// Combine just read T1SerializedAndBack with T2
const combined2 = T1SerializedAndBack.concat(T2)
// Serialize and read back this combination (mimic reading from disk)
const combinedAfterSerialization2 = Table.from([combined2.serialize()])
// Print the count (works correctly)
console.log(T1SerializedAndBack.count(), T2.count(), combined2.count(), combinedAfterSerialization2.count())
// Result (NOT as expected)= 2, 2, 2, 1 <=!! {code}
 


> [JS] Incorrect Table concat after serialize
> -------------------------------------------
>
>                 Key: ARROW-15086
>                 URL: https://issues.apache.org/jira/browse/ARROW-15086
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: JavaScript
>    Affects Versions: 6.0.1
>            Reporter: geert-jan brits
>            Priority: Major
>
> After reading an Arrow table from disk and concatenating another Arrow table onto it, the concatenated table loses its last concatenated chunk after being serialized to disk.
>  
> Steps to reproduce below: 
> {code:java}
> // CORRECT
> // Create tables T1 + T2. Each contains 1 vector with 1 value
>     
> const T1 = Table.new(createSomeVectors(), ['number'])    
> const T2 = Table.new(createSomeVectors(), ['number'])
> // Combine these tables
> const combined = T1.concat(T2)
> // Serialize and read back this combination (mimic reading from disk)    
> const combinedAfterSerialization = Table.from([combined.serialize()])
> // Print the count (works correctly)    
> console.log(T1.count(), T2.count(), combined.count(), combinedAfterSerialization.count()) 
> // Result (as expected)= 2, 2, 2, 2 
> // INCORRECT        
> // Serialize T1 and read back. (mimic reading from disk)    
> const T1SerializedAndBack = Table.from([T1.serialize()])
> // Combine just read T1SerializedAndBack with T2    
> const combined2 = T1SerializedAndBack.concat(T2)
> // Serialize and read back this combination (mimic reading from disk)    
> const combinedAfterSerialization2 = Table.from([combined2.serialize()])
> // Print the count (works incorrectly)
> console.log(T1SerializedAndBack.count(), T2.count(), combined2.count(), combinedAfterSerialization2.count()) 
> // Result (NOT as expected)= 2, 2, 2, 1 <=!! {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)