Posted to jira@arrow.apache.org by "Gert Hulselmans (Jira)" <ji...@apache.org> on 2021/06/23 16:23:00 UTC

[jira] [Updated] (ARROW-13150) [Python] combine_chunks fails on column of table, but does not error on table itself

     [ https://issues.apache.org/jira/browse/ARROW-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gert Hulselmans updated ARROW-13150:
------------------------------------
    Description: 
combine_chunks fails on a column of a table, but does not raise an error on the table itself (it leaves 3 chunks instead of one).

Is there a reason why they are not handled the same?
{code:python}
In [90]: pa.__version__
Out[90]: '4.0.0'

# Get shape
In [85]: pa_table.shape
Out[85]: (102753589, 1)

In [86]: pa_col1_array = pa_table.column(0)

# Get number of chunks
In [87]: pa_col1_array.num_chunks
Out[87]: 4404

# Combining chunks on the pyarrow table with one column works.
In [88]: pa_table.combine_chunks()
Out[88]: 
pyarrow.Table
# id=TEW__014e25__c14e1d__Multiome_RNA_brain_10x_no_perm: string

# Combining chunks on the column itself does not work.
In [89]: pa_col1_array.combine_chunks()
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-89-fdd0d0056a8e> in <module>
----> 1 pa_col1_array.combine_chunks()
/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/table.pxi in pyarrow.lib.ChunkedArray.combine_chunks()
/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib.concat_arrays()
/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()
/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
ArrowInvalid: offset overflow while concatenating arrays

# Assign the table with combined chunks to a new variable.
In [91]: pa_table_combined = pa_table.combine_chunks()

# Get first column
In [92]: pa_col1_array_from_pa_table_combined = pa_table_combined.column(0)

# Get number of chunks
In [93]: pa_col1_array_from_pa_table_combined.num_chunks
Out[93]: 3

# Try to combine column 1 again.
In [94]: pa_col1_array_from_pa_table_combined.combine_chunks()
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-94-e2e323e6519f> in <module>
----> 1 pa_col1_array_from_pa_table_combined.combine_chunks()
/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/table.pxi in pyarrow.lib.ChunkedArray.combine_chunks()
/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib.concat_arrays()
/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()
/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
ArrowInvalid: offset overflow while concatenating arrays

# Get sizes of each chunk.
In [106]: [chunk.nbytes for chunk in pa_col1_array_from_pa_table_combined.chunks]
Out[106]: [2341650593, 2342925682, 241257842]
{code}
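
A possible explanation, offered as an assumption and not confirmed anywhere in this message: a column of type string stores its offsets as signed 32-bit integers, so a single array cannot address more than 2**31 - 1 bytes of character data. The sketch below only redoes the arithmetic on the chunk sizes from Out[106]; the helper name min_chunks_for is hypothetical, and nbytes also counts the offsets buffers, so the total is an upper estimate of the character data.

```python
# Largest value a signed 32-bit offset can hold (assumed limit for a string array).
INT32_MAX = 2**31 - 1


def min_chunks_for(total_bytes: int, limit: int = INT32_MAX) -> int:
    """Smallest number of arrays needed so that each holds at most `limit` bytes."""
    return -(-total_bytes // limit)  # ceiling division on integers


# Chunk sizes in bytes, copied from Out[106] above.
chunk_nbytes = [2341650593, 2342925682, 241257842]
total = sum(chunk_nbytes)

print(total)                  # total bytes across the three chunks
print(total > INT32_MAX)      # True if the data cannot fit in one string array
print(min_chunks_for(total))  # estimated minimum number of chunks
```

If that assumption holds, it would be consistent with the table-level call stopping at 3 chunks while the column-level call, which has to return one array, reports an offset overflow. Casting the column to large_string (64-bit offsets) before combining, for example pa_col1_array.cast(pa.large_string()).combine_chunks(), may avoid the error; this was not tested on the data above.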

  was:
combine_chunks fails on column of table, but does not error on table itself (but creates 3 chunks instead).

Is there a reason why they are not handled the same?
{code:java}
In [90]: pa.__version__
Out[90]: '4.0.0'

# Get shape
In [85]: pa_table.shape
Out[85]: (102753589, 1)

In [86]: pa_col1_array = pa_table.column(0)

# Get number of chunks
In [87]: pa_col1_array.num_chunks
Out[87]: 4404

# Combining chunks on the pyarrow table with one column works.
In [88]: pa_table.combine_chunks()
Out[88]: 
pyarrow.Table
# id=TEW__014e25__c14e1d__Multiome_RNA_brain_10x_no_perm: string

# Combining chunks on the column itself does not work.
In [89]: pa_col1_array.combine_chunks()
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-89-fdd0d0056a8e> in <module>
----> 1 pa_col1_array.combine_chunks()
/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/table.pxi in pyarrow.lib.ChunkedArray.combine_chunks()
/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib.concat_arrays()
/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()
/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
ArrowInvalid: offset overflow while concatenating arrays

# Assign the table with combined chunks to a new variable.
In [91]: pa_table_combined = pa_table.combine_chunks()

# Get first column
In [92]: pa_col1_array_from_pa_table_combined = pa_table_combined.column(0)

# Get number of chunks
In [93]: pa_col1_array_from_pa_table_combined.num_chunks
Out[93]: 3

# Try to combine column 1 again.
In [94]: pa_col1_array_from_pa_table_combined.combine_chunks()
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-94-e2e323e6519f> in <module>
----> 1 pa_col1_array_from_pa_table_combined.combine_chunks()
/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/table.pxi in pyarrow.lib.ChunkedArray.combine_chunks()
/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib.concat_arrays()
/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()
/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
ArrowInvalid: offset overflow while concatenating arrays

# Get sizes of each chunk.
In [106]: [chunk.nbytes for chunk in pa_col1_array_from_pa_table_combined.chunks]
Out[106]: [2341650593, 2342925682, 241257842
{code}


> [Python] combine_chunks fails on column of table, but does not error on table itself
> ------------------------------------------------------------------------------------
>
>                 Key: ARROW-13150
>                 URL: https://issues.apache.org/jira/browse/ARROW-13150
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Gert Hulselmans
>            Priority: Minor
>


