Posted to jira@arrow.apache.org by "Nick Crews (Jira)" <ji...@apache.org> on 2022/06/17 00:13:00 UTC

[jira] [Commented] (ARROW-12099) [Python] Explode array column

    [ https://issues.apache.org/jira/browse/ARROW-12099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555329#comment-17555329 ] 

Nick Crews commented on ARROW-12099:
------------------------------------

Small tweak to Guido's implementation (thank you for this!): it crashes if the ListArray or MapArray column being exploded is the only column in the table.

This handles that case:
{code:python}
import pyarrow as pa
import pyarrow.compute as pc


def explode_table(table, column):
    # Replace null lists with [None] so those rows survive as a single null value.
    null_filled = pc.fill_null(table[column], [None])
    flattened = pc.list_flatten(null_filled)
    other_columns = list(table.schema.names)
    other_columns.remove(column)
    if len(other_columns) == 0:
        # No other columns to repeat: the flattened values are the whole result.
        return pa.table({column: flattened})
    else:
        # Repeat each remaining row once per element of its list.
        indices = pc.list_parent_indices(null_filled)
        result = table.select(other_columns).take(indices)
        result = result.append_column(
            pa.field(column, table.schema.field(column).type.value_type),
            flattened,
        )
        return result
{code}

> [Python] Explode array column
> -----------------------------
>
>                 Key: ARROW-12099
>                 URL: https://issues.apache.org/jira/browse/ARROW-12099
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Malthe Borch
>            Priority: Major
>
> In Apache Spark, [explode|https://spark.apache.org/docs/latest/api/sql/index.html#explode] separates the elements of an array column (or expression) into multiple rows.
> Note that each explode works at the top-level only (not recursively).
> This would also work with the existing [flatten|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.flatten] method to allow fully unnesting a [pyarrow.StructArray|https://arrow.apache.org/docs/python/generated/pyarrow.StructArray.html#pyarrow-structarray].


