Posted to jira@arrow.apache.org by "Nick Crews (Jira)" <ji...@apache.org> on 2022/06/17 00:13:00 UTC
[jira] [Commented] (ARROW-12099) [Python] Explode array column
[ https://issues.apache.org/jira/browse/ARROW-12099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555329#comment-17555329 ]
Nick Crews commented on ARROW-12099:
------------------------------------
Small tweak to Guido's implementation (thank you for this!): it crashes if the ListArray or MapArray column is the only column in the table.
This version handles that case:
{code:python}
import pyarrow as pa
import pyarrow.compute as pc

def explode_table(table, column):
    null_filled = pc.fill_null(table[column], [None])
    flattened = pc.list_flatten(null_filled)
    other_columns = list(table.schema.names)
    other_columns.remove(column)
    if len(other_columns) == 0:
        return pa.table({column: flattened})
    else:
        indices = pc.list_parent_indices(null_filled)
        result = table.select(other_columns).take(indices)
        result = result.append_column(
            pa.field(column, table.schema.field(column).type.value_type),
            flattened,
        )
        return result
{code}
> [Python] Explode array column
> -----------------------------
>
> Key: ARROW-12099
> URL: https://issues.apache.org/jira/browse/ARROW-12099
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Reporter: Malthe Borch
> Priority: Major
>
> In Apache Spark, [explode|https://spark.apache.org/docs/latest/api/sql/index.html#explode] separates the elements of an array column (or expression) into multiple rows.
> Note that each explode works at the top-level only (not recursively).
> This would also work with the existing [flatten|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.flatten] method to allow fully unnesting a [pyarrow.StructArray|https://arrow.apache.org/docs/python/generated/pyarrow.StructArray.html#pyarrow-structarray].