Posted to issues@beam.apache.org by "Brian Hulette (Jira)" <ji...@apache.org> on 2021/04/08 18:55:00 UTC
[jira] [Created] (BEAM-12132) DataFrame API: Consider allowing partitioning by column in addition to Index
Brian Hulette created BEAM-12132:
------------------------------------
Summary: DataFrame API: Consider allowing partitioning by column in addition to Index
Key: BEAM-12132
URL: https://issues.apache.org/jira/browse/BEAM-12132
Project: Beam
Issue Type: Improvement
Components: sdk-py-core
Reporter: Brian Hulette
For some DataFrame use cases it may be beneficial to partition a dataset by column as well as by index.
One example is computing correlations in a DataFrame with a very large number of columns. Each pairwise column correlation depends only on its two columns, so it would be beneficial to be able to compute the pairs on separate workers.
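As a rough illustration of why column partitioning helps here, the sketch below splits a correlation computation into one independent task per column pair. It is plain Python, not the Beam DataFrame API, and the names pearson, column_pair_tasks, and pairwise_correlations are hypothetical helpers invented for this example.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def column_pair_tasks(columns):
    """Yield one independent unit of work per unordered column pair."""
    for a, b in combinations(sorted(columns), 2):
        yield (a, b), (columns[a], columns[b])


def pairwise_correlations(columns):
    # Each task needs only its own two columns, so in a distributed
    # setting the tasks could be handed to different workers.
    # Here they simply run one after another in a single process.
    return {
        pair: pearson(xs, ys)
        for pair, (xs, ys) in column_pair_tasks(columns)
    }


# Hypothetical toy data: column name -> list of values.
columns = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [4.0, 3.0, 2.0, 1.0],
}
result = pairwise_correlations(columns)
```

With n columns this produces n * (n - 1) / 2 tasks, each touching only two columns, which is the kind of work that partitioning by index alone cannot spread out.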
--
This message was sent by Atlassian Jira
(v8.3.4#803005)