Posted to issues@beam.apache.org by "Brian Hulette (Jira)" <ji...@apache.org> on 2021/04/08 18:55:00 UTC

[jira] [Created] (BEAM-12132) DataFrame API: Consider allowing partitioning by column in addition to Index

Brian Hulette created BEAM-12132:
------------------------------------

             Summary: DataFrame API: Consider allowing partitioning by column in addition to Index
                 Key: BEAM-12132
                 URL: https://issues.apache.org/jira/browse/BEAM-12132
             Project: Beam
          Issue Type: Improvement
          Components: sdk-py-core
            Reporter: Brian Hulette


For some DataFrame use cases, it may be beneficial to partition a dataset by column as well as by index.

One example is computing correlations in a DataFrame with a very large number of columns. Each pairwise column correlation depends only on the two columns involved, so it would be beneficial to be able to compute the pairs on separate workers.
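
To make the correlation example concrete, here is a rough sketch of what splitting the work by column pair could look like. It is plain Python, not the Beam DataFrame API: the table is a dict of column name to list of values standing in for a DataFrame, and the names pearson, column_pair_tasks, correlate_pair and correlation_matrix are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise this divides by zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def column_pair_tasks(table):
    """Yield one independent unit of work per unordered pair of columns."""
    for a, b in combinations(table, 2):
        yield (a, b), (table[a], table[b])


def correlate_pair(task):
    """Compute the correlation for a single column pair."""
    (a, b), (xs, ys) = task
    return (a, b), pearson(xs, ys)


def correlation_matrix(table):
    """Assemble the full symmetric matrix from the per-pair results."""
    result = {(c, c): 1.0 for c in table}
    # map() runs sequentially here; each task touches only two columns,
    # so the same tasks could be handed to separate workers instead.
    for (a, b), r in map(correlate_pair, column_pair_tasks(table)):
        result[(a, b)] = r
        result[(b, a)] = r
    return result


table = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],
    "z": [4.0, 3.0, 2.0, 1.0],
}
corr = correlation_matrix(table)
```

Because each task carries only the two columns it needs, the built-in map could be swapped for a parallel map without changing the other functions. How (or whether) Beam would expose this kind of partitioning is exactly the open question in this issue.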



--
This message was sent by Atlassian Jira
(v8.3.4#803005)