Posted to issues@beam.apache.org by "Alexey Romanenko (Jira)" <ji...@apache.org> on 2021/03/16 14:04:00 UTC

[jira] [Updated] (BEAM-11881) DataFrame subpartitioning order is incorrect

     [ https://issues.apache.org/jira/browse/BEAM-11881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Romanenko updated BEAM-11881:
------------------------------------
    Status: Open  (was: Triage Needed)

> DataFrame subpartitioning order is incorrect
> --------------------------------------------
>
>                 Key: BEAM-11881
>                 URL: https://issues.apache.org/jira/browse/BEAM-11881
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Brian Hulette
>            Priority: P2
>              Labels: dataframe-api
>          Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Currently we've defined
> Nothing() < Index([i]) < Index([i,j]) < .. < Index() < Singleton()
> s.t. Singleton is a subpartitioning of Index(), which is a subpartitioning of Index([i,j]), but this is incorrect. The order should be
> Singleton() < Index([i]) < Index([i,j]) < .. < Index() < Nothing()
> s.t. every other partitioning is a subpartitioning of Singleton. This is logical, since Singleton will collect the largest amount of data on a single node, partitioning by a single index will be a little more distributed, and partitioning by the full Index() will be the most distributed.
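
For illustration only, here is a minimal, self-contained Python sketch of the corrected order described above. It is not the Beam implementation: the class layout, the `levels` field, and the `is_subpartitioning_of` method are hypothetical names chosen for this example. It assumes that `A < B` in the issue means "B is a subpartitioning of A", that Index() with no levels stands for the full index, and that Nothing(), as the top of the order, counts as a subpartitioning of everything.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


class Partitioning:
    """Hypothetical base class; not Beam's actual API."""

    def is_subpartitioning_of(self, other: "Partitioning") -> bool:
        raise NotImplementedError


@dataclass(frozen=True)
class Singleton(Partitioning):
    """All data on a single node: the bottom of the order."""

    def is_subpartitioning_of(self, other: Partitioning) -> bool:
        # Singleton only refines itself.
        return isinstance(other, Singleton)


@dataclass(frozen=True)
class Index(Partitioning):
    """Partitioned by some index levels; None is assumed to mean the full index."""

    levels: Optional[Tuple[str, ...]] = None

    def is_subpartitioning_of(self, other: Partitioning) -> bool:
        if isinstance(other, Singleton):
            # Every partitioning is a subpartitioning of Singleton.
            return True
        if isinstance(other, Index):
            if self.levels is None:
                # The full index refines any subset of index levels.
                return True
            if other.levels is None:
                return False
            # Index([i, j]) refines Index([i]): it uses a superset of levels.
            return set(other.levels) <= set(self.levels)
        # Assumption: nothing but Nothing() itself refines Nothing().
        return False


@dataclass(frozen=True)
class Nothing(Partitioning):
    """Top of the order in this sketch: a subpartitioning of everything."""

    def is_subpartitioning_of(self, other: Partitioning) -> bool:
        return True


if __name__ == "__main__":
    # Singleton() < Index([i]) < Index([i,j]) < Index() < Nothing()
    chain = [Singleton(), Index(("i",)), Index(("i", "j")), Index(), Nothing()]
    for coarser, finer in zip(chain, chain[1:]):
        print(finer, "refines", coarser, ":", finer.is_subpartitioning_of(coarser))
```

Under these assumptions the relation is only a partial order: Index(("i",)) and Index(("j",)) are not comparable, because neither set of levels contains the other.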



--
This message was sent by Atlassian Jira
(v8.3.4#803005)