Posted to dev@spark.apache.org by Anton Okolnychyi <ao...@gmail.com> on 2020/03/06 23:12:38 UTC

[DISCUSS][SPARK-23889] DataSourceV2: required sorting and clustering for writes

Hi devs,


I want to follow up on the dev list discussion [1]
<https://lists.apache.org/thread.html/d8bb72fc9b4be8acc3f49367bfc99cbf029194a58333eba69df49717@%3Cdev.spark.apache.org%3E>
and the JIRA issue [2] <https://jira.apache.org/jira/browse/SPARK-23889>
that came out of it, and to propose a slightly different approach for
allowing data sources to request a specific distribution and ordering of
data on write.


I've put together a short document [3]
<https://docs.google.com/document/d/1X0NsQSryvNmXBY9kcvfINeYyKC-AahZarUqg3nS1GQs/>
describing the proposed approach. It would be great to hear what the
community thinks.


The SQL part of the proposal requires further discussion, and any ideas are
more than welcome.


Thanks,

Anton