Posted to dev@spark.apache.org by "Long, Andrew" <lo...@amazon.com.INVALID> on 2019/04/16 22:30:23 UTC

Sort order in bucketing in a custom datasource

Hey Friends,

Is it possible for a custom data source to specify its sort order or bucketing in a way that the Spark optimizer can use?

Cheers, Andrew

Re: Sort order in bucketing in a custom datasource

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi,

I don't think so. I can't think of an interface (trait) that would give
that information to the Catalyst optimizer.

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski


Re: Sort order in bucketing in a custom datasource

Posted by Russell Spitzer <ru...@gmail.com>.
Please join the DataSource V2 meetings. We are discussing these very topics
right now, and the next meeting is tomorrow. DataSource V1 cannot provide
this information, but any source that just generates RDDs can specify a
partitioner. That only helps if you work purely with RDDs, though; for
DataFrames, DSV2 is the place to look.

https://calendar.google.com/event?action=TEMPLATE&tmeid=NzhmcGRka3JscjNiZWFkYnRwNnQ0ZzZlajcgcnVzc2VsbC5zcGl0emVyQG0&tmsrc=russell.spitzer%40gmail.com
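
To illustrate why a known partitioner matters, here is a toy model in plain
Python. It is not Spark code, and every name in it (partition_index,
partition_by, co_partitioned_join) is hypothetical. The assumption it models:
when two keyed datasets are split with the same partition function and the
same partition count, matching keys land at the same partition index, so a
join can run partition by partition without moving data. That is the kind of
guarantee an RDD gives the engine when it reports a partitioner.

```python
from collections import defaultdict


def partition_index(key, num_partitions):
    """Toy stand-in for a hash partitioner: map a key to a partition index."""
    # Small integer keys hash to themselves in CPython, so this is deterministic.
    return hash(key) % num_partitions


def partition_by(pairs, num_partitions):
    """Split (key, value) pairs into num_partitions lists by partition_index."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_index(key, num_partitions)].append((key, value))
    return partitions


def co_partitioned_join(left_parts, right_parts):
    """Inner-join two datasets one partition pair at a time.

    Only valid when both sides were split with the same function and the
    same partition count, which is what a shared partitioner guarantees.
    """
    assert len(left_parts) == len(right_parts)
    joined = []
    for left, right in zip(left_parts, right_parts):
        # Build a per-partition lookup; no other partition is ever consulted.
        lookup = defaultdict(list)
        for key, value in right:
            lookup[key].append(value)
        for key, value in left:
            for other in lookup.get(key, []):
                joined.append((key, value, other))
    return joined


users = [(1, "ann"), (2, "bob"), (3, "cy"), (4, "dee")]
orders = [(1, "book"), (3, "pen"), (3, "ink"), (5, "cup")]

NUM_PARTITIONS = 3
user_parts = partition_by(users, NUM_PARTITIONS)
order_parts = partition_by(orders, NUM_PARTITIONS)
result = sorted(co_partitioned_join(user_parts, order_parts))
print(result)
```

If the two sides used different partition counts or different functions, the
per-partition join above would silently miss matches, which is why the engine
has to be told the partitioning before it can skip the redistribution step.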
