Posted to dev@spark.apache.org by "Long, Andrew" <lo...@amazon.com.INVALID> on 2019/04/16 22:30:23 UTC
Sort order in bucketing in a custom datasource
Hey Friends,
Is it possible to specify the sort order or bucketing in a way that can be used by the optimizer in Spark?
Cheers Andrew
Re: Sort order in bucketing in a custom datasource
Posted by Jacek Laskowski <ja...@japila.pl>.
Hi,
I don't think so. I can't think of an interface (trait) that would give
that information to the Catalyst optimizer.
Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski
Re: Sort order in bucketing in a custom datasource
Posted by Russell Spitzer <ru...@gmail.com>.
Please join the DataSource V2 meetings; the next one is tomorrow, and we
are discussing these very topics right now. DataSource V1 cannot provide
this information, but any source that just generates RDDs can specify a
partitioner. That is only useful if you are working with RDDs exclusively;
for DataFrames, DSV2 is the place to look.
https://calendar.google.com/event?action=TEMPLATE&tmeid=NzhmcGRka3JscjNiZWFkYnRwNnQ0ZzZlajcgcnVzc2VsbC5zcGl0emVyQG0&tmsrc=russell.spitzer%40gmail.com
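For the RDD-only route mentioned above, a minimal sketch of a source RDD that declares its partitioner might look like the following. The names BucketPartition and BucketedSourceRDD, the key range, and the generated values are hypothetical stand-ins for a real bucketed store; only the RDD, Partition, Partitioner and HashPartitioner types come from Spark itself.

```scala
import org.apache.spark.{HashPartitioner, Partition, Partitioner, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition handle: one per bucket in the external store.
case class BucketPartition(index: Int) extends Partition

// Hypothetical source RDD whose data is assumed to be hash-bucketed by key already.
class BucketedSourceRDD(sc: SparkContext, numBuckets: Int)
    extends RDD[(Int, String)](sc, Nil) {

  private val bucketing = new HashPartitioner(numBuckets)

  // Declaring the partitioner tells Spark how keys map to partitions.
  override val partitioner: Option[Partitioner] = Some(bucketing)

  override protected def getPartitions: Array[Partition] =
    Array.tabulate[Partition](numBuckets)(i => BucketPartition(i))

  // Stand-in for reading one bucket: emit only the keys that belong to it,
  // so the data really matches the declared partitioner.
  override def compute(split: Partition, context: TaskContext): Iterator[(Int, String)] =
    (0 until 100).iterator
      .filter(k => bucketing.getPartition(k) == split.index)
      .map(k => (k, s"value-$k"))
}
```

With an existing SparkContext sc, something like `val rdd = new BucketedSourceRDD(sc, 8)` followed by `rdd.reduceByKey(rdd.partitioner.get, _ + _)` should be able to aggregate without a shuffle, because the requested partitioner equals the one the RDD already reports. As noted above, this information does not reach the Catalyst optimizer once the data is used as a DataFrame.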