Posted to dev@spark.apache.org by Michael Segel <ms...@hotmail.com> on 2016/05/17 19:48:10 UTC
Indexing of RDDs and DF in 2.0?
Hi,
I saw a replay of a talk about what’s coming in Spark 2.0 and its performance improvements.
I am curious about indexing of data sets.
In HBase/MapR-DB you can build ordered indexes by maintaining an inverted table for each indexed column.
You can then take the intersection of those indexes to find the result set of rows.
(Or intersect/null if you have left outer joins…)
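To make the idea concrete, here is a minimal sketch in plain Python (not HBase, MapR-DB, or Spark API code) of inverted index tables and their intersection. The sample rows, the column names, and the helpers `build_inverted_index` and `lookup` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical base table: row key -> column values.
rows = {
    "r1": {"city": "Chicago", "status": "active"},
    "r2": {"city": "Chicago", "status": "inactive"},
    "r3": {"city": "Boston", "status": "active"},
    "r4": {"city": "Chicago", "status": "active"},
}


def build_inverted_index(table, column):
    """Map each value of `column` to the ordered list of row keys holding it."""
    index = defaultdict(list)
    for row_key in sorted(table):  # visit keys in order so each posting list is ordered
        value = table[row_key].get(column)
        if value is not None:
            index[value].append(row_key)
    return dict(index)


def lookup(indexes, predicates):
    """Intersect the row-key sets matching every (column, value) predicate."""
    result = None
    for column, value in predicates.items():
        keys = set(indexes[column].get(value, []))
        result = keys if result is None else result & keys
    return sorted(result or [])


# One inverted index per indexed column.
indexes = {col: build_inverted_index(rows, col) for col in ("city", "status")}

# Rows where city == "Chicago" AND status == "active".
matches = lookup(indexes, {"city": "Chicago", "status": "active"})
print(matches)
```

Only the row keys that appear in every index survive the intersection, so the base table is read just for those keys.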
AFAIK, there was a project on an IndexedRDD, but I'm not sure how far it got.
I realize that some of the improvements are based on using hash joins, which would make indexing a bit harder… or am I missing something?
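For reference, this is what I mean by a hash join: a generic, simplified sketch in plain Python, not Spark's actual implementation. The function name `hash_join` and the sample data are hypothetical.

```python
def hash_join(left, right, key):
    """Inner-join two lists of dicts on `key` using an in-memory hash table."""
    # Build phase: hash every row of the left side by its join key.
    buckets = {}
    for row in left:
        buckets.setdefault(row[key], []).append(row)

    # Probe phase: stream the right side and look up matching left rows.
    joined = []
    for row in right:
        for match in buckets.get(row[key], []):
            joined.append({**match, **row})
    return joined


users = [
    {"user_id": 1, "name": "ann"},
    {"user_id": 2, "name": "bob"},
]
orders = [
    {"user_id": 1, "item": "book"},
    {"user_id": 1, "item": "pen"},
    {"user_id": 3, "item": "cup"},
]

result = hash_join(users, orders, "user_id")
print(result)
```

In this sketch the hash table is built from scratch for each join and thrown away afterwards, and it keeps no ordering on the keys. That may be part of why a persistent, ordered index does not plug into this style of join directly.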
Thx