
Posted to user@cassandra.apache.org by "Rumph, Frens Jan" <ma...@frensjan.nl> on 2015/03/02 23:08:29 UTC

RDD partitions per executor in Cassandra Spark Connector

Hi all,

I didn't find the *issues* button on
https://github.com/datastax/spark-cassandra-connector/ so posting here.

Does anyone have an idea why token ranges are grouped into one partition per
executor? I expected at least one per core. Any suggestions on how to work
around this? Doing a repartition is way too expensive, as I just want more
partitions for parallelism, not a reshuffle...

Thanks in advance!
Frens Jan
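The thread does not explain how the connector groups token ranges. As a rough illustration of what Frens Jan is asking for, the Python sketch below splits the Cassandra token ring into one contiguous sub-range per core (executors times cores per executor) instead of one per executor. It assumes the Murmur3Partitioner token bounds of -2^63 and 2^63 - 1. The functions split_range and partitions_per_core are hypothetical helpers, not connector APIs. Because each sub-range can be read independently, producing more partitions this way would not need a shuffle.

```python
# Hypothetical sketch, not the connector's partitioner: split the token ring
# into more, smaller contiguous sub-ranges so a scan gets one partition per core.
# Assumes Cassandra's Murmur3Partitioner, whose tokens span [-2**63, 2**63 - 1].

MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_range(start: int, end: int, pieces: int) -> list[tuple[int, int]]:
    """Split the token range (start, end] into `pieces` contiguous sub-ranges."""
    if pieces < 1:
        raise ValueError("pieces must be >= 1")
    width = end - start
    # Integer arithmetic keeps every boundary exact (Python ints don't overflow).
    bounds = [start + (width * i) // pieces for i in range(pieces + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(pieces)]


def partitions_per_core(executors: int, cores_per_executor: int) -> list[tuple[int, int]]:
    """One sub-range per core instead of one per executor (hypothetical target)."""
    return split_range(MIN_TOKEN, MAX_TOKEN, executors * cores_per_executor)


ranges = partitions_per_core(executors=3, cores_per_executor=4)
print(len(ranges))  # prints 12
```

Consecutive sub-ranges share endpoints, so together they cover the whole (start, end] range with no gaps or overlaps.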

Re: RDD partitions per executor in Cassandra Spark Connector

Posted by Carl Yeksigian <ca...@yeksigian.com>.

These questions would be better addressed to the Spark Cassandra Connector
mailing list, which can be found here:
https://github.com/datastax/spark-cassandra-connector/#community

Thanks,
Carl

On Tue, Mar 3, 2015 at 4:42 AM, Pavel Velikhov <pa...@gmail.com>
wrote:

> Hi, is there a paper or a document where one can read how Spark reads
> Cassandra data in parallel? And how it writes data back from RDDs? It's a
> bit hard to get a clear picture in mind.
>
> Thank you,
> Pavel Velikhov
>
> On Mar 3, 2015, at 1:08 AM, Rumph, Frens Jan <ma...@frensjan.nl> wrote:
>
> Hi all,
>
> I didn't find the *issues* button on
> https://github.com/datastax/spark-cassandra-connector/ so posting here.
>
> Does anyone have an idea why token ranges are grouped into one partition per
> executor? I expected at least one per core. Any suggestions on how to work
> around this? Doing a repartition is way too expensive, as I just want more
> partitions for parallelism, not a reshuffle...
>
> Thanks in advance!
> Frens Jan

Re: RDD partitions per executor in Cassandra Spark Connector

Posted by Pavel Velikhov <pa...@gmail.com>.

Hi, is there a paper or a document where one can read how Spark reads Cassandra data in parallel? And how it writes data back from RDDs? It's a bit hard to get a clear picture in mind.

Thank you,
Pavel Velikhov

> On Mar 3, 2015, at 1:08 AM, Rumph, Frens Jan <ma...@frensjan.nl> wrote:
> 
> Hi all,
> 
> I didn't find the issues button on https://github.com/datastax/spark-cassandra-connector/ so posting here.
> 
> Does anyone have an idea why token ranges are grouped into one partition per executor? I expected at least one per core. Any suggestions on how to work around this? Doing a repartition is way too expensive, as I just want more partitions for parallelism, not a reshuffle...
> 
> Thanks in advance!
> Frens Jan
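The thread does not point to a design document. The general idea behind token-aware parallel reads can be sketched as follows: each Spark partition is assigned one or more token ranges, and its task issues a CQL query restricted to token(pk) > ? AND token(pk) <= ? for that range, so tasks scan disjoint slices of the table concurrently. The sketch below only builds those queries. The keyspace, table and column names and the helpers token_range_cql and scan_tasks are hypothetical. It assumes non-wrapping ranges and is not the connector's actual implementation. It does not cover writes.

```python
# Hypothetical sketch: one CQL query per token range, so each Spark task
# could read its own slice of the table concurrently with the others.
# Keyspace, table and column names below are illustrative placeholders.


def token_range_cql(keyspace: str, table: str, partition_key: list[str]) -> str:
    """Build a CQL SELECT restricted to a (start, end] token range.

    The two ? markers are bind variables for the range bounds.
    Assumes non-wrapping ranges (start < end).
    """
    key = ", ".join(f'"{col}"' for col in partition_key)
    return (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE token({key}) > ? AND token({key}) <= ?"
    )


def scan_tasks(keyspace: str, table: str, partition_key: list[str],
               ranges: list[tuple[int, int]]) -> list[tuple[str, tuple[int, int]]]:
    """Pair the shared query with each range's bind values: one task per range."""
    cql = token_range_cql(keyspace, table, partition_key)
    return [(cql, (start, end)) for start, end in ranges]


tasks = scan_tasks("ks", "events", ["id"], [(-100, 0), (0, 100)])
for cql, bounds in tasks:
    print(bounds, cql)
```

Composite partition keys go into a single token(...) call, which is why partition_key is a list.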
