Posted to user@spark.apache.org by Chanh Le <gi...@gmail.com> on 2016/06/09 04:08:23 UTC

Spark Partition by Columns doesn't work properly

Hi everyone,
I tested partitioning a DataFrame by columns, but the result looks wrong.
I am using Spark 1.6.1 and loading data from Cassandra.
When I repartition by two fields (date, network_id), I get 200 partitions.
When I repartition by one field (date), I also get 200 partitions.
But my data covers 90 days, so repartitioning by date should give 90 partitions.
import org.apache.spark.sql.functions.col

// sql is the SQLContext; read the daily table and repartition by date.
val daily = sql
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> dailyDetailTableName, "keyspace" -> reportSpace))
  .load()
  .repartition(col("date"))


The result doesn't change no matter which columns I pass to repartition.

Does anyone have the same problem?

Thanks in advance.
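
For context, repartition(cols...) in Spark 1.6 hash-partitions into spark.sql.shuffle.partitions partitions (200 by default) regardless of the column values, which would explain the 200 partitions above. A minimal sketch of requesting one partition per day instead, reusing the names from the snippet above:

import org.apache.spark.sql.functions.col

// Sketch: pass an explicit partition count so the shuffle targets 90
// partitions rather than the spark.sql.shuffle.partitions default (200).
// Rows with the same date land together, though two distinct dates can
// still hash into the same partition.
val daily = sql
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> dailyDetailTableName, "keyspace" -> reportSpace))
  .load()
  .repartition(90, col("date"))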

Re: Spark Partition by Columns doesn't work properly

Posted by Chanh Le <gi...@gmail.com>.
Ok, thanks.

On Thu, Jun 9, 2016, 12:51 PM Jasleen Kaur <ja...@gmail.com>
wrote:

> The github repo is https://github.com/datastax/spark-cassandra-connector
>
> The talk video and slides should be uploaded soon on the Spark Summit website.

Re: Spark Partition by Columns doesn't work properly

Posted by Jasleen Kaur <ja...@gmail.com>.
The github repo is https://github.com/datastax/spark-cassandra-connector

The talk video and slides should be uploaded soon on the Spark Summit website.
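
For anyone following along, the connector from that repo can also be declared as an sbt dependency; the version below is an assumption and should be matched to your Spark and Scala versions:

// build.sbt (hypothetical version; pick the release matching Spark 1.6.x)
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.6.0"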

On Wednesday, June 8, 2016, Chanh Le <gi...@gmail.com> wrote:

> Thanks, I'll look into it. Any luck getting a link related to it?

Re: Spark Partition by Columns doesn't work properly

Posted by Chanh Le <gi...@gmail.com>.
Thanks, I'll look into it. Any luck getting a link related to it?

On Thu, Jun 9, 2016, 12:43 PM Jasleen Kaur <ja...@gmail.com>
wrote:

> Try using the DataStax package. There was a great talk at Spark Summit
> about it. It will take care of the boilerplate code so you can focus on
> real business value.

Re: Spark Partition by Columns doesn't work properly

Posted by Jasleen Kaur <ja...@gmail.com>.
Try using the DataStax package. There was a great talk at Spark Summit
about it. It will take care of the boilerplate code so you can focus on
real business value.
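
A rough sketch of what the connector's own RDD API looks like; the keyspace and table names are reused from the snippet earlier in the thread, and sc is assumed to be the SparkContext:

import com.datastax.spark.connector._

// Read the Cassandra table through the connector's RDD API. The
// connector derives Spark partitions from Cassandra token ranges
// instead of using the DataFrame shuffle default.
val rows = sc.cassandraTable(reportSpace, dailyDetailTableName)
  .select("date", "network_id")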
