Posted to user@spark.apache.org by Yana Kadiyska <ya...@gmail.com> on 2015/07/09 17:00:04 UTC
[SparkSQL] Incorrect ROLLUP results
Hi folks, I just re-wrote a query from using UNION ALL to use "with rollup"
and I'm seeing some unexpected behavior. I'll open a JIRA if needed but
wanted to check if this is user error. Here is my code:
case class KeyValue(key: Int, value: String)
val df = sc.parallelize(1 to 50).map(i => KeyValue(i, i.toString)).toDF
df.registerTempTable("foo")
sqlContext.sql("select count(*) as cnt, value as key, GROUPING__ID from foo group by value with rollup").show(100)
sqlContext.sql("select count(*) as cnt, key % 100 as key, GROUPING__ID from foo group by key % 100 with rollup").show(100)
Grouping by value does the right thing: I get one group 0 row with the
overall count. But grouping by an expression (key % 100) produces weird
results: it appears that the group 1 results are replicated as group 0.
Am I doing something wrong, or is this a bug?
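For reference, the result the second query would be expected to return can be modelled without Spark. The sketch below is a plain-Scala approximation, not Spark's implementation: `ExpectedRollup`, `Row` and `expected` are hypothetical names introduced only for illustration, and the grp values simply follow the numbering described in this post (1 for per-key rows, 0 for the grand-total row).

```scala
// Plain-Scala model of what "group by key % 100 with rollup" is expected
// to return for the 50 rows in the post. No Spark required.
object ExpectedRollup {
  // (cnt, key, grp). Assumption: grp follows the numbering described in the
  // post (1 = per-key row, 0 = grand-total row). key is None on the total row.
  final case class Row(cnt: Long, key: Option[Int], grp: Int)

  def expected(keys: Seq[Int]): Seq[Row] = {
    // One row per distinct value of key % 100.
    val detail = keys
      .groupBy(_ % 100)
      .toSeq
      .sortBy(_._1)
      .map { case (k, vs) => Row(vs.size.toLong, Some(k), 1) }
    // Exactly one extra row carrying the overall count.
    val total = Row(keys.size.toLong, None, 0)
    detail :+ total
  }

  def main(args: Array[String]): Unit =
    expected(1 to 50).foreach(println)
}
```

Under this model only one row carries grp 0, which matches what the first query (grouping by value) is reported to return.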
RE: [SparkSQL] Incorrect ROLLUP results
Posted by "Cheng, Hao" <ha...@intel.com>.
Never mind, I've created the JIRA issue at https://issues.apache.org/jira/browse/SPARK-8972.
From: Cheng, Hao [mailto:hao.cheng@intel.com]
Sent: Friday, July 10, 2015 9:15 AM
To: yana.kadiyska@gmail.com; ayan guha
Cc: user
Subject: RE: [SparkSQL] Incorrect ROLLUP results
Yes, this is a bug. Would you mind creating a JIRA issue for this? I will fix it ASAP.
BTW, what's your Spark version?
RE: [SparkSQL] Incorrect ROLLUP results
Posted by "Cheng, Hao" <ha...@intel.com>.
Yes, this is a bug. Would you mind creating a JIRA issue for this? I will fix it ASAP.
BTW, what's your Spark version?
From: Yana Kadiyska [mailto:yana.kadiyska@gmail.com]
Sent: Friday, July 10, 2015 12:16 AM
To: ayan guha
Cc: user
Subject: Re: [SparkSQL] Incorrect ROLLUP results
+---+---+---+
|cnt|_c1|grp|
+---+---+---+
|  1| 31|  0|
|  1| 31|  1|
|  1|  4|  0|
|  1|  4|  1|
|  1| 42|  0|
|  1| 42|  1|
|  1| 15|  0|
|  1| 15|  1|
|  1| 26|  0|
|  1| 26|  1|
|  1| 37|  0|
|  1| 10|  0|
|  1| 37|  1|
|  1| 10|  1|
|  1| 48|  0|
|  1| 21|  0|
|  1| 48|  1|
|  1| 21|  1|
|  1| 32|  0|
|  1| 32|  1|
+---+---+---+
Re: [SparkSQL] Incorrect ROLLUP results
Posted by Yana Kadiyska <ya...@gmail.com>.
+---+---+---+
|cnt|_c1|grp|
+---+---+---+
|  1| 31|  0|
|  1| 31|  1|
|  1|  4|  0|
|  1|  4|  1|
|  1| 42|  0|
|  1| 42|  1|
|  1| 15|  0|
|  1| 15|  1|
|  1| 26|  0|
|  1| 26|  1|
|  1| 37|  0|
|  1| 10|  0|
|  1| 37|  1|
|  1| 10|  1|
|  1| 48|  0|
|  1| 21|  0|
|  1| 48|  1|
|  1| 21|  1|
|  1| 32|  0|
|  1| 32|  1|
+---+---+---+
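The symptom in this output can be checked mechanically. The following is a small plain-Scala sketch, assuming the 20 pasted rows are transcribed as (cnt, _c1, grp); `RollupSymptomCheck`, `Row` and `replicatedKeys` are hypothetical names, not part of Spark. It lists the keys whose (cnt, key) pair appears under both grp 0 and grp 1.

```scala
// Detects the reported symptom: the same (cnt, key) pair showing up
// under both grp 0 and grp 1 in the pasted show() output.
object RollupSymptomCheck {
  final case class Row(cnt: Long, key: Int, grp: Int)

  // The 20 rows pasted above, transcribed as (cnt, _c1, grp).
  val pasted: Seq[Row] = Seq(
    Row(1, 31, 0), Row(1, 31, 1), Row(1, 4, 0), Row(1, 4, 1),
    Row(1, 42, 0), Row(1, 42, 1), Row(1, 15, 0), Row(1, 15, 1),
    Row(1, 26, 0), Row(1, 26, 1), Row(1, 37, 0), Row(1, 10, 0),
    Row(1, 37, 1), Row(1, 10, 1), Row(1, 48, 0), Row(1, 21, 0),
    Row(1, 48, 1), Row(1, 21, 1), Row(1, 32, 0), Row(1, 32, 1)
  )

  // Keys whose (cnt, key) pair occurs in both grouping levels.
  def replicatedKeys(rows: Seq[Row]): Set[Int] = {
    val byGrp = rows.groupBy(_.grp).map { case (g, rs) =>
      g -> rs.map(r => (r.cnt, r.key)).toSet
    }
    val g0 = byGrp.getOrElse(0, Set.empty[(Long, Int)])
    val g1 = byGrp.getOrElse(1, Set.empty[(Long, Int)])
    (g0 intersect g1).map(_._2)
  }

  def main(args: Array[String]): Unit =
    println(replicatedKeys(pasted).toSeq.sorted.mkString(", "))
}
```

For the rows shown here, every key listed under grp 1 also appears under grp 0 with the same count, which is the replication described in the original post.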
On Thu, Jul 9, 2015 at 11:54 AM, ayan guha <gu...@gmail.com> wrote:
> Can you please post result of show()?
Re: [SparkSQL] Incorrect ROLLUP results
Posted by ayan guha <gu...@gmail.com>.
Can you please post the result of show()?