Posted to user@spark.apache.org by Chetan Khatri <ch...@gmail.com> on 2018/05/29 18:21:39 UTC

GroupBy in Spark / Scala without Agg functions

All,

I have a scenario like this in MSSQL Server SQL where I need to do a GROUP BY
without an aggregate function:

Pseudocode:


select m.student_id, m.student_name, m.student_std, m.student_group,
m.student_dob from student as m inner join general_register g on
m.student_id = g.student_id group by m.student_id, m.student_name,
m.student_std, m.student_group, m.student_dob

I tried doing this in Spark, but I am not able to get a DataFrame as the
return value. How can this kind of thing be done in Spark?

Thanks

Re: Reply: GroupBy in Spark / Scala without Agg functions

Posted by Chetan Khatri <ch...@gmail.com>.
I see. Thank you for the explanation, Linyuxin.

On Wed, May 30, 2018 at 6:21 AM, Linyuxin <li...@huawei.com> wrote:

> Hi,
>
> Why not group by first, then join?
>
> BTW, I don’t think there is any difference between ‘distinct’ and
> ‘group by’.
>
> Source code of 2.1:
>
> def distinct(): Dataset[T] = dropDuplicates()
> …
> def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
> …
> Aggregate(groupCols, aggCols, logicalPlan)
> }

Reply: GroupBy in Spark / Scala without Agg functions

Posted by Linyuxin <li...@huawei.com>.
Hi,
Why not group by first, then join?
BTW, I don’t think there is any difference between ‘distinct’ and ‘group by’.

Source code of 2.1:
def distinct(): Dataset[T] = dropDuplicates()
…
def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
…
Aggregate(groupCols, aggCols, logicalPlan)
}
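
One way to check this claim on your own Spark version is to compare the plans and results of the two forms. The following is only a sketch: the object name, the sample rows, and the local[*] master are assumptions for illustration, and the plan output can differ between Spark versions (later releases may not build the Aggregate node in dropDuplicates itself), so no expected plan is shown here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical demo object; not part of Spark.
object DistinctVsGroupBy {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("distinct-vs-groupby")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows containing one exact duplicate.
    val df = Seq((1, "a"), (1, "a"), (2, "b")).toDF("id", "name")

    // Form 1: de-duplicate over all columns.
    val viaDistinct = df.distinct()

    // Form 2: group by all columns; count() is used only to get back
    // a DataFrame, and the extra column is dropped again.
    val viaGroupBy = df.groupBy(df.columns.map(col): _*).count().drop("count")

    // Print the logical and physical plans of both forms for comparison.
    viaDistinct.explain(true)
    viaGroupBy.explain(true)

    // Both forms should return the same set of rows.
    assert(viaDistinct.collect().toSet == viaGroupBy.collect().toSet)

    spark.stop()
  }
}
```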




From: Chetan Khatri [mailto:chetan.opensource@gmail.com]
Sent: May 30, 2018 2:52
To: Irving Duran <ir...@gmail.com>
Cc: Georg Heiler <ge...@gmail.com>; user <us...@spark.apache.org>
Subject: Re: GroupBy in Spark / Scala without Agg functions

Georg, sorry for the dumb question. Help me understand: if I do DF.select(A,B,C,D).distinct(), would that be the same as the GROUP BY without aggregates in the SQL above?


Re: GroupBy in Spark / Scala without Agg functions

Posted by Chetan Khatri <ch...@gmail.com>.
Georg, sorry for the dumb question. Help me understand: if I do
DF.select(A,B,C,D).distinct(), would that be the same as the GROUP BY
without aggregates in the SQL above?

On Wed, May 30, 2018 at 12:17 AM, Chetan Khatri <chetan.opensource@gmail.com> wrote:

> I don't want any aggregation. I just want to know whether there is a
> better approach than applying distinct to all columns.

Re: GroupBy in Spark / Scala without Agg functions

Posted by Chetan Khatri <ch...@gmail.com>.
I don't want any aggregation. I just want to know whether there is a
better approach than applying distinct to all columns.

On Wed, May 30, 2018 at 12:16 AM, Irving Duran <ir...@gmail.com>
wrote:

> Unless you want to get a count, yes.
>
> Thank You,
>
> Irving Duran

Re: GroupBy in Spark / Scala without Agg functions

Posted by Irving Duran <ir...@gmail.com>.
Unless you want to get a count, yes.

Thank You,

Irving Duran


On Tue, May 29, 2018 at 1:44 PM Chetan Khatri <ch...@gmail.com>
wrote:

> Georg, I just want to double-check: someone wrote an MSSQL Server script
> that groups by all columns. What is the best alternative way to do a
> distinct over all columns?

Re: GroupBy in Spark / Scala without Agg functions

Posted by Chetan Khatri <ch...@gmail.com>.
Georg, I just want to double-check: someone wrote an MSSQL Server script
that groups by all columns. What is the best alternative way to do a
distinct over all columns?



On Wed, May 30, 2018 at 12:08 AM, Georg Heiler <ge...@gmail.com>
wrote:

> Why do you group if you do not want to aggregate?
> Isn't this the same as select distinct?

Re: GroupBy in Spark / Scala without Agg functions

Posted by Georg Heiler <ge...@gmail.com>.
Why do you group if you do not want to aggregate?
Isn't this the same as select distinct?

Chetan Khatri <ch...@gmail.com> wrote on Tue, May 29, 2018 at 20:21:

> All,
>
> I have a scenario like this in MSSQL Server SQL where I need to do a
> GROUP BY without an aggregate function:
>
> Pseudocode:
>
>
> select m.student_id, m.student_name, m.student_std, m.student_group,
> m.student_dob from student as m inner join general_register g on
> m.student_id = g.student_id group by m.student_id, m.student_name,
> m.student_std, m.student_group, m.student_dob
>
> I tried doing this in Spark, but I am not able to get a DataFrame as the
> return value. How can this kind of thing be done in Spark?
>
> Thanks
>
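
For reference, a minimal sketch of the "select distinct" route suggested in this thread, next to the original query run through Spark SQL. It assumes the column names from the original post; the object and method names, the sample rows, and the register_no column are made up for illustration. In Spark 2.x, Dataset.groupBy returns a RelationalGroupedDataset rather than a DataFrame, which may be why the groupBy attempt did not give a DataFrame back, whereas select(...).distinct() and spark.sql(...) both return one.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper object; not part of Spark.
object GroupByWithoutAgg {
  // Columns taken from the query in the original post.
  val keyCols: Seq[String] =
    Seq("student_id", "student_name", "student_std", "student_group", "student_dob")

  // DataFrame API: inner join, keep only the student columns, de-duplicate.
  def viaDistinct(student: DataFrame, register: DataFrame): DataFrame =
    student.as("m")
      .join(register.as("g"), col("m.student_id") === col("g.student_id"), "inner")
      .select(keyCols.map(c => col(s"m.$c")): _*)
      .distinct()

  // Spark SQL: the GROUP BY query from the post, run against temp views.
  def viaSql(spark: SparkSession, student: DataFrame, register: DataFrame): DataFrame = {
    student.createOrReplaceTempView("student")
    register.createOrReplaceTempView("general_register")
    spark.sql(
      """select m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
        |from student as m
        |inner join general_register g on m.student_id = g.student_id
        |group by m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
        |""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("groupby-without-agg")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows: student 1 has two register entries, student 3 has none.
    val student = Seq(
      (1, "Asha", "10", "A", "2003-01-15"),
      (2, "Ravi", "10", "B", "2003-06-30"),
      (3, "Mina", "11", "A", "2002-11-02")
    ).toDF(keyCols: _*)
    val register = Seq((1, "r1"), (1, "r2"), (2, "r3")).toDF("student_id", "register_no")

    // Each result holds one row per joined student (students 1 and 2).
    viaDistinct(student, register).show()
    viaSql(spark, student, register).show()

    spark.stop()
  }
}
```

Both functions return a DataFrame, so the result can be used in further transformations like any other.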