You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@kylin.apache.org by Sandeep Khurana <sa...@infoworks.io> on 2016/09/19 20:33:37 UTC

group by on varchar column

Hello

I have a query where i do group by on a varchar column. The column values
are long sentences (not just single words). This column is part of a
dimension table.

When i select just from dimension table with this group by then I do
get ~2000 records   .

But when I join this dimension with the fact table and run the group by
query then I get just 1 record as Kylin somehow assumes the VARCHAR column
values as NULL. There is not even a single row which has value of this
VARCHAR field as null.

Same query I copy paste and run on the Hive tables, I do get more than
thousand records.

Strange thing is when I change the aggregate column to another VARCHAR
column (city_name) whose values are just one word and run on kylin SQL
editor then I do get proper records .

2 questions

- Any idea why such behaviour ? Especially when Hive gives proper records
whereas kylin returns just one record which has value of this big varchar
field as NULL.

- Is there any work around?

Re: group by on varchar column

Posted by Sandeep Khurana <sa...@infoworks.io>.
Thanks.

On 28-Sep-2016 5:35 am, "Li Yang" <li...@apache.org> wrote:

> Will work on it soon. https://issues.apache.org/jira/browse/KYLIN-1971
>
> On Wed, Sep 21, 2016 at 9:36 PM, Billy(Yiming) Liu <
> liuyiming.vip@gmail.com> wrote:
>
>> So, you hit this bug, https://issues.apache.org/jira/browse/KYLIN-1971
>>
>> 2016-09-21 19:41 GMT+08:00 Sandeep Khurana <sa...@infoworks.io>:
>>
>>> Found the issue, when fact and dimension have same column names on which
>>> group by is being done then this problem occurs. I removed similar named
>>> column from the fact table.
>>>
>>> On Tue, Sep 20, 2016 at 2:13 AM, Sandeep Khurana <sa...@infoworks.io>
>>> wrote:
>>>
>>>> btw, we are using Kylin version 1.5.2
>>>>
>>>> On Tue, Sep 20, 2016 at 2:03 AM, Sandeep Khurana <sa...@infoworks.io>
>>>> wrote:
>>>>
>>>>> Hello
>>>>>
>>>>> I have a query where i do group by on a varchar column. The column
>>>>> values are long sentences (not just single words). This column is part of a
>>>>> dimension table.
>>>>>
>>>>> When i select just from dimension table with this group by then I do
>>>>> get ~2000 records   .
>>>>>
>>>>> But when I join this dimension with the fact table and run the group
>>>>> by query then I get just 1 record as Kylin somehow assumes the VARCHAR
>>>>> column values as NULL. There is not even a single row which has value of
>>>>> this VARCHAR field as null.
>>>>>
>>>>> Same query I copy paste and run on the Hive tables, I do get more than
>>>>> thousand records.
>>>>>
>>>>> Strange thing is when I change the aggregate column to another VARCHAR
>>>>> column (city_name) whose values are just one word and run on kylin SQL
>>>>> editor then I do get proper records .
>>>>>
>>>>> 2 questions
>>>>>
>>>>> - Any idea why such behaviour ? Especially when Hive gives proper
>>>>> records whereas kylin returns just one record which has value of this big
>>>>> varchar field as NULL.
>>>>>
>>>>> - Is there any work around?
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Architect
>>> Infoworks.io
>>> http://Infoworks.io
>>>
>>
>>
>>
>> --
>> With Warm regards
>>
>> Yiming Liu (刘一鸣)
>>
>
>

Re: group by on varchar column

Posted by Li Yang <li...@apache.org>.
Will work on it soon. https://issues.apache.org/jira/browse/KYLIN-1971

On Wed, Sep 21, 2016 at 9:36 PM, Billy(Yiming) Liu <li...@gmail.com>
wrote:

> So, you hit this bug, https://issues.apache.org/jira/browse/KYLIN-1971
>
> 2016-09-21 19:41 GMT+08:00 Sandeep Khurana <sa...@infoworks.io>:
>
>> Found the issue, when fact and dimension have same column names on which
>> group by is being done then this problem occurs. I removed similar named
>> column from the fact table.
>>
>> On Tue, Sep 20, 2016 at 2:13 AM, Sandeep Khurana <sa...@infoworks.io>
>> wrote:
>>
>>> btw, we are using Kylin version 1.5.2
>>>
>>> On Tue, Sep 20, 2016 at 2:03 AM, Sandeep Khurana <sa...@infoworks.io>
>>> wrote:
>>>
>>>> Hello
>>>>
>>>> I have a query where i do group by on a varchar column. The column
>>>> values are long sentences (not just single words). This column is part of a
>>>> dimension table.
>>>>
>>>> When i select just from dimension table with this group by then I do
>>>> get ~2000 records   .
>>>>
>>>> But when I join this dimension with the fact table and run the group by
>>>> query then I get just 1 record as Kylin somehow assumes the VARCHAR column
>>>> values as NULL. There is not even a single row which has value of this
>>>> VARCHAR field as null.
>>>>
>>>> Same query I copy paste and run on the Hive tables, I do get more than
>>>> thousand records.
>>>>
>>>> Strange thing is when I change the aggregate column to another VARCHAR
>>>> column (city_name) whose values are just one word and run on kylin SQL
>>>> editor then I do get proper records .
>>>>
>>>> 2 questions
>>>>
>>>> - Any idea why such behaviour ? Especially when Hive gives proper
>>>> records whereas kylin returns just one record which has value of this big
>>>> varchar field as NULL.
>>>>
>>>> - Is there any work around?
>>>>
>>>>
>>>
>>>
>>
>>
>> --
>> Architect
>> Infoworks.io
>> http://Infoworks.io
>>
>
>
>
> --
> With Warm regards
>
> Yiming Liu (刘一鸣)
>

Re: group by on varchar column

Posted by "Billy(Yiming) Liu" <li...@gmail.com>.
So, you hit this bug, https://issues.apache.org/jira/browse/KYLIN-1971

2016-09-21 19:41 GMT+08:00 Sandeep Khurana <sa...@infoworks.io>:

> Found the issue, when fact and dimension have same column names on which
> group by is being done then this problem occurs. I removed similar named
> column from the fact table.
>
> On Tue, Sep 20, 2016 at 2:13 AM, Sandeep Khurana <sa...@infoworks.io>
> wrote:
>
>> btw, we are using Kylin version 1.5.2
>>
>> On Tue, Sep 20, 2016 at 2:03 AM, Sandeep Khurana <sa...@infoworks.io>
>> wrote:
>>
>>> Hello
>>>
>>> I have a query where i do group by on a varchar column. The column
>>> values are long sentences (not just single words). This column is part of a
>>> dimension table.
>>>
>>> When i select just from dimension table with this group by then I do
>>> get ~2000 records   .
>>>
>>> But when I join this dimension with the fact table and run the group by
>>> query then I get just 1 record as Kylin somehow assumes the VARCHAR column
>>> values as NULL. There is not even a single row which has value of this
>>> VARCHAR field as null.
>>>
>>> Same query I copy paste and run on the Hive tables, I do get more than
>>> thousand records.
>>>
>>> Strange thing is when I change the aggregate column to another VARCHAR
>>> column (city_name) whose values are just one word and run on kylin SQL
>>> editor then I do get proper records .
>>>
>>> 2 questions
>>>
>>> - Any idea why such behaviour ? Especially when Hive gives proper
>>> records whereas kylin returns just one record which has value of this big
>>> varchar field as NULL.
>>>
>>> - Is there any work around?
>>>
>>>
>>
>>
>
>
> --
> Architect
> Infoworks.io
> http://Infoworks.io
>



-- 
With Warm regards

Yiming Liu (刘一鸣)

Re: group by on varchar column

Posted by Sandeep Khurana <sa...@infoworks.io>.
Found the issue, when fact and dimension have same column names on which
group by is being done then this problem occurs. I removed similar named
column from the fact table.

On Tue, Sep 20, 2016 at 2:13 AM, Sandeep Khurana <sa...@infoworks.io>
wrote:

> btw, we are using Kylin version 1.5.2
>
> On Tue, Sep 20, 2016 at 2:03 AM, Sandeep Khurana <sa...@infoworks.io>
> wrote:
>
>> Hello
>>
>> I have a query where i do group by on a varchar column. The column values
>> are long sentences (not just single words). This column is part of a
>> dimension table.
>>
>> When i select just from dimension table with this group by then I do
>> get ~2000 records   .
>>
>> But when I join this dimension with the fact table and run the group by
>> query then I get just 1 record as Kylin somehow assumes the VARCHAR column
>> values as NULL. There is not even a single row which has value of this
>> VARCHAR field as null.
>>
>> Same query I copy paste and run on the Hive tables, I do get more than
>> thousand records.
>>
>> Strange thing is when I change the aggregate column to another VARCHAR
>> column (city_name) whose values are just one word and run on kylin SQL
>> editor then I do get proper records .
>>
>> 2 questions
>>
>> - Any idea why such behaviour ? Especially when Hive gives proper records
>> whereas kylin returns just one record which has value of this big varchar
>> field as NULL.
>>
>> - Is there any work around?
>>
>>
>
>


-- 
Architect
Infoworks.io
http://Infoworks.io

Re: group by on varchar column

Posted by Sandeep Khurana <sa...@infoworks.io>.
btw, we are using Kylin version 1.5.2

On Tue, Sep 20, 2016 at 2:03 AM, Sandeep Khurana <sa...@infoworks.io>
wrote:

> Hello
>
> I have a query where i do group by on a varchar column. The column values
> are long sentences (not just single words). This column is part of a
> dimension table.
>
> When i select just from dimension table with this group by then I do
> get ~2000 records   .
>
> But when I join this dimension with the fact table and run the group by
> query then I get just 1 record as Kylin somehow assumes the VARCHAR column
> values as NULL. There is not even a single row which has value of this
> VARCHAR field as null.
>
> Same query I copy paste and run on the Hive tables, I do get more than
> thousand records.
>
> Strange thing is when I change the aggregate column to another VARCHAR
> column (city_name) whose values are just one word and run on kylin SQL
> editor then I do get proper records .
>
> 2 questions
>
> - Any idea why such behaviour ? Especially when Hive gives proper records
> whereas kylin returns just one record which has value of this big varchar
> field as NULL.
>
> - Is there any work around?
>
>