Posted to dev@kylin.apache.org by Vadim Semenov <_...@databuryat.com> on 2015/09/22 06:34:29 UTC

Inconsistent query results

Hi,

We've found issues while running some queries:
they return inconsistent results, i.e. in some cases we don't get any rows, in some cases we get some rows but never all that we expected to get.

I was able to pinpoint the queries, so here are the cases:

1. SELECT dim, measure FROM table WHERE partition = one partition AND dim IN (a,b,c) GROUP BY dim
We get results for a,c only
http://i.imgur.com/SZu6f2E.png

2. IN (b)
We get results for b as expected
http://i.imgur.com/8c8UMWj.png

3. IN (a,b)
We don't get any results
http://i.imgur.com/qIepe8d.png

4. IN (b,c)
We get results for b,c as expected
http://i.imgur.com/Qq6yuuS.png

We tried running the queries with acceptPartial=false and with an empty cache, and the issues are still the same.

What should we do to debug this?
What might be causing these issues?

Re: Inconsistent query results

Posted by hongbin ma <ma...@apache.org>.
This is a bug: https://issues.apache.org/jira/browse/KYLIN-1053
As a workaround for now, you can use a dictionary for all non-string dimensions.

Thanks for your observations and report; this is really helpful.
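
To illustrate why a dictionary might help, here is a minimal Python sketch. It assumes the dictionary is order-preserving, i.e. IDs are assigned in sorted value order and stored as fixed-width bytes; `build_dictionary` is a hypothetical helper for illustration, not Kylin's API, and the values 5, 70 and 200 are just sample inputs.

```python
def build_dictionary(values):
    """Map each distinct value to a fixed-width ID assigned in numeric order."""
    ordered = sorted(set(values))
    # Enough bytes to hold the largest ID (at least one byte).
    width = max(1, ((len(ordered) - 1).bit_length() + 7) // 8)
    return {v: i.to_bytes(width, "big") for i, v in enumerate(ordered)}


# Raw decimal text compares byte by byte, so "200" sorts before "70".
raw = {v: str(v).encode("ascii") for v in (70, 200)}
print(raw[70] < raw[200])   # False

# Dictionary IDs keep the numeric order when compared as bytes.
ids = build_dictionary([70, 200, 5])
print(ids[70] < ids[200])   # True
```

Under that assumption, a range built from the smallest and largest filter values stays well-formed once the values are dictionary-encoded, which would explain why the workaround avoids the problem.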


On Tue, Sep 29, 2015 at 5:49 PM, Vadim Semenov <_...@databuryat.com> wrote:

> integer,
>
> and the cube was built without using dictionaries.
>
> On September 29, 2015 at 5:45:49 AM, hongbin ma (mahongbin@apache.org)
> wrote:
>
> interesting observation
>
> what is the data type of the dimension?
>
> On Tue, Sep 29, 2015 at 5:40 PM, Vadim Semenov <_...@databuryat.com> wrote:
>
> > I found something:
> >
> > I executed a query on one partition with a filter on one dimension with
> > two values (70, 200) and got the following in the logs:
> > http://i.imgur.com/b3zthZ3.png
> >
> > You see the scan range and 0 rows in the result set.
> >
> > I tried the same scan in hbase shell and got 0 rows:
> > scan 'KYLIN_JH6RCA65S7', {STARTROW => "\x00\x00\x00\x00\x00\x00\x01
> > 2015-09-2670\x09\x09\x09\x09", STOPROW => "\x00\x00\x00\x00\x00\x00\x01
> > 2015-09-26200\x09\x09\x09\x00"}
> >
> > then I swapped the start & stop row:
> > scan 'KYLIN_JH6RCA65S7', {STARTROW => "\x00\x00\x00\x00\x00\x00\x01
> > 2015-09-26200\x09\x09\x09", STOPROW => "\x00\x00\x00\x00\x00\x00\x01
> > 2015-09-2670\x09\x09\x09\x09\x00"}
> > and got 659 rows.
> >
> > And just to confirm I changed the order of the stop & start row in
> > HBaseKeyRange and got the results:
> > http://i.imgur.com/mT0qI4I.png
> >
> > On September 28, 2015 at 11:19:41 PM, Li Yang (liyang@apache.org) wrote:
> > I too cannot reproduce this one. Tried IN() on the LSTG_SITE_ID column of
> > TEST_KYLIN_FACT (the test cube used by regression). Everything is good.
> > The query I used:
> >
> > select LSTG_SITE_ID, sum(price) as GMV
> > from test_kylin_fact
> > inner JOIN edw.test_cal_dt as test_cal_dt
> > ON test_kylin_fact.cal_dt = test_cal_dt.cal_dt
> > where test_cal_dt.week_beg_dt between DATE '2013-09-01' and DATE
> > '2013-10-01' and LSTG_SITE_ID in (0, 3, 15, 23, 100)
> > group by LSTG_SITE_ID
> >
> > It is meant to be similar to Vadim's query: it has a date condition, and
> > LSTG_SITE_ID is of int type. I tested many combinations of the ID values,
> > and all results are correct...
> >
> > Anyone else tried similar queries on 0.7 or 1.x releases?
> >
> >
> >
> > On Thu, Sep 24, 2015 at 1:28 PM, hongbin ma <ma...@apache.org> wrote:
> >
> > > Could you give us just some sample data, and make sure the issue is
> > > reproducible on that sample data?
> > >
> > > On Thu, Sep 24, 2015 at 12:28 PM, vipul jhawar <vipul.jhawar@gmail.com> wrote:
> > >
> > > > hi
> > > >
> > > > For the cube
> > > >
> > > > JSON model is
> > > > http://www.jsoneditoronline.org/?id=1da42fe018b14c7522d3937ba81cd37e
> > > > JSON cube is
> > > > http://www.jsoneditoronline.org/?id=97008080361a2388888acd128004753d
> > > >
> > > > Regarding minimal data for the cube: even if we generate it, how would
> > > > we share it with you? Or do you want a sample fact table?
> > > > I can show you the Kylin query in action on our current hosted cube,
> > > > if you want.
> > > >
> > > > Thanks
> > > >
> > > > On Wed, Sep 23, 2015 at 1:54 PM, hongbin ma <ma...@apache.org> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > We cannot reproduce it with our test cases.
> > > > >
> > > > > However, we'd love to help analyze the problem for you. If possible,
> > > > > could you please try a minimal cube definition (maybe with a little
> > > > > sample data) that pinpoints the issue?
> > > > >
> > > > > The cube description consists of three parts:
> > > > > 1. Cube Desc (json file)
> > > > > 2. Model Desc (json file)
> > > > > 3. Hive table schema
> > > > >
> > > > > The first two files can be viewed directly in the Kylin web UI: go to
> > > > > the "Cubes" tab, click the cube, and check out the contents of
> > > > > "Json(Cube)" and "Json(Model)".
> > > > >
> > > > >
> > > > > On Wed, Sep 23, 2015 at 3:11 PM, hongbin ma <ma...@apache.org> wrote:
> > > > >
> > > > > > hi vipul
> > > > > >
> > > > > > I'm looking into this
> > > > > >
> > > > > > On Wed, Sep 23, 2015 at 3:10 PM, vipul jhawar <vipul.jhawar@gmail.com> wrote:
> > > > > >
> > > > > >> hi
> > > > > >>
> > > > > >> Just wanted to check if someone has had a chance to look at this case.
> > > > > >>
> > > > > >> Thanks
> > > > > >>
> > > > > >> On Tue, Sep 22, 2015 at 10:33 PM, vipul jhawar <vipul.jhawar@gmail.com> wrote:
> > > > > >>
> > > > > >> > Hi
> > > > > >> >
> > > > > >> > Please let us know if you need more details on this, as it is
> > > > > >> > affecting the results and it's not predictable.
> > > > > >> > We are on 0.7.2
> > > > > >> >
> > > > > >> > Thanks
> > > > > >> >
> > > > > >> > On Tue, Sep 22, 2015 at 10:04 AM, Vadim Semenov <_@databuryat.com> wrote:
> > > > > >> >
> > > > > >> >> Hi,
> > > > > >> >>
> > > > > >> >> We've found issues while running some queries:
> > > > > >> >> they return inconsistent results, i.e. in some cases we don't get any
> > > > > >> >> rows, in some cases we get some rows but never all that we expected
> > > > > >> >> to get.
> > > > > >> >>
> > > > > >> >> I was able to pin-point the queries, so here're the cases:
> > > > > >> >>
> > > > > >> >> 1. SELECT dim, measure FROM table WHERE partition = one partition AND
> > > > > >> >> dim IN (a,b,c) GROUP BY dim
> > > > > >> >> We get results for a,c only
> > > > > >> >> http://i.imgur.com/SZu6f2E.png
> > > > > >> >>
> > > > > >> >> 2. IN (b)
> > > > > >> >> We get results for b as expected
> > > > > >> >> http://i.imgur.com/8c8UMWj.png
> > > > > >> >>
> > > > > >> >> 3. IN (a,b)
> > > > > >> >> We don't get any results
> > > > > >> >> http://i.imgur.com/qIepe8d.png
> > > > > >> >>
> > > > > >> >> 4. IN (b,c)
> > > > > >> >> We get results for b,c as expected
> > > > > >> >> http://i.imgur.com/Qq6yuuS.png
> > > > > >> >>
> > > > > >> >> We tried to run the queries with acceptPartial=false and with empty
> > > > > >> >> cache and the issues are still the same.
> > > > > >> >>
> > > > > >> >> What should we do to debug this?
> > > > > >> >> What might be causing these issues?



-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone

Re: Inconsistent query results

Posted by "Adunuthula, Seshu" <sa...@ebay.com>.
Vadim,

This is a great catch. Thanks for helping make Kylin more stable.

Regards
Seshu



Re: Inconsistent query results

Posted by Vadim Semenov <_...@databuryat.com>.
integer,

and the cube was built without using dictionaries.

On September 29, 2015 at 5:45:49 AM, hongbin ma (mahongbin@apache.org) wrote:

> interesting observation
>
> what is the data type of the dimension?


Re: Inconsistent query results

Posted by hongbin ma <ma...@apache.org>.
interesting observation

what is the data type of the dimension?


Re: Inconsistent query results

Posted by Vadim Semenov <_...@databuryat.com>.
I found something:

I executed a query on one partition with a filter on one dimension with two values (70, 200) and got the following in the logs:
http://i.imgur.com/b3zthZ3.png

You can see the scan range and 0 rows in the result set.

I tried the same scan in the HBase shell and got 0 rows:
scan 'KYLIN_JH6RCA65S7', {STARTROW => "\x00\x00\x00\x00\x00\x00\x01 2015-09-2670\x09\x09\x09\x09", STOPROW => "\x00\x00\x00\x00\x00\x00\x01 2015-09-26200\x09\x09\x09\x00"}

Then I swapped the start and stop rows:
scan 'KYLIN_JH6RCA65S7', {STARTROW => "\x00\x00\x00\x00\x00\x00\x01 2015-09-26200\x09\x09\x09", STOPROW => "\x00\x00\x00\x00\x00\x00\x01 2015-09-2670\x09\x09\x09\x09\x00"}
and got 659 rows.

And just to confirm, I changed the order of the stop and start rows in HBaseKeyRange and got the results:
http://i.imgur.com/mT0qI4I.png
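
A small Python sketch of what seems to be happening, under stated assumptions: the key layout in `row_key` (date text followed by the decimal text of the dimension value) is simplified from the shell commands above and leaves out the leading bytes and the `\x09` padding, and `numeric_range` and `byte_range` are hypothetical helpers, not Kylin code. Python compares `bytes` byte by byte, which is also how HBase orders row keys.

```python
DATE = "2015-09-26"


def row_key(date, dim_value):
    """Hypothetical key layout: date text followed by the value's decimal text."""
    return (date + str(dim_value)).encode("ascii")


def numeric_range(date, values):
    """Bounds chosen by numeric order, as the failing scan appears to do."""
    return row_key(date, min(values)), row_key(date, max(values))


def byte_range(date, values):
    """Bounds chosen by byte order of the encoded keys."""
    keys = sorted(row_key(date, v) for v in values)
    return keys[0], keys[-1]


start, stop = numeric_range(DATE, [70, 200])
print(start > stop)  # True: the start key sorts after the stop key

start, stop = byte_range(DATE, [70, 200])
print(start, stop)   # b'2015-09-26200' b'2015-09-2670'
```

With the bounds ordered numerically the start key is greater than the stop key, which is consistent with the empty result; ordering the bounds by their encoded bytes gives the swapped range that returned rows.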

On September 28, 2015 at 11:19:41 PM, Li Yang (liyang@apache.org) wrote:
I too cannot reproduce this one. Tried IN() on the LSTG_SITE_ID column of  
TEST_KYLIN_FACT (the test cube used by regression). Everything is good.  
The query I used:  

select LSTG_SITE_ID, sum(price) as GMV  
from test_kylin_fact  
inner JOIN edw.test_cal_dt as test_cal_dt  
ON test_kylin_fact.cal_dt = test_cal_dt.cal_dt  
where test_cal_dt.week_beg_dt between DATE '2013-09-01' and DATE  
'2013-10-01' and LSTG_SITE_ID in (0, 3, 15, 23, 100)  
group by LSTG_SITE_ID  

It is meant to be similar to Vadim's query: it has a date condition, and
LSTG_SITE_ID is of int type. I tested many combinations of the ID values,
and all results are correct...

Anyone else tried similar queries on 0.7 or 1.x releases?  



On Thu, Sep 24, 2015 at 1:28 PM, hongbin ma <ma...@apache.org> wrote:  

> Could you give us just some sample data, and make sure the issue is
> reproducible on that sample data?
>
> On Thu, Sep 24, 2015 at 12:28 PM, vipul jhawar <vi...@gmail.com> wrote:
>  
> > hi  
> >  
> > For the cube  
> >  
> > JSON model is  
> > http://www.jsoneditoronline.org/?id=1da42fe018b14c7522d3937ba81cd37e  
> > JSON cube is  
> > http://www.jsoneditoronline.org/?id=97008080361a2388888acd128004753d  
> >  
> > Regarding minimal data for the cube: even if we generate it, how would
> > we share it with you? Or do you want a sample fact table?
> > I can show you the Kylin query in action on our current hosted cube,
> > if you want.
> >  
> > Thanks  
> >  
> > On Wed, Sep 23, 2015 at 1:54 PM, hongbin ma <ma...@apache.org> wrote:
> >  
> > > Hi,  
> > >  
> > > We cannot reproduce it with our test cases.  
> > >  
> > > However, we'd love to help analyze the problem for you. If possible,
> > > could you please try a minimal cube definition (maybe with a little
> > > sample data) that pinpoints the issue?
> > >
> > > The cube description consists of three parts:
> > > 1. Cube Desc (json file)
> > > 2. Model Desc (json file)
> > > 3. Hive table schema
> > >
> > > The first two files can be viewed directly in the Kylin web UI: go to
> > > the "Cubes" tab, click the cube, and check out the contents of
> > > "Json(Cube)" and "Json(Model)".
> > >  
> > >  
> > > On Wed, Sep 23, 2015 at 3:11 PM, hongbin ma <ma...@apache.org> wrote:
> > >  
> > > > hi vipul  
> > > >  
> > > > I'm looking into this  
> > > >  
> > > > On Wed, Sep 23, 2015 at 3:10 PM, vipul jhawar <vipul.jhawar@gmail.com> wrote:
> > > >  
> > > >> hi  
> > > >>  
> > > > >> Just wanted to check if someone has had a chance to look at this case.
> > > >>  
> > > >> Thanks  
> > > >>  
> > > >> On Tue, Sep 22, 2015 at 10:33 PM, vipul jhawar <  
> > vipul.jhawar@gmail.com>  
> > > >> wrote:  
> > > >>  
> > > >> > Hi  
> > > >> >  
> > > >> > Please let us know if you need more details on this as it is  
> > affecting  
> > > >> the  
> > > >> > results and its not predictable.  
> > > >> > We are on 0.7.2  
> > > >> >  
> > > >> > Thanks  
> > > >> >  
> > > >> > On Tue, Sep 22, 2015 at 10:04 AM, Vadim Semenov <_@databuryat.com  
> >  
> > > >> wrote:  
> > > >> >  
> > > >> >> Hi,  
> > > >> >>  
> > > >> >> We've found issues while running some queries:  
> > > >> >> they return inconsistent results, i.e. in some cases we don't get  
> > any  
> > > >> >> rows, in some cases we get some rows but never all that we  
> expected  
> > > to  
> > > >> get.  
> > > >> >>  
> > > >> >> I was able to pin-point the queries, so here're the cases:  
> > > >> >>  
> > > >> >> 1. SELECT dim, measure FROM table WHERE partition = one partition  
> > AND  
> > > >> dim  
> > > >> >> IN (a,b,c) GROUP BY dim  
> > > >> >> We get results for a,c only  
> > > >> >> http://i.imgur.com/SZu6f2E.png  
> > > >> >>  
> > > >> >> 2. IN (b)  
> > > >> >> We get results for b as expected  
> > > >> >> http://i.imgur.com/8c8UMWj.png  
> > > >> >>  
> > > >> >> 3. IN (a,b)  
> > > >> >> We don't get any results  
> > > >> >> http://i.imgur.com/qIepe8d.png  
> > > >> >>  
> > > >> >> 4. IN (b,c)  
> > > >> >> We get results for b,c as expected  
> > > >> >> http://i.imgur.com/Qq6yuuS.png  
> > > >> >>  
> > > >> >> We tried to run the queries with acceptPartial=false and with  
> empty  
> > > >> cache  
> > > >> >> and the issues are still the same.  
> > > >> >>  
> > > >> >> What should we do to debug this?  
> > > >> >> What might be causing these issues?  
> > > >> >  
> > > >> >  
> > > >> >  
> > > >>  
> > > >  
> > > >  
> > > >  
> > > > --  
> > > > Regards,  
> > > >  
> > > > *Bin Mahone | 马洪宾*  
> > > > Apache Kylin: http://kylin.io  
> > > > Github: https://github.com/binmahone  
> > > >  
> > >  
> > >  
> > >  
> > > --  
> > > Regards,  
> > >  
> > > *Bin Mahone | 马洪宾*  
> > > Apache Kylin: http://kylin.io  
> > > Github: https://github.com/binmahone  
> > >  
> >  
>  
>  
>  
> --  
> Regards,  
>  
> *Bin Mahone | 马洪宾*  
> Apache Kylin: http://kylin.io  
> Github: https://github.com/binmahone  
>  

Re: Inconsistent query results

Posted by Li Yang <li...@apache.org>.
I cannot reproduce this one either. I tried IN() on the LSTG_SITE_ID column of
TEST_KYLIN_FACT (the test cube used by the regression tests), and everything
looks good. The query I used:

select LSTG_SITE_ID, sum(price) as GMV
 from test_kylin_fact
 inner join edw.test_cal_dt as test_cal_dt
 on test_kylin_fact.cal_dt = test_cal_dt.cal_dt
 where test_cal_dt.week_beg_dt between DATE '2013-09-01' and DATE '2013-10-01'
 and LSTG_SITE_ID in (0, 3, 15, 23, 100)
 group by LSTG_SITE_ID

It is meant to be similar to Vadim's query: it has a date condition, and
LSTG_SITE_ID is of int type. I tested many combinations of the ID values, and
all results are correct.
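
Testing many combinations by hand is easy to get wrong, so here is a minimal
sketch of how such a check could be automated. It is not Kylin code: it uses an
in-memory SQLite table as a stand-in engine, and the table name `fact`, the
columns `dim` and `measure`, and the helper names are all hypothetical. The idea
is to query each value alone, then run every multi-value IN list and flag any
list whose result differs from the combined single-value results.

```python
import sqlite3
from itertools import combinations


def query_in(conn, values):
    """Return {dim: summed measure} for one IN-list query."""
    placeholders = ", ".join("?" for _ in values)
    sql = (
        "SELECT dim, SUM(measure) FROM fact "
        f"WHERE dim IN ({placeholders}) GROUP BY dim"
    )
    return dict(conn.execute(sql, list(values)).fetchall())


def find_inconsistent_in_lists(conn, candidates):
    """List every IN list whose result differs from its single-value results."""
    # Baseline: what each value returns when queried on its own.
    single = {}
    for value in candidates:
        single.update(query_in(conn, [value]))

    mismatches = []
    for size in range(2, len(candidates) + 1):
        for combo in combinations(candidates, size):
            expected = {v: single[v] for v in combo if v in single}
            if query_in(conn, combo) != expected:
                mismatches.append(combo)
    return mismatches


# Stand-in data (assumption): an in-memory SQLite table, not a Kylin cube.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (dim INTEGER, measure INTEGER)")
conn.executemany(
    "INSERT INTO fact VALUES (?, ?)",
    [(70, 1), (70, 2), (200, 5), (1000, 7)],
)
print(find_inconsistent_in_lists(conn, [70, 200, 1000]))
```

On a consistent engine the function returns an empty list. Pointed at an
affected cube through whatever client connection is in use, any IN list it
reports would be a candidate for a minimal reproduction.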

Has anyone else tried similar queries on the 0.7 or 1.x releases?



On Thu, Sep 24, 2015 at 1:28 PM, hongbin ma <ma...@apache.org> wrote:

> if you could give some sample data only and make sure the issue can be
> reproduceable on the sample data
>
> On Thu, Sep 24, 2015 at 12:28 PM, vipul jhawar <vi...@gmail.com>
> wrote:
>
> > hi
> >
> > For the cube
> >
> > JSON model is
> > http://www.jsoneditoronline.org/?id=1da42fe018b14c7522d3937ba81cd37e
> > JSON cube is
> > http://www.jsoneditoronline.org/?id=97008080361a2388888acd128004753d
> >
> > On minimal data for the cube, even if we generate it how will we share it
> > with you ? or you want sample fact table.
> > I can show you the kylin query in play on our current cube which is
> hosted
> > if you want.
> >
> > Thanks
> >
> > On Wed, Sep 23, 2015 at 1:54 PM, hongbin ma <ma...@apache.org>
> wrote:
> >
> > > Hi,
> > >
> > > We cannot reproduce it with our test cases.
> > >
> > > However, we'd love to help to analyze the problem for you. If possible,
> > can
> > > you please try to use a minimal cube definition(maybe with a little
> > sample
> > > data) that will pin-point the issue?
> > >
> > > The cube desc consist of three parts:
> > > 1. Cube Desc (json file)
> > > 2. Model Desc (json file)
> > > 3. Hive table schema
> > >
> > > the first two files can be checked directly in kylin web, go to "Cubes"
> > > tab, click the cube, and checkout contents in "Json(Cube)" and
> > > "Json(Model)"
> > >
> > >
> > > On Wed, Sep 23, 2015 at 3:11 PM, hongbin ma <ma...@apache.org>
> > wrote:
> > >
> > > > hi vipul
> > > >
> > > > I'm looking into this
> > > >
> > > > On Wed, Sep 23, 2015 at 3:10 PM, vipul jhawar <
> vipul.jhawar@gmail.com>
> > > > wrote:
> > > >
> > > >> hi
> > > >>
> > > >> Just wanted to check if someone has had a chance to look at this
> case.
> > > >>
> > > >> Thanks
> > > >>
> > > >> On Tue, Sep 22, 2015 at 10:33 PM, vipul jhawar <
> > vipul.jhawar@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Hi
> > > >> >
> > > >> > Please let us know if you need more details on this as it is
> > affecting
> > > >> the
> > > >> > results and its not predictable.
> > > >> > We are on 0.7.2
> > > >> >
> > > >> > Thanks
> > > >> >
> > > >> > On Tue, Sep 22, 2015 at 10:04 AM, Vadim Semenov <_@databuryat.com
> >
> > > >> wrote:
> > > >> >
> > > >> >> Hi,
> > > >> >>
> > > >> >> We've found issues while running some queries:
> > > >> >> they return inconsistent results, i.e. in some cases we don't get
> > any
> > > >> >> rows, in some cases we get some rows but never all that we
> expected
> > > to
> > > >> get.
> > > >> >>
> > > >> >> I was able to pin-point the queries, so here're the cases:
> > > >> >>
> > > >> >> 1. SELECT dim, measure FROM table WHERE partition = one partition
> > AND
> > > >> dim
> > > >> >> IN (a,b,c) GROUP BY dim
> > > >> >> We get results for a,c only
> > > >> >> http://i.imgur.com/SZu6f2E.png
> > > >> >>
> > > >> >> 2. IN (b)
> > > >> >> We get results for b as expected
> > > >> >> http://i.imgur.com/8c8UMWj.png
> > > >> >>
> > > >> >> 3. IN (a,b)
> > > >> >> We don't get any results
> > > >> >> http://i.imgur.com/qIepe8d.png
> > > >> >>
> > > >> >> 4. IN (b,c)
> > > >> >> We get results for b,c as expected
> > > >> >> http://i.imgur.com/Qq6yuuS.png
> > > >> >>
> > > >> >> We tried to run the queries with acceptPartial=false and with
> empty
> > > >> cache
> > > >> >> and the issues are still the same.
> > > >> >>
> > > >> >> What should we do to debug this?
> > > >> >> What might be causing these issues?
> > > >> >
> > > >> >
> > > >> >
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Regards,
> > > >
> > > > *Bin Mahone | 马洪宾*
> > > > Apache Kylin: http://kylin.io
> > > > Github: https://github.com/binmahone
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > >
> > > *Bin Mahone | 马洪宾*
> > > Apache Kylin: http://kylin.io
> > > Github: https://github.com/binmahone
> > >
> >
>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>

Re: Inconsistent query results

Posted by hongbin ma <ma...@apache.org>.
It would help if you could give us just some sample data, and make sure the
issue is reproducible on that sample data.

On Thu, Sep 24, 2015 at 12:28 PM, vipul jhawar <vi...@gmail.com>
wrote:

> hi
>
> For the cube
>
> JSON model is
> http://www.jsoneditoronline.org/?id=1da42fe018b14c7522d3937ba81cd37e
> JSON cube is
> http://www.jsoneditoronline.org/?id=97008080361a2388888acd128004753d
>
> On minimal data for the cube, even if we generate it how will we share it
> with you ? or you want sample fact table.
> I can show you the kylin query in play on our current cube which is hosted
> if you want.
>
> Thanks
>
> On Wed, Sep 23, 2015 at 1:54 PM, hongbin ma <ma...@apache.org> wrote:
>
> > Hi,
> >
> > We cannot reproduce it with our test cases.
> >
> > However, we'd love to help to analyze the problem for you. If possible,
> can
> > you please try to use a minimal cube definition(maybe with a little
> sample
> > data) that will pin-point the issue?
> >
> > The cube desc consist of three parts:
> > 1. Cube Desc (json file)
> > 2. Model Desc (json file)
> > 3. Hive table schema
> >
> > the first two files can be checked directly in kylin web, go to "Cubes"
> > tab, click the cube, and checkout contents in "Json(Cube)" and
> > "Json(Model)"
> >
> >
> > On Wed, Sep 23, 2015 at 3:11 PM, hongbin ma <ma...@apache.org>
> wrote:
> >
> > > hi vipul
> > >
> > > I'm looking into this
> > >
> > > On Wed, Sep 23, 2015 at 3:10 PM, vipul jhawar <vi...@gmail.com>
> > > wrote:
> > >
> > >> hi
> > >>
> > >> Just wanted to check if someone has had a chance to look at this case.
> > >>
> > >> Thanks
> > >>
> > >> On Tue, Sep 22, 2015 at 10:33 PM, vipul jhawar <
> vipul.jhawar@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi
> > >> >
> > >> > Please let us know if you need more details on this as it is
> affecting
> > >> the
> > >> > results and its not predictable.
> > >> > We are on 0.7.2
> > >> >
> > >> > Thanks
> > >> >
> > >> > On Tue, Sep 22, 2015 at 10:04 AM, Vadim Semenov <_...@databuryat.com>
> > >> wrote:
> > >> >
> > >> >> Hi,
> > >> >>
> > >> >> We've found issues while running some queries:
> > >> >> they return inconsistent results, i.e. in some cases we don't get
> any
> > >> >> rows, in some cases we get some rows but never all that we expected
> > to
> > >> get.
> > >> >>
> > >> >> I was able to pin-point the queries, so here're the cases:
> > >> >>
> > >> >> 1. SELECT dim, measure FROM table WHERE partition = one partition
> AND
> > >> dim
> > >> >> IN (a,b,c) GROUP BY dim
> > >> >> We get results for a,c only
> > >> >> http://i.imgur.com/SZu6f2E.png
> > >> >>
> > >> >> 2. IN (b)
> > >> >> We get results for b as expected
> > >> >> http://i.imgur.com/8c8UMWj.png
> > >> >>
> > >> >> 3. IN (a,b)
> > >> >> We don't get any results
> > >> >> http://i.imgur.com/qIepe8d.png
> > >> >>
> > >> >> 4. IN (b,c)
> > >> >> We get results for b,c as expected
> > >> >> http://i.imgur.com/Qq6yuuS.png
> > >> >>
> > >> >> We tried to run the queries with acceptPartial=false and with empty
> > >> cache
> > >> >> and the issues are still the same.
> > >> >>
> > >> >> What should we do to debug this?
> > >> >> What might be causing these issues?
> > >> >
> > >> >
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > > Regards,
> > >
> > > *Bin Mahone | 马洪宾*
> > > Apache Kylin: http://kylin.io
> > > Github: https://github.com/binmahone
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> > Apache Kylin: http://kylin.io
> > Github: https://github.com/binmahone
> >
>



-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone

Re: Inconsistent query results

Posted by vipul jhawar <vi...@gmail.com>.
hi

For the cube

JSON model is
http://www.jsoneditoronline.org/?id=1da42fe018b14c7522d3937ba81cd37e
JSON cube is
http://www.jsoneditoronline.org/?id=97008080361a2388888acd128004753d

As for minimal data for the cube: even if we generate it, how will we share it
with you? Or do you want a sample fact table?
If you want, I can show you the Kylin query running on our current hosted
cube.

Thanks

On Wed, Sep 23, 2015 at 1:54 PM, hongbin ma <ma...@apache.org> wrote:

> Hi,
>
> We cannot reproduce it with our test cases.
>
> However, we'd love to help to analyze the problem for you. If possible, can
> you please try to use a minimal cube definition(maybe with a little sample
> data) that will pin-point the issue?
>
> The cube desc consist of three parts:
> 1. Cube Desc (json file)
> 2. Model Desc (json file)
> 3. Hive table schema
>
> the first two files can be checked directly in kylin web, go to "Cubes"
> tab, click the cube, and checkout contents in "Json(Cube)" and
> "Json(Model)"
>
>
> On Wed, Sep 23, 2015 at 3:11 PM, hongbin ma <ma...@apache.org> wrote:
>
> > hi vipul
> >
> > I'm looking into this
> >
> > On Wed, Sep 23, 2015 at 3:10 PM, vipul jhawar <vi...@gmail.com>
> > wrote:
> >
> >> hi
> >>
> >> Just wanted to check if someone has had a chance to look at this case.
> >>
> >> Thanks
> >>
> >> On Tue, Sep 22, 2015 at 10:33 PM, vipul jhawar <vi...@gmail.com>
> >> wrote:
> >>
> >> > Hi
> >> >
> >> > Please let us know if you need more details on this as it is affecting
> >> the
> >> > results and its not predictable.
> >> > We are on 0.7.2
> >> >
> >> > Thanks
> >> >
> >> > On Tue, Sep 22, 2015 at 10:04 AM, Vadim Semenov <_...@databuryat.com>
> >> wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> We've found issues while running some queries:
> >> >> they return inconsistent results, i.e. in some cases we don't get any
> >> >> rows, in some cases we get some rows but never all that we expected
> to
> >> get.
> >> >>
> >> >> I was able to pin-point the queries, so here're the cases:
> >> >>
> >> >> 1. SELECT dim, measure FROM table WHERE partition = one partition AND
> >> dim
> >> >> IN (a,b,c) GROUP BY dim
> >> >> We get results for a,c only
> >> >> http://i.imgur.com/SZu6f2E.png
> >> >>
> >> >> 2. IN (b)
> >> >> We get results for b as expected
> >> >> http://i.imgur.com/8c8UMWj.png
> >> >>
> >> >> 3. IN (a,b)
> >> >> We don't get any results
> >> >> http://i.imgur.com/qIepe8d.png
> >> >>
> >> >> 4. IN (b,c)
> >> >> We get results for b,c as expected
> >> >> http://i.imgur.com/Qq6yuuS.png
> >> >>
> >> >> We tried to run the queries with acceptPartial=false and with empty
> >> cache
> >> >> and the issues are still the same.
> >> >>
> >> >> What should we do to debug this?
> >> >> What might be causing these issues?
> >> >
> >> >
> >> >
> >>
> >
> >
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> > Apache Kylin: http://kylin.io
> > Github: https://github.com/binmahone
> >
>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>

Re: Inconsistent query results

Posted by hongbin ma <ma...@apache.org>.
Hi,

We cannot reproduce it with our test cases.

However, we'd love to help analyze the problem for you. If possible, can
you please try to build a minimal cube definition (maybe with a little sample
data) that will pinpoint the issue?

The cube description consists of three parts:
1. Cube Desc (json file)
2. Model Desc (json file)
3. Hive table schema

The first two files can be viewed directly in the Kylin web UI: go to the
"Cubes" tab, click the cube, and check out the contents of "Json(Cube)" and
"Json(Model)".
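
For the "little sample data" part, a small deterministic fact table is usually
enough. The following is only a sketch under stated assumptions: the three
columns (a date partition, an integer dimension, and a measure) and the helper
names are hypothetical, and the comma-separated output assumes a text-format
Hive table with a comma field delimiter.

```python
import csv
import io
from datetime import date, timedelta


def build_sample_rows(dim_values, days=2, start=date(2015, 9, 1)):
    """One row per (day, dimension value), with a deterministic measure."""
    rows = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        for value in dim_values:
            # The measure is derived from the inputs so results are easy to verify.
            rows.append((day.isoformat(), value, value + offset))
    return rows


def to_csv(rows):
    """Render the rows as comma-separated text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = build_sample_rows([70, 200, 1000])
print(to_csv(rows))
```

Because every measure value follows directly from the date and the dimension
value, a missing or wrong row in a query result is easy to spot.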


On Wed, Sep 23, 2015 at 3:11 PM, hongbin ma <ma...@apache.org> wrote:

> hi vipul
>
> I'm looking into this
>
> On Wed, Sep 23, 2015 at 3:10 PM, vipul jhawar <vi...@gmail.com>
> wrote:
>
>> hi
>>
>> Just wanted to check if someone has had a chance to look at this case.
>>
>> Thanks
>>
>> On Tue, Sep 22, 2015 at 10:33 PM, vipul jhawar <vi...@gmail.com>
>> wrote:
>>
>> > Hi
>> >
>> > Please let us know if you need more details on this as it is affecting
>> the
>> > results and its not predictable.
>> > We are on 0.7.2
>> >
>> > Thanks
>> >
>> > On Tue, Sep 22, 2015 at 10:04 AM, Vadim Semenov <_...@databuryat.com>
>> wrote:
>> >
>> >> Hi,
>> >>
>> >> We've found issues while running some queries:
>> >> they return inconsistent results, i.e. in some cases we don't get any
>> >> rows, in some cases we get some rows but never all that we expected to
>> get.
>> >>
>> >> I was able to pin-point the queries, so here're the cases:
>> >>
>> >> 1. SELECT dim, measure FROM table WHERE partition = one partition AND
>> dim
>> >> IN (a,b,c) GROUP BY dim
>> >> We get results for a,c only
>> >> http://i.imgur.com/SZu6f2E.png
>> >>
>> >> 2. IN (b)
>> >> We get results for b as expected
>> >> http://i.imgur.com/8c8UMWj.png
>> >>
>> >> 3. IN (a,b)
>> >> We don't get any results
>> >> http://i.imgur.com/qIepe8d.png
>> >>
>> >> 4. IN (b,c)
>> >> We get results for b,c as expected
>> >> http://i.imgur.com/Qq6yuuS.png
>> >>
>> >> We tried to run the queries with acceptPartial=false and with empty
>> cache
>> >> and the issues are still the same.
>> >>
>> >> What should we do to debug this?
>> >> What might be causing these issues?
>> >
>> >
>> >
>>
>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>



-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone

Re: Inconsistent query results

Posted by hongbin ma <ma...@apache.org>.
hi vipul

I'm looking into this

On Wed, Sep 23, 2015 at 3:10 PM, vipul jhawar <vi...@gmail.com>
wrote:

> hi
>
> Just wanted to check if someone has had a chance to look at this case.
>
> Thanks
>
> On Tue, Sep 22, 2015 at 10:33 PM, vipul jhawar <vi...@gmail.com>
> wrote:
>
> > Hi
> >
> > Please let us know if you need more details on this as it is affecting
> the
> > results and its not predictable.
> > We are on 0.7.2
> >
> > Thanks
> >
> > On Tue, Sep 22, 2015 at 10:04 AM, Vadim Semenov <_...@databuryat.com>
> wrote:
> >
> >> Hi,
> >>
> >> We've found issues while running some queries:
> >> they return inconsistent results, i.e. in some cases we don't get any
> >> rows, in some cases we get some rows but never all that we expected to
> get.
> >>
> >> I was able to pin-point the queries, so here're the cases:
> >>
> >> 1. SELECT dim, measure FROM table WHERE partition = one partition AND
> dim
> >> IN (a,b,c) GROUP BY dim
> >> We get results for a,c only
> >> http://i.imgur.com/SZu6f2E.png
> >>
> >> 2. IN (b)
> >> We get results for b as expected
> >> http://i.imgur.com/8c8UMWj.png
> >>
> >> 3. IN (a,b)
> >> We don't get any results
> >> http://i.imgur.com/qIepe8d.png
> >>
> >> 4. IN (b,c)
> >> We get results for b,c as expected
> >> http://i.imgur.com/Qq6yuuS.png
> >>
> >> We tried to run the queries with acceptPartial=false and with empty
> cache
> >> and the issues are still the same.
> >>
> >> What should we do to debug this?
> >> What might be causing these issues?
> >
> >
> >
>



-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone

Re: Inconsistent query results

Posted by vipul jhawar <vi...@gmail.com>.
hi

Just wanted to check if someone has had a chance to look at this case.

Thanks

On Tue, Sep 22, 2015 at 10:33 PM, vipul jhawar <vi...@gmail.com>
wrote:

> Hi
>
> Please let us know if you need more details on this as it is affecting the
> results and its not predictable.
> We are on 0.7.2
>
> Thanks
>
> On Tue, Sep 22, 2015 at 10:04 AM, Vadim Semenov <_...@databuryat.com> wrote:
>
>> Hi,
>>
>> We've found issues while running some queries:
>> they return inconsistent results, i.e. in some cases we don't get any
>> rows, in some cases we get some rows but never all that we expected to get.
>>
>> I was able to pin-point the queries, so here're the cases:
>>
>> 1. SELECT dim, measure FROM table WHERE partition = one partition AND dim
>> IN (a,b,c) GROUP BY dim
>> We get results for a,c only
>> http://i.imgur.com/SZu6f2E.png
>>
>> 2. IN (b)
>> We get results for b as expected
>> http://i.imgur.com/8c8UMWj.png
>>
>> 3. IN (a,b)
>> We don't get any results
>> http://i.imgur.com/qIepe8d.png
>>
>> 4. IN (b,c)
>> We get results for b,c as expected
>> http://i.imgur.com/Qq6yuuS.png
>>
>> We tried to run the queries with acceptPartial=false and with empty cache
>> and the issues are still the same.
>>
>> What should we do to debug this?
>> What might be causing these issues?
>
>
>

Re: Inconsistent query results

Posted by vipul jhawar <vi...@gmail.com>.
Hi

Please let us know if you need more details on this, as it is affecting the
results and it's not predictable.
We are on 0.7.2.

Thanks

On Tue, Sep 22, 2015 at 10:04 AM, Vadim Semenov <_...@databuryat.com> wrote:

> Hi,
>
> We've found issues while running some queries:
> they return inconsistent results, i.e. in some cases we don't get any
> rows, in some cases we get some rows but never all that we expected to get.
>
> I was able to pin-point the queries, so here're the cases:
>
> 1. SELECT dim, measure FROM table WHERE partition = one partition AND dim
> IN (a,b,c) GROUP BY dim
> We get results for a,c only
> http://i.imgur.com/SZu6f2E.png
>
> 2. IN (b)
> We get results for b as expected
> http://i.imgur.com/8c8UMWj.png
>
> 3. IN (a,b)
> We don't get any results
> http://i.imgur.com/qIepe8d.png
>
> 4. IN (b,c)
> We get results for b,c as expected
> http://i.imgur.com/Qq6yuuS.png
>
> We tried to run the queries with acceptPartial=false and with empty cache
> and the issues are still the same.
>
> What should we do to debug this?
> What might be causing these issues?