You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@phoenix.apache.org by Sumit Nigam <su...@yahoo.com> on 2015/12/10 15:35:37 UTC

Help with LIMIT clause

Hi,
The link for salted tables https://phoenix.apache.org/salted.html mentions "Since salting table would not store the data sequentially, a strict sequential scan would not return all the data in the natural sorted fashion. Clauses that currently would force a sequential scan, for example, clauses with LIMIT, would likely to return items that are different from a normal table"
So, would a simple query such as SELECT CURRENT_TIMESTAMP FROM TBL ORDER BY CURRENT_TIMESTAMP DESC LIMIT 1; not really return the MAX(CURRENT_TIMESTAMP) ?
PK is on 2 columns with CURRENT_TIMESTAMP as the leading one. I am noticing this issue:
select current_timestamp from TBL order by current_timestamp desc limit 1;+------------------------------------------+|            CURRENT_TIMESTAMP             |+------------------------------------------+| 1448815328556                            |+------------------------------------------+

select max(current_timestamp) from TBL;+------------------------------------------+|         MAX("CURRENT_TIMESTAMP")         |+------------------------------------------+| 1449732792090                            |+------------------------------------------+
The results are different. MAX is of course, returning the right record.
If this is the case, then what should be done where LIMIT is really to be used? What can I replace it with to get the desired behavior?
Is this also correct that when there is a WHERE clause limiting the number of projected records, then LIMIT seems to work fine? I seem to be noticing that also.
This is with hbase 0.98.14 and phoenix 4.5.x
Thanks,Sumit

Re: Help with LIMIT clause

Posted by Sumit Nigam <su...@yahoo.com>.
Thank you James. I have filed https://issues.apache.org/jira/browse/PHOENIX-2514
Best regards,Sumit
      From: James Taylor <ja...@apache.org>
 To: "user@phoenix.apache.org" <us...@phoenix.apache.org>; Sumit Nigam <su...@yahoo.com> 
 Sent: Friday, December 11, 2015 9:06 AM
 Subject: Re: Help with LIMIT clause
   
Thanks - most helpful would be a complete test case that reproduces it. Would be helpful if you tried against 4.6 and/or master.



On Thursday, December 10, 2015, Sumit Nigam <su...@yahoo.com> wrote:

Thank you James.
I am using Phoenix 4.5.1 with HBase-0.98.14.
I am also noticing that if WHERE clause returns a fewer number of records, then ORDER BY with LIMIT works fine. Does this input help in any way?
I will file a CR.
Thanks again,Sumit
 
Hi Sumit,I agree, these two queries should return the same result, as long as you have the ORDER BY clause. What version of Phoenix are you using? What does your DDL look like? Please file a JIRA that ideally includes a way of reproducing the issue.
select current_timestamp from TBL order by current_timestamp desc limit 1;select max(current_timestamp) from TBL;

Thanks,James



On Thu, Dec 10, 2015 at 8:58 AM, Sumit Nigam <su...@yahoo.com> wrote:

In thinking a bit more about it, this should be a bug in Phoenix. This is because even with LIMIT clause I have a order by timestamp DESC, which means that column values MUST have been sorted prior to applying LIMIT clause. The LIMIT should then give a MAX value in such a case. Also, surprisingly this only seems to be failing in cases where there are huge number of records. Like, in my case I have 200K + records.
Any help will be appreciated.
Thanks,Sumit
      From: Sumit Nigam <su...@yahoo.com>
 To: Users Mail List Phoenix <us...@phoenix.apache.org> 
 Sent: Thursday, December 10, 2015 8:05 PM
 Subject: Help with LIMIT clause
  
Hi,
The link for salted tables https://phoenix.apache.org/salted.html mentions "Since salting table would not store the data sequentially, a strict sequential scan would not return all the data in the natural sorted fashion. Clauses that currently would force a sequential scan, for example, clauses with LIMIT, would likely to return items that are different from a normal table"
So, would a simple query such as SELECT CURRENT_TIMESTAMP FROM TBL ORDER BY CURRENT_TIMESTAMP DESC LIMIT 1; not really return the MAX(CURRENT_TIMESTAMP) ?
PK is on 2 columns with CURRENT_TIMESTAMP as the leading one. I am noticing this issue:
select current_timestamp from TBL order by current_timestamp desc limit 1;+------------------------------------------+|            CURRENT_TIMESTAMP             |+------------------------------------------+| 1448815328556                            |+------------------------------------------+

select max(current_timestamp) from TBL;+------------------------------------------+|         MAX("CURRENT_TIMESTAMP")         |+------------------------------------------+| 1449732792090                            |+------------------------------------------+
The results are different. MAX is of course, returning the right record.
If this is the case, then what should be done where LIMIT is really to be used? What can I replace it with to get the desired behavior?
Is this also correct that when there is a WHERE clause limiting the number of projected records, then LIMIT seems to work fine? I seem to be noticing that also.
This is with hbase 0.98.14 and phoenix 4.5.x
Thanks,Sumit

  



  


 

Re: Help with LIMIT clause

Posted by James Taylor <ja...@apache.org>.
Thanks - most helpful would be a complete test case that reproduces it.
Would be helpful if you tried against 4.6 and/or master.

On Thursday, December 10, 2015, Sumit Nigam <su...@yahoo.com> wrote:

> Thank you James.
>
> I am using Phoenix 4.5.1 with HBase-0.98.14.
>
> I am also noticing that if WHERE clause returns a fewer number of records,
> then ORDER BY with LIMIT works fine. Does this input help in any way?
>
> I will file a CR.
>
> Thanks again,
> Sumit
>
>
> Hi Sumit,
> I agree, these two queries should return the same result, as long as you
> have the ORDER BY clause. What version of Phoenix are you using? What does
> your DDL look like? Please file a JIRA that ideally includes a way of
> reproducing the issue.
>
> select current_timestamp from TBL order by current_timestamp desc limit 1;
> select max(current_timestamp) from TBL;
>
> Thanks,
> James
>
>
>
>
> On Thu, Dec 10, 2015 at 8:58 AM, Sumit Nigam <sumit_only@yahoo.com
> <javascript:_e(%7B%7D,'cvml','sumit_only@yahoo.com');>> wrote:
>
> In thinking a bit more about it, this should be a bug in Phoenix. This is
> because even with LIMIT clause I have a order by timestamp DESC, which
> means that column values MUST have been sorted prior to applying LIMIT
> clause. The LIMIT should then give a MAX value in such a case. Also,
> surprisingly this only seems to be failing in cases where there are huge
> number of records. Like, in my case I have 200K + records.
>
> Any help will be appreciated.
>
> Thanks,
> Sumit
>
> ------------------------------
> *From:* Sumit Nigam <sumit_only@yahoo.com
> <javascript:_e(%7B%7D,'cvml','sumit_only@yahoo.com');>>
> *To:* Users Mail List Phoenix <user@phoenix.apache.org
> <javascript:_e(%7B%7D,'cvml','user@phoenix.apache.org');>>
> *Sent:* Thursday, December 10, 2015 8:05 PM
> *Subject:* Help with LIMIT clause
>
> Hi,
>
> The link for salted tables https://phoenix.apache.org/salted.html mentions
> "*Since salting table would not store the data sequentially, a strict
> sequential scan would not return all the data in the natural sorted
> fashion. Clauses that currently would force a sequential scan, for example,
> clauses with LIMIT, would likely to return items that are different from a
> normal table*"
>
> So, would a simple query such as *SELECT CURRENT_TIMESTAMP FROM TBL ORDER
> BY CURRENT_TIMESTAMP DESC LIMIT 1*; not really return the *MAX*(
> *CURRENT_TIMESTAMP*) ?
>
> PK is on 2 columns with *CURRENT_TIMESTAMP *as the leading one. I am
> noticing this issue:
>
> *select current_timestamp from TBL order by current_timestamp desc limit
> 1;*
> +------------------------------------------+
> |            CURRENT_TIMESTAMP             |
> +------------------------------------------+
> | 1448815328556                            |
> +------------------------------------------+
>
>
> *select max(current_timestamp) from TBL;*
> +------------------------------------------+
> |         MAX("CURRENT_TIMESTAMP")         |
> +------------------------------------------+
> | 1449732792090                            |
> +------------------------------------------+
>
> The results are different. MAX is of course, returning the right record.
>
> If this is the case, then what should be done where LIMIT is really to be
> used? What can I replace it with to get the desired behavior?
>
> Is this also correct that when there is a WHERE clause limiting the number
> of projected records, then LIMIT seems to work fine? I seem to be noticing
> that also.
>
> This is with hbase 0.98.14 and phoenix 4.5.x
>
> Thanks,
> Sumit
>
>
>
>
>
>

Re: Help with LIMIT clause

Posted by Sumit Nigam <su...@yahoo.com>.
Thank you James.
I am using Phoenix 4.5.1 with HBase-0.98.14.
I am also noticing that if WHERE clause returns a fewer number of records, then ORDER BY with LIMIT works fine. Does this input help in any way?
I will file a CR.
Thanks again,Sumit
      From: James Taylor <ja...@apache.org>
 To: user <us...@phoenix.apache.org>; Sumit Nigam <su...@yahoo.com> 
 Sent: Thursday, December 10, 2015 11:34 PM
 Subject: Re: Help with LIMIT clause
   
Hi Sumit,I agree, these two queries should return the same result, as long as you have the ORDER BY clause. What version of Phoenix are you using? What does your DDL look like? Please file a JIRA that ideally includes a way of reproducing the issue.
select current_timestamp from TBL order by current_timestamp desc limit 1;select max(current_timestamp) from TBL;

Thanks,James



On Thu, Dec 10, 2015 at 8:58 AM, Sumit Nigam <su...@yahoo.com> wrote:

In thinking a bit more about it, this should be a bug in Phoenix. This is because even with LIMIT clause I have a order by timestamp DESC, which means that column values MUST have been sorted prior to applying LIMIT clause. The LIMIT should then give a MAX value in such a case. Also, surprisingly this only seems to be failing in cases where there are huge number of records. Like, in my case I have 200K + records.
Any help will be appreciated.
Thanks,Sumit
      From: Sumit Nigam <su...@yahoo.com>
 To: Users Mail List Phoenix <us...@phoenix.apache.org> 
 Sent: Thursday, December 10, 2015 8:05 PM
 Subject: Help with LIMIT clause
  
Hi,
The link for salted tables https://phoenix.apache.org/salted.html mentions "Since salting table would not store the data sequentially, a strict sequential scan would not return all the data in the natural sorted fashion. Clauses that currently would force a sequential scan, for example, clauses with LIMIT, would likely to return items that are different from a normal table"
So, would a simple query such as SELECT CURRENT_TIMESTAMP FROM TBL ORDER BY CURRENT_TIMESTAMP DESC LIMIT 1; not really return the MAX(CURRENT_TIMESTAMP) ?
PK is on 2 columns with CURRENT_TIMESTAMP as the leading one. I am noticing this issue:
select current_timestamp from TBL order by current_timestamp desc limit 1;+------------------------------------------+|            CURRENT_TIMESTAMP             |+------------------------------------------+| 1448815328556                            |+------------------------------------------+

select max(current_timestamp) from TBL;+------------------------------------------+|         MAX("CURRENT_TIMESTAMP")         |+------------------------------------------+| 1449732792090                            |+------------------------------------------+
The results are different. MAX is of course, returning the right record.
If this is the case, then what should be done where LIMIT is really to be used? What can I replace it with to get the desired behavior?
Is this also correct that when there is a WHERE clause limiting the number of projected records, then LIMIT seems to work fine? I seem to be noticing that also.
This is with hbase 0.98.14 and phoenix 4.5.x
Thanks,Sumit

  



 

Re: Help with LIMIT clause

Posted by James Taylor <ja...@apache.org>.
Hi Sumit,
I agree, these two queries should return the same result, as long as you
have the ORDER BY clause. What version of Phoenix are you using? What does
your DDL look like? Please file a JIRA that ideally includes a way of
reproducing the issue.

select current_timestamp from TBL order by current_timestamp desc limit 1;
select max(current_timestamp) from TBL;

Thanks,
James


On Thu, Dec 10, 2015 at 8:58 AM, Sumit Nigam <su...@yahoo.com> wrote:

> In thinking a bit more about it, this should be a bug in Phoenix. This is
> because even with LIMIT clause I have a order by timestamp DESC, which
> means that column values MUST have been sorted prior to applying LIMIT
> clause. The LIMIT should then give a MAX value in such a case. Also,
> surprisingly this only seems to be failing in cases where there are huge
> number of records. Like, in my case I have 200K + records.
>
> Any help will be appreciated.
>
> Thanks,
> Sumit
>
> ------------------------------
> *From:* Sumit Nigam <su...@yahoo.com>
> *To:* Users Mail List Phoenix <us...@phoenix.apache.org>
> *Sent:* Thursday, December 10, 2015 8:05 PM
> *Subject:* Help with LIMIT clause
>
> Hi,
>
> The link for salted tables https://phoenix.apache.org/salted.html mentions
> "*Since salting table would not store the data sequentially, a strict
> sequential scan would not return all the data in the natural sorted
> fashion. Clauses that currently would force a sequential scan, for example,
> clauses with LIMIT, would likely to return items that are different from a
> normal table*"
>
> So, would a simple query such as *SELECT CURRENT_TIMESTAMP FROM TBL ORDER
> BY CURRENT_TIMESTAMP DESC LIMIT 1*; not really return the *MAX*(
> *CURRENT_TIMESTAMP*) ?
>
> PK is on 2 columns with *CURRENT_TIMESTAMP *as the leading one. I am
> noticing this issue:
>
> *select current_timestamp from TBL order by current_timestamp desc limit
> 1;*
> +------------------------------------------+
> |            CURRENT_TIMESTAMP             |
> +------------------------------------------+
> | 1448815328556                            |
> +------------------------------------------+
>
>
> *select max(current_timestamp) from TBL;*
> +------------------------------------------+
> |         MAX("CURRENT_TIMESTAMP")         |
> +------------------------------------------+
> | 1449732792090                            |
> +------------------------------------------+
>
> The results are different. MAX is of course, returning the right record.
>
> If this is the case, then what should be done where LIMIT is really to be
> used? What can I replace it with to get the desired behavior?
>
> Is this also correct that when there is a WHERE clause limiting the number
> of projected records, then LIMIT seems to work fine? I seem to be noticing
> that also.
>
> This is with hbase 0.98.14 and phoenix 4.5.x
>
> Thanks,
> Sumit
>
>
>

Re: Help with LIMIT clause

Posted by Sumit Nigam <su...@yahoo.com>.
In thinking a bit more about it, this should be a bug in Phoenix. This is because even with LIMIT clause I have a order by timestamp DESC, which means that column values MUST have been sorted prior to applying LIMIT clause. The LIMIT should then give a MAX value in such a case. Also, surprisingly this only seems to be failing in cases where there are huge number of records. Like, in my case I have 200K + records.
Any help will be appreciated.
Thanks,Sumit
      From: Sumit Nigam <su...@yahoo.com>
 To: Users Mail List Phoenix <us...@phoenix.apache.org> 
 Sent: Thursday, December 10, 2015 8:05 PM
 Subject: Help with LIMIT clause
   
Hi,
The link for salted tables https://phoenix.apache.org/salted.html mentions "Since salting table would not store the data sequentially, a strict sequential scan would not return all the data in the natural sorted fashion. Clauses that currently would force a sequential scan, for example, clauses with LIMIT, would likely to return items that are different from a normal table"
So, would a simple query such as SELECT CURRENT_TIMESTAMP FROM TBL ORDER BY CURRENT_TIMESTAMP DESC LIMIT 1; not really return the MAX(CURRENT_TIMESTAMP) ?
PK is on 2 columns with CURRENT_TIMESTAMP as the leading one. I am noticing this issue:
select current_timestamp from TBL order by current_timestamp desc limit 1;+------------------------------------------+|            CURRENT_TIMESTAMP             |+------------------------------------------+| 1448815328556                            |+------------------------------------------+

select max(current_timestamp) from TBL;+------------------------------------------+|         MAX("CURRENT_TIMESTAMP")         |+------------------------------------------+| 1449732792090                            |+------------------------------------------+
The results are different. MAX is of course, returning the right record.
If this is the case, then what should be done where LIMIT is really to be used? What can I replace it with to get the desired behavior?
Is this also correct that when there is a WHERE clause limiting the number of projected records, then LIMIT seems to work fine? I seem to be noticing that also.
This is with hbase 0.98.14 and phoenix 4.5.x
Thanks,Sumit