You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@phoenix.apache.org by Sachin Katakdound <sa...@gmail.com> on 2015/12/13 07:24:17 UTC

Order by/limit clause on partitioned data

Let’s say we have a simple (phoenix table) data set.

create table ADDRESS(“id” NUMBER primary key,
	“city” VARCHAR,
	"state” VARCHAR,
	“zip”	VARCHAR) SALT_BUCKETS=4

And lets suppose there are 4 different region servers, and this data set is spread across 4 separate regions.

Now, the following query is run;
select * from ADDRESS order by “city” limit 15;

When this simple query with Order by and limit clause is executed, does it return a valid data considering the fact that the data will be spread across 4 region servers?

I believe the query plan indicates “ordering on the server” and then limit clause on the client. 

Does this mean that 15 rows are gathered from each region server and then the limit clause applied on the client?
How exactly is this query executed?

Thanks in advance.

Regards,
Sachin

	
	


Re: Order by/limit clause on partitioned data

Posted by James Taylor <ja...@apache.org>.
No, no problem with that version, but the later the version the better as
we're always improving stability, and performance, and adding new features.
Current release is 4.6.0, so you're about six releases back if you include
patch releases.
Thanks,
James

On Fri, Dec 18, 2015 at 5:22 PM, Sachin Katakdound <
sachin.katakdound@gmail.com> wrote:

>
> Thanks for quick reply.
>
> We are using 4.2+ version at the moment, will that be a problem? Or is
> this a default behavior in all the version of Phoenix?
>
> Regards,
> Sachin
>
> On Dec 13, 2015, at 4:17 PM, James Taylor <ja...@apache.org> wrote:
>
> bq. When this simple query with Order by and limit clause is executed,
> does it return a valid data considering the fact that the data will be
> spread across 4 region servers?
> Yes
>
> bq. Does this mean that 15 rows are gathered from each region server and
> then the limit clause applied on the client?
> Yes
>
> bq. How exactly is this query executed?
> It's executed as you've described. Each parallel scan that Phoenix runs
> returns the top 15 rows (as determined by the ORDER BY clause). The results
> of these scans are merge sorted on the client with the top 15 rows being
> returned.
>
> On Sat, Dec 12, 2015 at 10:24 PM, Sachin Katakdound <
> sachin.katakdound@gmail.com> wrote:
>
>> Let’s say we have a simple (phoenix table) data set.
>>
>> create table ADDRESS(“id” NUMBER primary key,
>>         “city” VARCHAR,
>>         "state” VARCHAR,
>>         “zip”   VARCHAR) SALT_BUCKETS=4
>>
>> And lets suppose there are 4 different region servers, and this data set
>> is spread across 4 separate regions.
>>
>> Now, the following query is run;
>> select * from ADDRESS order by “city” limit 15;
>>
>> When this simple query with Order by and limit clause is executed, does
>> it return a valid data considering the fact that the data will be spread
>> across 4 region servers?
>>
>> I believe the query plan indicates “ordering on the server” and then
>> limit clause on the client.
>>
>> Does this mean that 15 rows are gathered from each region server and then
>> the limit clause applied on the client?
>> How exactly is this query executed?
>>
>> Thanks in advance.
>>
>> Regards,
>> Sachin
>>
>>
>>
>>
>>
>
>

Re: Order by/limit clause on partitioned data

Posted by Sachin Katakdound <sa...@gmail.com>.
Thanks for quick reply.

We are using 4.2+ version at the moment, will that be a problem? Or is this a default behavior in all the version of Phoenix? 

Regards,
Sachin

> On Dec 13, 2015, at 4:17 PM, James Taylor <ja...@apache.org> wrote:
> 
> bq. When this simple query with Order by and limit clause is executed, does it return a valid data considering the fact that the data will be spread across 4 region servers?
> Yes
> 
> bq. Does this mean that 15 rows are gathered from each region server and then the limit clause applied on the client?
> Yes
> 
> bq. How exactly is this query executed?
> It's executed as you've described. Each parallel scan that Phoenix runs returns the top 15 rows (as determined by the ORDER BY clause). The results of these scans are merge sorted on the client with the top 15 rows being returned.
> 
> On Sat, Dec 12, 2015 at 10:24 PM, Sachin Katakdound <sachin.katakdound@gmail.com <ma...@gmail.com>> wrote:
> Let’s say we have a simple (phoenix table) data set.
> 
> create table ADDRESS(“id” NUMBER primary key,
>         “city” VARCHAR,
>         "state” VARCHAR,
>         “zip”   VARCHAR) SALT_BUCKETS=4
> 
> And lets suppose there are 4 different region servers, and this data set is spread across 4 separate regions.
> 
> Now, the following query is run;
> select * from ADDRESS order by “city” limit 15;
> 
> When this simple query with Order by and limit clause is executed, does it return a valid data considering the fact that the data will be spread across 4 region servers?
> 
> I believe the query plan indicates “ordering on the server” and then limit clause on the client.
> 
> Does this mean that 15 rows are gathered from each region server and then the limit clause applied on the client?
> How exactly is this query executed?
> 
> Thanks in advance.
> 
> Regards,
> Sachin
> 
> 
> 
> 
> 


Re: Order by/limit clause on partitioned data

Posted by James Taylor <ja...@apache.org>.
bq. When this simple query with Order by and limit clause is executed, does
it return a valid data considering the fact that the data will be spread
across 4 region servers?
Yes

bq. Does this mean that 15 rows are gathered from each region server and
then the limit clause applied on the client?
Yes

bq. How exactly is this query executed?
It's executed as you've described. Each parallel scan that Phoenix runs
returns the top 15 rows (as determined by the ORDER BY clause). The results
of these scans are merge sorted on the client with the top 15 rows being
returned.

On Sat, Dec 12, 2015 at 10:24 PM, Sachin Katakdound <
sachin.katakdound@gmail.com> wrote:

> Let’s say we have a simple (phoenix table) data set.
>
> create table ADDRESS(“id” NUMBER primary key,
>         “city” VARCHAR,
>         "state” VARCHAR,
>         “zip”   VARCHAR) SALT_BUCKETS=4
>
> And lets suppose there are 4 different region servers, and this data set
> is spread across 4 separate regions.
>
> Now, the following query is run;
> select * from ADDRESS order by “city” limit 15;
>
> When this simple query with Order by and limit clause is executed, does it
> return a valid data considering the fact that the data will be spread
> across 4 region servers?
>
> I believe the query plan indicates “ordering on the server” and then limit
> clause on the client.
>
> Does this mean that 15 rows are gathered from each region server and then
> the limit clause applied on the client?
> How exactly is this query executed?
>
> Thanks in advance.
>
> Regards,
> Sachin
>
>
>
>
>