You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@phoenix.apache.org by Flavio Pompermaier <po...@okkam.it> on 2017/12/20 16:58:36 UTC

Query optimization

Hi to all,
I'm trying to find the best query for my use case but I found that one
version work and the other one does not (unless that I don't apply some
tuning to timeouts etc like explained in [1]).

The 2 queries extract the same data but, while the first query terminates
the second does not.
*PS*:  without the USE_SORT_MERGE_JOIN both queries weren't working

SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE;
+---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                                     PLAN
                                    | EST_BYTES_READ  | EST_ROWS_READ  |
EST_INFO_TS   |
+---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| SORT-MERGE-JOIN (INNER) TABLES
                                    | 14155777900     | 12077867       |
1513754378759  |
|     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 3-WAY FULL
SCAN OVER PEOPLE                 | 14155777900     | 12077867       |
1513754378759  |
|         SERVER FILTER BY FIRST KEY ONLY
                                     | 14155777900     | 12077867       |
1513754378759  |
|     CLIENT MERGE SORT
                                     | 14155777900     | 12077867       |
1513754378759  |
| AND (SKIP MERGE)
                                    | 14155777900     | 12077867       |
1513754378759  |
|     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY RANGE
SCAN OVER MYTABLE [0] - [2]  | 14155777900     | 12077867       |
1513754378759  |
|         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
HAS_CANDIDATES = false)                   | 14155777900     | 12077867
 | 1513754378759  |
|         SERVER SORTED BY [L.LOCALID]
                                        | 14155777900     | 12077867
 | 1513754378759  |
|     CLIENT MERGE SORT
                                     | 14155777900     | 12077867       |
1513754378759  |
| CLIENT AGGREGATE INTO SINGLE ROW
                                    | 14155777900     | 12077867       |
1513754378759  |
+---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+


SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
FROM (SELECT LOCALID FROM MYTABLE
WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l JOIN
PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
+--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                                     PLAN
                                   | EST_BYTES_READ  | EST_ROWS_READ  |
EST_INFO_TS   |
+--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| SORT-MERGE-JOIN (INNER) TABLES
                                   | 14155777900     | 12077867       |
1513754378759  |
|     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY RANGE
SCAN OVER MYTABLE [0] - [2]  | 14155777900     | 12077867       |
1513754378759  |
|         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
HAS_CANDIDATES = false)                  | 14155777900     | 12077867
 | 1513754378759  |
|     CLIENT MERGE SORT
                                    | 14155777900     | 12077867       |
1513754378759  |
| AND (SKIP MERGE)
                                   | 14155777900     | 12077867       |
1513754378759  |
|     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 42-WAY FULL
SCAN OVER PEOPLE               | 14155777900     | 12077867       |
1513754378759  |
|         SERVER FILTER BY FIRST KEY ONLY
                                    | 14155777900     | 12077867       |
1513754378759  |
|         SERVER SORTED BY [DS.PERSON_ID]
                                  | 14155777900     | 12077867       |
1513754378759  |
|     CLIENT MERGE SORT
                                    | 14155777900     | 12077867       |
1513754378759  |
| CLIENT AGGREGATE INTO SINGLE ROW
                                   | 14155777900     | 12077867       |
1513754378759  |
+--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
10 rows selected (0.041 seconds)

What do you think? Whould I try to give more resources to HBase/Phoenix or
is the first query the best (and reliable) one?
Any insight about this is highly appreciated..

Best,
Flavio

Re: Query optimization

Posted by Flavio Pompermaier <po...@okkam.it>.
Here it is: https://issues.apache.org/jira/browse/PHOENIX-4508

On Thu, Dec 28, 2017 at 9:19 AM, Flavio Pompermaier <po...@okkam.it>
wrote:

> Hi James,
> What should be the subject of the JIRA?
> Could you open it for me...? I'm on vacation and opening tickets on JIRA
> from mobile is not that easy...
> Just 2 observations: PEOPLE table is indeed sorted by PEOPLE_ID, MY_TABLE
> is somewhat of a pivot table so it's MUCH bigger that PEOPLE in terms of
> cardinality, while that's probably not true in terms of cells and, thus,
> occupied space and memory.
>
> Best,
> Flavio
>
>
>
> On 28 Dec 2017 08:10, "James Taylor" <ja...@apache.org> wrote:
>
> Looks like the second query is sorting the entire PEOPLE table (though it
> seems like that shouldn’t be necessary as it’s probably already sorted by
> PEOPLE_ID) while the first one is sorting only part of MY_TABLE (which is
> likely less data). Might be a bug as the queries look the same.
>
> Please log a JIRA and thanks for all the details.
>
> On Wed, Dec 27, 2017 at 3:44 PM Flavio Pompermaier <po...@okkam.it>
> wrote:
>
>> Ok.  So why the 2nd query requires more memory than the first one
>> (nonetheless USE_SORT_MERGE_JOIN is used) and can't complete?
>>
>>
>> On 28 Dec 2017 00:33, "James Taylor" <ja...@apache.org> wrote:
>>
>> A hash join (the default) will be faster but the tables being cached
>> (last or RHS table being joined) must be small enough to fit into memory on
>> the region server. If it's too big, you can use the USE_SORT_MERGE_JOIN
>> which would not have this restriction.
>>
>> On Wed, Dec 27, 2017 at 3:16 PM, Flavio Pompermaier <pompermaier@okkam.it
>> > wrote:
>>
>>> Just to summarize things...is the best approach, in terms of required
>>> memory, for Apache Phoenix queries to use sort merge join? Should inner
>>> queries be avoided?
>>>
>>>
>>> On 22 Dec 2017 22:47, "Flavio Pompermaier" <po...@okkam.it> wrote:
>>>
>>> MYTABLE is definitely much bigger than PEOPLE table, in terms of
>>> cardinality. In terms of cells (rows x columns) PEOPLE is probably bigger
>>>
>>> On 22 Dec 2017 22:36, "Ethan" <ew...@apache.org> wrote:
>>>
>>>> I see. I think client side probably hold on to the iterators from the
>>>> both sides and crawling forward to do the merge sort. in this case should
>>>> be no much memory footprint either way where the filter is performed.
>>>>
>>>> On December 22, 2017 at 1:04:18 PM, James Taylor (
>>>> jamestaylor@apache.org) wrote:
>>>>
>>>> There’s no shipping of any tables with a sort merge join.
>>>>
>>>> On Fri, Dec 22, 2017 at 1:02 PM Ethan Wang <ae...@gmail.com> wrote:
>>>>
>>>>> I see. Looks like it's possible the rhs (MYTABLE) is too big to ship
>>>>> around without get filtered first. Just for experiment, if you took out
>>>>> hint USE_SORT_MERGE_JOIN, what will be the plan?
>>>>>
>>>>>
>>>>> On December 22, 2017 at 12:46:25 PM, James Taylor (
>>>>> jamestaylor@apache.org) wrote:
>>>>>
>>>>> For sort merge join, both post-filtered table results are sorted on
>>>>> the server side and then a merge sort is done on the client-side.
>>>>>
>>>>> On Fri, Dec 22, 2017 at 12:44 PM, Ethan <ew...@apache.org> wrote:
>>>>>
>>>>>> Hello Flavio,
>>>>>>
>>>>>> From the plan looks like to me the second query is doing the filter
>>>>>> at parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE
>>>>>> (after filtered) respectively?
>>>>>>
>>>>>> For sort merge join, anyone knows are the both sides get shipped to
>>>>>> client to do the merge sort?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>
>>>>>> On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier (
>>>>>> pompermaier@okkam.it) wrote:
>>>>>>
>>>>>> Any help here...?
>>>>>>
>>>>>> On 20 Dec 2017 17:58, "Flavio Pompermaier" <po...@okkam.it>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi to all,
>>>>>>> I'm trying to find the best query for my use case but I found that
>>>>>>> one version work and the other one does not (unless that I don't apply some
>>>>>>> tuning to timeouts etc like explained in [1]).
>>>>>>>
>>>>>>> The 2 queries extract the same data but, while the first query
>>>>>>> terminates the second does not.
>>>>>>> *PS*:  without the USE_SORT_MERGE_JOIN both queries weren't working
>>>>>>>
>>>>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>>>>>> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
>>>>>>> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES =
>>>>>>> FALSE;
>>>>>>> +-----------------------------------------------------------
>>>>>>> ----------------------------------------------------+-------
>>>>>>> ----------+----------------+----------------+
>>>>>>> |                                                     PLAN
>>>>>>>                                             | EST_BYTES_READ  |
>>>>>>> EST_ROWS_READ  |  EST_INFO_TS   |
>>>>>>> +-----------------------------------------------------------
>>>>>>> ----------------------------------------------------+-------
>>>>>>> ----------+----------------+----------------+
>>>>>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>>>>>                                             | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 3-WAY
>>>>>>> FULL SCAN OVER PEOPLE                 | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>>>>>                                            | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>>> |     CLIENT MERGE SORT
>>>>>>>                                            | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>>> | AND (SKIP MERGE)
>>>>>>>                                             | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY
>>>>>>> RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
>>>>>>>    | 12077867       | 1513754378759  |
>>>>>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>>>>>> HAS_CANDIDATES = false)                   | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>>> |         SERVER SORTED BY [L.LOCALID]
>>>>>>>                                                 | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>>> |     CLIENT MERGE SORT
>>>>>>>                                            | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>>>>>                                             | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>>> +-----------------------------------------------------------
>>>>>>> ----------------------------------------------------+-------
>>>>>>> ----------+----------------+----------------+
>>>>>>>
>>>>>>>
>>>>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>>>>>> FROM (SELECT LOCALID FROM MYTABLE
>>>>>>> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l
>>>>>>> JOIN PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
>>>>>>> +-----------------------------------------------------------
>>>>>>> ---------------------------------------------------+--------
>>>>>>> ---------+----------------+----------------+
>>>>>>> |                                                     PLAN
>>>>>>>                                            | EST_BYTES_READ  |
>>>>>>> EST_ROWS_READ  |  EST_INFO_TS   |
>>>>>>> +-----------------------------------------------------------
>>>>>>> ---------------------------------------------------+--------
>>>>>>> ---------+----------------+----------------+
>>>>>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>>>>>                                            | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY
>>>>>>> RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
>>>>>>>    | 12077867       | 1513754378759  |
>>>>>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>>>>>> HAS_CANDIDATES = false)                  | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>>> |     CLIENT MERGE SORT
>>>>>>>                                           | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>>> | AND (SKIP MERGE)
>>>>>>>                                            | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 42-WAY
>>>>>>> FULL SCAN OVER PEOPLE               | 14155777900 <(415)%20577-7900>
>>>>>>>    | 12077867       | 1513754378759  |
>>>>>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>>>>>                                           | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>>> |         SERVER SORTED BY [DS.PERSON_ID]
>>>>>>>                                         | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>>> |     CLIENT MERGE SORT
>>>>>>>                                           | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>>>>>                                            | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>>> +-----------------------------------------------------------
>>>>>>> ---------------------------------------------------+--------
>>>>>>> ---------+----------------+----------------+
>>>>>>> 10 rows selected (0.041 seconds)
>>>>>>>
>>>>>>> What do you think? Whould I try to give more resources to
>>>>>>> HBase/Phoenix or is the first query the best (and reliable) one?
>>>>>>> Any insight about this is highly appreciated..
>>>>>>>
>>>>>>> Best,
>>>>>>> Flavio
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>>
>


-- 
Flavio Pompermaier
Development Department

OKKAM S.r.l.
Tel. +(39) 0461 041809

Re: Query optimization

Posted by Flavio Pompermaier <po...@okkam.it>.
Hi James,
What should be the subject of the JIRA?
Could you open it for me...? I'm on vacation and opening tickets on JIRA
from mobile is not that easy...
Just 2 observations: PEOPLE table is indeed sorted by PEOPLE_ID, MY_TABLE
is somewhat of a pivot table so it's MUCH bigger that PEOPLE in terms of
cardinality, while that's probably not true in terms of cells and, thus,
occupied space and memory.

Best,
Flavio



On 28 Dec 2017 08:10, "James Taylor" <ja...@apache.org> wrote:

Looks like the second query is sorting the entire PEOPLE table (though it
seems like that shouldn’t be necessary as it’s probably already sorted by
PEOPLE_ID) while the first one is sorting only part of MY_TABLE (which is
likely less data). Might be a bug as the queries look the same.

Please log a JIRA and thanks for all the details.

On Wed, Dec 27, 2017 at 3:44 PM Flavio Pompermaier <po...@okkam.it>
wrote:

> Ok.  So why the 2nd query requires more memory than the first one
> (nonetheless USE_SORT_MERGE_JOIN is used) and can't complete?
>
>
> On 28 Dec 2017 00:33, "James Taylor" <ja...@apache.org> wrote:
>
> A hash join (the default) will be faster but the tables being cached (last
> or RHS table being joined) must be small enough to fit into memory on the
> region server. If it's too big, you can use the USE_SORT_MERGE_JOIN which
> would not have this restriction.
>
> On Wed, Dec 27, 2017 at 3:16 PM, Flavio Pompermaier <po...@okkam.it>
> wrote:
>
>> Just to summarize things...is the best approach, in terms of required
>> memory, for Apache Phoenix queries to use sort merge join? Should inner
>> queries be avoided?
>>
>>
>> On 22 Dec 2017 22:47, "Flavio Pompermaier" <po...@okkam.it> wrote:
>>
>> MYTABLE is definitely much bigger than PEOPLE table, in terms of
>> cardinality. In terms of cells (rows x columns) PEOPLE is probably bigger
>>
>> On 22 Dec 2017 22:36, "Ethan" <ew...@apache.org> wrote:
>>
>>> I see. I think client side probably hold on to the iterators from the
>>> both sides and crawling forward to do the merge sort. in this case should
>>> be no much memory footprint either way where the filter is performed.
>>>
>>> On December 22, 2017 at 1:04:18 PM, James Taylor (jamestaylor@apache.org)
>>> wrote:
>>>
>>> There’s no shipping of any tables with a sort merge join.
>>>
>>> On Fri, Dec 22, 2017 at 1:02 PM Ethan Wang <ae...@gmail.com> wrote:
>>>
>>>> I see. Looks like it's possible the rhs (MYTABLE) is too big to ship
>>>> around without get filtered first. Just for experiment, if you took out
>>>> hint USE_SORT_MERGE_JOIN, what will be the plan?
>>>>
>>>>
>>>> On December 22, 2017 at 12:46:25 PM, James Taylor (
>>>> jamestaylor@apache.org) wrote:
>>>>
>>>> For sort merge join, both post-filtered table results are sorted on the
>>>> server side and then a merge sort is done on the client-side.
>>>>
>>>> On Fri, Dec 22, 2017 at 12:44 PM, Ethan <ew...@apache.org> wrote:
>>>>
>>>>> Hello Flavio,
>>>>>
>>>>> From the plan looks like to me the second query is doing the filter at
>>>>> parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE
>>>>> (after filtered) respectively?
>>>>>
>>>>> For sort merge join, anyone knows are the both sides get shipped to
>>>>> client to do the merge sort?
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>> On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier (
>>>>> pompermaier@okkam.it) wrote:
>>>>>
>>>>> Any help here...?
>>>>>
>>>>> On 20 Dec 2017 17:58, "Flavio Pompermaier" <po...@okkam.it>
>>>>> wrote:
>>>>>
>>>>>> Hi to all,
>>>>>> I'm trying to find the best query for my use case but I found that
>>>>>> one version work and the other one does not (unless that I don't apply some
>>>>>> tuning to timeouts etc like explained in [1]).
>>>>>>
>>>>>> The 2 queries extract the same data but, while the first query
>>>>>> terminates the second does not.
>>>>>> *PS*:  without the USE_SORT_MERGE_JOIN both queries weren't working
>>>>>>
>>>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>>>>> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
>>>>>> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES =
>>>>>> FALSE;
>>>>>> +-----------------------------------------------------------
>>>>>> ----------------------------------------------------+-------
>>>>>> ----------+----------------+----------------+
>>>>>> |                                                     PLAN
>>>>>>                                           | EST_BYTES_READ  |
>>>>>> EST_ROWS_READ  |  EST_INFO_TS   |
>>>>>> +-----------------------------------------------------------
>>>>>> ----------------------------------------------------+-------
>>>>>> ----------+----------------+----------------+
>>>>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>>>>                                           | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 3-WAY
>>>>>> FULL SCAN OVER PEOPLE                 | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>>>>                                            | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT MERGE SORT
>>>>>>                                            | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> | AND (SKIP MERGE)
>>>>>>                                           | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY
>>>>>> RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
>>>>>>    | 12077867       | 1513754378759  |
>>>>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>>>>> HAS_CANDIDATES = false)                   | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |         SERVER SORTED BY [L.LOCALID]
>>>>>>                                               | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT MERGE SORT
>>>>>>                                            | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>>>>                                           | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> +-----------------------------------------------------------
>>>>>> ----------------------------------------------------+-------
>>>>>> ----------+----------------+----------------+
>>>>>>
>>>>>>
>>>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>>>>> FROM (SELECT LOCALID FROM MYTABLE
>>>>>> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l
>>>>>> JOIN PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
>>>>>> +-----------------------------------------------------------
>>>>>> ---------------------------------------------------+--------
>>>>>> ---------+----------------+----------------+
>>>>>> |                                                     PLAN
>>>>>>                                          | EST_BYTES_READ  | EST_ROWS_READ
>>>>>> |  EST_INFO_TS   |
>>>>>> +-----------------------------------------------------------
>>>>>> ---------------------------------------------------+--------
>>>>>> ---------+----------------+----------------+
>>>>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>>>>                                          | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY
>>>>>> RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
>>>>>>    | 12077867       | 1513754378759  |
>>>>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>>>>> HAS_CANDIDATES = false)                  | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT MERGE SORT
>>>>>>                                           | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> | AND (SKIP MERGE)
>>>>>>                                          | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 42-WAY
>>>>>> FULL SCAN OVER PEOPLE               | 14155777900 <(415)%20577-7900>
>>>>>>    | 12077867       | 1513754378759  |
>>>>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>>>>                                           | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |         SERVER SORTED BY [DS.PERSON_ID]
>>>>>>                                         | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT MERGE SORT
>>>>>>                                           | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>>>>                                          | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> +-----------------------------------------------------------
>>>>>> ---------------------------------------------------+--------
>>>>>> ---------+----------------+----------------+
>>>>>> 10 rows selected (0.041 seconds)
>>>>>>
>>>>>> What do you think? Whould I try to give more resources to
>>>>>> HBase/Phoenix or is the first query the best (and reliable) one?
>>>>>> Any insight about this is highly appreciated..
>>>>>>
>>>>>> Best,
>>>>>> Flavio
>>>>>>
>>>>>
>>>>
>>
>
>

Re: Query optimization

Posted by James Taylor <ja...@apache.org>.
Looks like the second query is sorting the entire PEOPLE table (though it
seems like that shouldn’t be necessary as it’s probably already sorted by
PEOPLE_ID) while the first one is sorting only part of MY_TABLE (which is
likely less data). Might be a bug as the queries look the same.

Please log a JIRA and thanks for all the details.

On Wed, Dec 27, 2017 at 3:44 PM Flavio Pompermaier <po...@okkam.it>
wrote:

> Ok.  So why the 2nd query requires more memory than the first one
> (nonetheless USE_SORT_MERGE_JOIN is used) and can't complete?
>
>
> On 28 Dec 2017 00:33, "James Taylor" <ja...@apache.org> wrote:
>
> A hash join (the default) will be faster but the tables being cached (last
> or RHS table being joined) must be small enough to fit into memory on the
> region server. If it's too big, you can use the USE_SORT_MERGE_JOIN which
> would not have this restriction.
>
> On Wed, Dec 27, 2017 at 3:16 PM, Flavio Pompermaier <po...@okkam.it>
> wrote:
>
>> Just to summarize things...is the best approach, in terms of required
>> memory, for Apache Phoenix queries to use sort merge join? Should inner
>> queries be avoided?
>>
>>
>> On 22 Dec 2017 22:47, "Flavio Pompermaier" <po...@okkam.it> wrote:
>>
>> MYTABLE is definitely much bigger than PEOPLE table, in terms of
>> cardinality. In terms of cells (rows x columns) PEOPLE is probably bigger
>>
>> On 22 Dec 2017 22:36, "Ethan" <ew...@apache.org> wrote:
>>
>>> I see. I think client side probably hold on to the iterators from the
>>> both sides and crawling forward to do the merge sort. in this case should
>>> be no much memory footprint either way where the filter is performed.
>>>
>>> On December 22, 2017 at 1:04:18 PM, James Taylor (jamestaylor@apache.org)
>>> wrote:
>>>
>>> There’s no shipping of any tables with a sort merge join.
>>>
>>> On Fri, Dec 22, 2017 at 1:02 PM Ethan Wang <ae...@gmail.com> wrote:
>>>
>>>> I see. Looks like it's possible the rhs (MYTABLE) is too big to ship
>>>> around without get filtered first. Just for experiment, if you took out
>>>> hint USE_SORT_MERGE_JOIN, what will be the plan?
>>>>
>>>>
>>>> On December 22, 2017 at 12:46:25 PM, James Taylor (
>>>> jamestaylor@apache.org) wrote:
>>>>
>>>> For sort merge join, both post-filtered table results are sorted on the
>>>> server side and then a merge sort is done on the client-side.
>>>>
>>>> On Fri, Dec 22, 2017 at 12:44 PM, Ethan <ew...@apache.org> wrote:
>>>>
>>>>> Hello Flavio,
>>>>>
>>>>> From the plan looks like to me the second query is doing the filter at
>>>>> parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE
>>>>> (after filtered) respectively?
>>>>>
>>>>> For sort merge join, anyone knows are the both sides get shipped to
>>>>> client to do the merge sort?
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>> On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier (
>>>>> pompermaier@okkam.it) wrote:
>>>>>
>>>>> Any help here...?
>>>>>
>>>>> On 20 Dec 2017 17:58, "Flavio Pompermaier" <po...@okkam.it>
>>>>> wrote:
>>>>>
>>>>>> Hi to all,
>>>>>> I'm trying to find the best query for my use case but I found that
>>>>>> one version work and the other one does not (unless that I don't apply some
>>>>>> tuning to timeouts etc like explained in [1]).
>>>>>>
>>>>>> The 2 queries extract the same data but, while the first query
>>>>>> terminates the second does not.
>>>>>> *PS*:  without the USE_SORT_MERGE_JOIN both queries weren't working
>>>>>>
>>>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>>>>> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
>>>>>> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES =
>>>>>> FALSE;
>>>>>>
>>>>>> +---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>>>>> |                                                     PLAN
>>>>>>                                           | EST_BYTES_READ  |
>>>>>> EST_ROWS_READ  |  EST_INFO_TS   |
>>>>>>
>>>>>> +---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>>>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>>>>                                           | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 3-WAY
>>>>>> FULL SCAN OVER PEOPLE                 | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>>>>                                            | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT MERGE SORT
>>>>>>                                            | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> | AND (SKIP MERGE)
>>>>>>                                           | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY
>>>>>> RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
>>>>>>    | 12077867       | 1513754378759  |
>>>>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>>>>> HAS_CANDIDATES = false)                   | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |         SERVER SORTED BY [L.LOCALID]
>>>>>>                                               | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT MERGE SORT
>>>>>>                                            | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>>>>                                           | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>>
>>>>>> +---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>>>>>
>>>>>>
>>>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>>>>> FROM (SELECT LOCALID FROM MYTABLE
>>>>>> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l
>>>>>> JOIN PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
>>>>>>
>>>>>> +--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>>>>> |                                                     PLAN
>>>>>>                                          | EST_BYTES_READ  | EST_ROWS_READ
>>>>>> |  EST_INFO_TS   |
>>>>>>
>>>>>> +--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>>>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>>>>                                          | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY
>>>>>> RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
>>>>>>    | 12077867       | 1513754378759  |
>>>>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>>>>> HAS_CANDIDATES = false)                  | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT MERGE SORT
>>>>>>                                           | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> | AND (SKIP MERGE)
>>>>>>                                          | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 42-WAY
>>>>>> FULL SCAN OVER PEOPLE               | 14155777900 <(415)%20577-7900>
>>>>>>    | 12077867       | 1513754378759  |
>>>>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>>>>                                           | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |         SERVER SORTED BY [DS.PERSON_ID]
>>>>>>                                         | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT MERGE SORT
>>>>>>                                           | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>>>>                                          | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>>
>>>>>> +--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>>>>> 10 rows selected (0.041 seconds)
>>>>>>
>>>>>> What do you think? Whould I try to give more resources to
>>>>>> HBase/Phoenix or is the first query the best (and reliable) one?
>>>>>> Any insight about this is highly appreciated..
>>>>>>
>>>>>> Best,
>>>>>> Flavio
>>>>>>
>>>>>
>>>>
>>
>
>

Re: Query optimization

Posted by Flavio Pompermaier <po...@okkam.it>.
Ok.  So why the 2nd query requires more memory than the first one
(nonetheless USE_SORT_MERGE_JOIN is used) and can't complete?

On 28 Dec 2017 00:33, "James Taylor" <ja...@apache.org> wrote:

A hash join (the default) will be faster but the tables being cached (last
or RHS table being joined) must be small enough to fit into memory on the
region server. If it's too big, you can use the USE_SORT_MERGE_JOIN which
would not have this restriction.

On Wed, Dec 27, 2017 at 3:16 PM, Flavio Pompermaier <po...@okkam.it>
wrote:

> Just to summarize things...is the best approach, in terms of required
> memory, for Apache Phoenix queries to use sort merge join? Should inner
> queries be avoided?
>
>
> On 22 Dec 2017 22:47, "Flavio Pompermaier" <po...@okkam.it> wrote:
>
> MYTABLE is definitely much bigger than PEOPLE table, in terms of
> cardinality. In terms of cells (rows x columns) PEOPLE is probably bigger
>
> On 22 Dec 2017 22:36, "Ethan" <ew...@apache.org> wrote:
>
>> I see. I think client side probably hold on to the iterators from the
>> both sides and crawling forward to do the merge sort. in this case should
>> be no much memory footprint either way where the filter is performed.
>>
>> On December 22, 2017 at 1:04:18 PM, James Taylor (jamestaylor@apache.org)
>> wrote:
>>
>> There’s no shipping of any tables with a sort merge join.
>>
>> On Fri, Dec 22, 2017 at 1:02 PM Ethan Wang <ae...@gmail.com> wrote:
>>
>>> I see. Looks like it's possible the rhs (MYTABLE) is too big to ship
>>> around without get filtered first. Just for experiment, if you took out
>>> hint USE_SORT_MERGE_JOIN, what will be the plan?
>>>
>>>
>>> On December 22, 2017 at 12:46:25 PM, James Taylor (
>>> jamestaylor@apache.org) wrote:
>>>
>>> For sort merge join, both post-filtered table results are sorted on the
>>> server side and then a merge sort is done on the client-side.
>>>
>>> On Fri, Dec 22, 2017 at 12:44 PM, Ethan <ew...@apache.org> wrote:
>>>
>>>> Hello Flavio,
>>>>
>>>> From the plan looks like to me the second query is doing the filter at
>>>> parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE
>>>> (after filtered) respectively?
>>>>
>>>> For sort merge join, anyone knows are the both sides get shipped to
>>>> client to do the merge sort?
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier (
>>>> pompermaier@okkam.it) wrote:
>>>>
>>>> Any help here...?
>>>>
>>>> On 20 Dec 2017 17:58, "Flavio Pompermaier" <po...@okkam.it>
>>>> wrote:
>>>>
>>>>> Hi to all,
>>>>> I'm trying to find the best query for my use case but I found that one
>>>>> version work and the other one does not (unless that I don't apply some
>>>>> tuning to timeouts etc like explained in [1]).
>>>>>
>>>>> The 2 queries extract the same data but, while the first query
>>>>> terminates the second does not.
>>>>> *PS*:  without the USE_SORT_MERGE_JOIN both queries weren't working
>>>>>
>>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>>>> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
>>>>> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE;
>>>>> +-----------------------------------------------------------
>>>>> ----------------------------------------------------+-------
>>>>> ----------+----------------+----------------+
>>>>> |                                                     PLAN
>>>>>                                           | EST_BYTES_READ  |
>>>>> EST_ROWS_READ  |  EST_INFO_TS   |
>>>>> +-----------------------------------------------------------
>>>>> ----------------------------------------------------+-------
>>>>> ----------+----------------+----------------+
>>>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>>>                                           | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 3-WAY
>>>>> FULL SCAN OVER PEOPLE                 | 14155777900 <(415)%20577-7900>
>>>>>    | 12077867       | 1513754378759  |
>>>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>>>                                          | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> |     CLIENT MERGE SORT
>>>>>                                          | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> | AND (SKIP MERGE)
>>>>>                                           | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY
>>>>> RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
>>>>>    | 12077867       | 1513754378759  |
>>>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>>>> HAS_CANDIDATES = false)                   | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> |         SERVER SORTED BY [L.LOCALID]
>>>>>                                               | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> |     CLIENT MERGE SORT
>>>>>                                          | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>>>                                           | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> +-----------------------------------------------------------
>>>>> ----------------------------------------------------+-------
>>>>> ----------+----------------+----------------+
>>>>>
>>>>>
>>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>>>> FROM (SELECT LOCALID FROM MYTABLE
>>>>> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l
>>>>> JOIN PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
>>>>> +-----------------------------------------------------------
>>>>> ---------------------------------------------------+--------
>>>>> ---------+----------------+----------------+
>>>>> |                                                     PLAN
>>>>>                                          | EST_BYTES_READ  | EST_ROWS_READ
>>>>> |  EST_INFO_TS   |
>>>>> +-----------------------------------------------------------
>>>>> ---------------------------------------------------+--------
>>>>> ---------+----------------+----------------+
>>>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>>>                                          | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY
>>>>> RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
>>>>>    | 12077867       | 1513754378759  |
>>>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>>>> HAS_CANDIDATES = false)                  | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> |     CLIENT MERGE SORT
>>>>>                                         | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> | AND (SKIP MERGE)
>>>>>                                          | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 42-WAY
>>>>> FULL SCAN OVER PEOPLE               | 14155777900 <(415)%20577-7900>
>>>>>    | 12077867       | 1513754378759  |
>>>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>>>                                         | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> |         SERVER SORTED BY [DS.PERSON_ID]
>>>>>                                       | 14155777900 <(415)%20577-7900>
>>>>>    | 12077867       | 1513754378759  |
>>>>> |     CLIENT MERGE SORT
>>>>>                                         | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>>>                                          | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> +-----------------------------------------------------------
>>>>> ---------------------------------------------------+--------
>>>>> ---------+----------------+----------------+
>>>>> 10 rows selected (0.041 seconds)
>>>>>
>>>>> What do you think? Whould I try to give more resources to
>>>>> HBase/Phoenix or is the first query the best (and reliable) one?
>>>>> Any insight about this is highly appreciated..
>>>>>
>>>>> Best,
>>>>> Flavio
>>>>>
>>>>
>>>
>

Re: Query optimization

Posted by James Taylor <ja...@apache.org>.
A hash join (the default) will be faster but the tables being cached (last
or RHS table being joined) must be small enough to fit into memory on the
region server. If it's too big, you can use the USE_SORT_MERGE_JOIN which
would not have this restriction.

On Wed, Dec 27, 2017 at 3:16 PM, Flavio Pompermaier <po...@okkam.it>
wrote:

> Just to summarize things...is the best approach, in terms of required
> memory, for Apache Phoenix queries to use sort merge join? Should inner
> queries be avoided?
>
>
> On 22 Dec 2017 22:47, "Flavio Pompermaier" <po...@okkam.it> wrote:
>
> MYTABLE is definitely much bigger than PEOPLE table, in terms of
> cardinality. In terms of cells (rows x columns) PEOPLE is probably bigger
>
> On 22 Dec 2017 22:36, "Ethan" <ew...@apache.org> wrote:
>
>> I see. I think client side probably hold on to the iterators from the
>> both sides and crawling forward to do the merge sort. in this case should
>> be no much memory footprint either way where the filter is performed.
>>
>> On December 22, 2017 at 1:04:18 PM, James Taylor (jamestaylor@apache.org)
>> wrote:
>>
>> There’s no shipping of any tables with a sort merge join.
>>
>> On Fri, Dec 22, 2017 at 1:02 PM Ethan Wang <ae...@gmail.com> wrote:
>>
>>> I see. Looks like it's possible the rhs (MYTABLE) is too big to ship
>>> around without get filtered first. Just for experiment, if you took out
>>> hint USE_SORT_MERGE_JOIN, what will be the plan?
>>>
>>>
>>> On December 22, 2017 at 12:46:25 PM, James Taylor (
>>> jamestaylor@apache.org) wrote:
>>>
>>> For sort merge join, both post-filtered table results are sorted on the
>>> server side and then a merge sort is done on the client-side.
>>>
>>> On Fri, Dec 22, 2017 at 12:44 PM, Ethan <ew...@apache.org> wrote:
>>>
>>>> Hello Flavio,
>>>>
>>>> From the plan looks like to me the second query is doing the filter at
>>>> parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE
>>>> (after filtered) respectively?
>>>>
>>>> For sort merge join, anyone knows are the both sides get shipped to
>>>> client to do the merge sort?
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier (
>>>> pompermaier@okkam.it) wrote:
>>>>
>>>> Any help here...?
>>>>
>>>> On 20 Dec 2017 17:58, "Flavio Pompermaier" <po...@okkam.it>
>>>> wrote:
>>>>
>>>>> Hi to all,
>>>>> I'm trying to find the best query for my use case but I found that one
>>>>> version work and the other one does not (unless that I don't apply some
>>>>> tuning to timeouts etc like explained in [1]).
>>>>>
>>>>> The 2 queries extract the same data but, while the first query
>>>>> terminates the second does not.
>>>>> *PS*:  without the USE_SORT_MERGE_JOIN both queries weren't working
>>>>>
>>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>>>> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
>>>>> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE;
>>>>> +-----------------------------------------------------------
>>>>> ----------------------------------------------------+-------
>>>>> ----------+----------------+----------------+
>>>>> |                                                     PLAN
>>>>>                                           | EST_BYTES_READ  |
>>>>> EST_ROWS_READ  |  EST_INFO_TS   |
>>>>> +-----------------------------------------------------------
>>>>> ----------------------------------------------------+-------
>>>>> ----------+----------------+----------------+
>>>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>>>                                           | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 3-WAY
>>>>> FULL SCAN OVER PEOPLE                 | 14155777900 <(415)%20577-7900>
>>>>>    | 12077867       | 1513754378759  |
>>>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>>>                                          | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> |     CLIENT MERGE SORT
>>>>>                                          | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> | AND (SKIP MERGE)
>>>>>                                           | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY
>>>>> RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
>>>>>    | 12077867       | 1513754378759  |
>>>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>>>> HAS_CANDIDATES = false)                   | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> |         SERVER SORTED BY [L.LOCALID]
>>>>>                                               | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> |     CLIENT MERGE SORT
>>>>>                                          | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>>>                                           | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> +-----------------------------------------------------------
>>>>> ----------------------------------------------------+-------
>>>>> ----------+----------------+----------------+
>>>>>
>>>>>
>>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>>>> FROM (SELECT LOCALID FROM MYTABLE
>>>>> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l
>>>>> JOIN PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
>>>>> +-----------------------------------------------------------
>>>>> ---------------------------------------------------+--------
>>>>> ---------+----------------+----------------+
>>>>> |                                                     PLAN
>>>>>                                          | EST_BYTES_READ  | EST_ROWS_READ
>>>>> |  EST_INFO_TS   |
>>>>> +-----------------------------------------------------------
>>>>> ---------------------------------------------------+--------
>>>>> ---------+----------------+----------------+
>>>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>>>                                          | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY
>>>>> RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
>>>>>    | 12077867       | 1513754378759  |
>>>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>>>> HAS_CANDIDATES = false)                  | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> |     CLIENT MERGE SORT
>>>>>                                         | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> | AND (SKIP MERGE)
>>>>>                                          | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 42-WAY
>>>>> FULL SCAN OVER PEOPLE               | 14155777900 <(415)%20577-7900>
>>>>>    | 12077867       | 1513754378759  |
>>>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>>>                                         | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> |         SERVER SORTED BY [DS.PERSON_ID]
>>>>>                                       | 14155777900 <(415)%20577-7900>
>>>>>    | 12077867       | 1513754378759  |
>>>>> |     CLIENT MERGE SORT
>>>>>                                         | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>>>                                          | 14155777900
>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>> +-----------------------------------------------------------
>>>>> ---------------------------------------------------+--------
>>>>> ---------+----------------+----------------+
>>>>> 10 rows selected (0.041 seconds)
>>>>>
>>>>> What do you think? Whould I try to give more resources to
>>>>> HBase/Phoenix or is the first query the best (and reliable) one?
>>>>> Any insight about this is highly appreciated..
>>>>>
>>>>> Best,
>>>>> Flavio
>>>>>
>>>>
>>>
>

Re: Query optimization

Posted by Flavio Pompermaier <po...@okkam.it>.
Just to summarize things...is the best approach, in terms of required
memory, for Apache Phoenix queries to use sort merge join? Should inner
queries be avoided?

On 22 Dec 2017 22:47, "Flavio Pompermaier" <po...@okkam.it> wrote:

MYTABLE is definitely much bigger than PEOPLE table, in terms of
cardinality. In terms of cells (rows x columns) PEOPLE is probably bigger

On 22 Dec 2017 22:36, "Ethan" <ew...@apache.org> wrote:

> I see. I think client side probably hold on to the iterators from the both
> sides and crawling forward to do the merge sort. in this case should be no
> much memory footprint either way where the filter is performed.
>
> On December 22, 2017 at 1:04:18 PM, James Taylor (jamestaylor@apache.org)
> wrote:
>
> There’s no shipping of any tables with a sort merge join.
>
> On Fri, Dec 22, 2017 at 1:02 PM Ethan Wang <ae...@gmail.com> wrote:
>
>> I see. Looks like it's possible the rhs (MYTABLE) is too big to ship
>> around without get filtered first. Just for experiment, if you took out
>> hint USE_SORT_MERGE_JOIN, what will be the plan?
>>
>>
>> On December 22, 2017 at 12:46:25 PM, James Taylor (jamestaylor@apache.org)
>> wrote:
>>
>> For sort merge join, both post-filtered table results are sorted on the
>> server side and then a merge sort is done on the client-side.
>>
>> On Fri, Dec 22, 2017 at 12:44 PM, Ethan <ew...@apache.org> wrote:
>>
>>> Hello Flavio,
>>>
>>> From the plan looks like to me the second query is doing the filter at
>>> parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE
>>> (after filtered) respectively?
>>>
>>> For sort merge join, anyone knows are the both sides get shipped to
>>> client to do the merge sort?
>>>
>>> Thanks,
>>>
>>>
>>> On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier (
>>> pompermaier@okkam.it) wrote:
>>>
>>> Any help here...?
>>>
>>> On 20 Dec 2017 17:58, "Flavio Pompermaier" <po...@okkam.it> wrote:
>>>
>>>> Hi to all,
>>>> I'm trying to find the best query for my use case but I found that one
>>>> version work and the other one does not (unless that I don't apply some
>>>> tuning to timeouts etc like explained in [1]).
>>>>
>>>> The 2 queries extract the same data but, while the first query
>>>> terminates the second does not.
>>>> *PS*:  without the USE_SORT_MERGE_JOIN both queries weren't working
>>>>
>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>>> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
>>>> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE;
>>>> +-----------------------------------------------------------
>>>> ----------------------------------------------------+-------
>>>> ----------+----------------+----------------+
>>>> |                                                     PLAN
>>>>                                         | EST_BYTES_READ  | EST_ROWS_READ
>>>> |  EST_INFO_TS   |
>>>> +-----------------------------------------------------------
>>>> ----------------------------------------------------+-------
>>>> ----------+----------------+----------------+
>>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>>                                         | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 3-WAY
>>>> FULL SCAN OVER PEOPLE                 | 14155777900 <(415)%20577-7900>
>>>>    | 12077867       | 1513754378759  |
>>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>>                                          | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> |     CLIENT MERGE SORT
>>>>                                          | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> | AND (SKIP MERGE)
>>>>                                         | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY
>>>> RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
>>>>  | 12077867       | 1513754378759  |
>>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>>> HAS_CANDIDATES = false)                   | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> |         SERVER SORTED BY [L.LOCALID]
>>>>                                             | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> |     CLIENT MERGE SORT
>>>>                                          | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>>                                         | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> +-----------------------------------------------------------
>>>> ----------------------------------------------------+-------
>>>> ----------+----------------+----------------+
>>>>
>>>>
>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>>> FROM (SELECT LOCALID FROM MYTABLE
>>>> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l
>>>> JOIN PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
>>>> +-----------------------------------------------------------
>>>> ---------------------------------------------------+--------
>>>> ---------+----------------+----------------+
>>>> |                                                     PLAN
>>>>                                        | EST_BYTES_READ  | EST_ROWS_READ
>>>> |  EST_INFO_TS   |
>>>> +-----------------------------------------------------------
>>>> ---------------------------------------------------+--------
>>>> ---------+----------------+----------------+
>>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>>                                        | 14155777900 <(415)%20577-7900>
>>>>    | 12077867       | 1513754378759  |
>>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY
>>>> RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
>>>>  | 12077867       | 1513754378759  |
>>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>>> HAS_CANDIDATES = false)                  | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> |     CLIENT MERGE SORT
>>>>                                         | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> | AND (SKIP MERGE)
>>>>                                        | 14155777900 <(415)%20577-7900>
>>>>    | 12077867       | 1513754378759  |
>>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 42-WAY
>>>> FULL SCAN OVER PEOPLE               | 14155777900 <(415)%20577-7900>
>>>>    | 12077867       | 1513754378759  |
>>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>>                                         | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> |         SERVER SORTED BY [DS.PERSON_ID]
>>>>                                       | 14155777900 <(415)%20577-7900>
>>>>    | 12077867       | 1513754378759  |
>>>> |     CLIENT MERGE SORT
>>>>                                         | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>>                                        | 14155777900 <(415)%20577-7900>
>>>>    | 12077867       | 1513754378759  |
>>>> +-----------------------------------------------------------
>>>> ---------------------------------------------------+--------
>>>> ---------+----------------+----------------+
>>>> 10 rows selected (0.041 seconds)
>>>>
>>>> What do you think? Whould I try to give more resources to HBase/Phoenix
>>>> or is the first query the best (and reliable) one?
>>>> Any insight about this is highly appreciated..
>>>>
>>>> Best,
>>>> Flavio
>>>>
>>>
>>

Re: Query optimization

Posted by Flavio Pompermaier <po...@okkam.it>.
MYTABLE is definitely much bigger than PEOPLE table, in terms of
cardinality. In terms of cells (rows x columns) PEOPLE is probably bigger

On 22 Dec 2017 22:36, "Ethan" <ew...@apache.org> wrote:

> I see. I think client side probably hold on to the iterators from the both
> sides and crawling forward to do the merge sort. in this case should be no
> much memory footprint either way where the filter is performed.
>
> On December 22, 2017 at 1:04:18 PM, James Taylor (jamestaylor@apache.org)
> wrote:
>
> There’s no shipping of any tables with a sort merge join.
>
> On Fri, Dec 22, 2017 at 1:02 PM Ethan Wang <ae...@gmail.com> wrote:
>
>> I see. Looks like it's possible the rhs (MYTABLE) is too big to ship
>> around without get filtered first. Just for experiment, if you took out
>> hint USE_SORT_MERGE_JOIN, what will be the plan?
>>
>>
>> On December 22, 2017 at 12:46:25 PM, James Taylor (jamestaylor@apache.org)
>> wrote:
>>
>> For sort merge join, both post-filtered table results are sorted on the
>> server side and then a merge sort is done on the client-side.
>>
>> On Fri, Dec 22, 2017 at 12:44 PM, Ethan <ew...@apache.org> wrote:
>>
>>> Hello Flavio,
>>>
>>> From the plan looks like to me the second query is doing the filter at
>>> parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE
>>> (after filtered) respectively?
>>>
>>> For sort merge join, anyone knows are the both sides get shipped to
>>> client to do the merge sort?
>>>
>>> Thanks,
>>>
>>>
>>> On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier (
>>> pompermaier@okkam.it) wrote:
>>>
>>> Any help here...?
>>>
>>> On 20 Dec 2017 17:58, "Flavio Pompermaier" <po...@okkam.it> wrote:
>>>
>>>> Hi to all,
>>>> I'm trying to find the best query for my use case but I found that one
>>>> version work and the other one does not (unless that I don't apply some
>>>> tuning to timeouts etc like explained in [1]).
>>>>
>>>> The 2 queries extract the same data but, while the first query
>>>> terminates the second does not.
>>>> *PS*:  without the USE_SORT_MERGE_JOIN both queries weren't working
>>>>
>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>>> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
>>>> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE;
>>>> +-----------------------------------------------------------
>>>> ----------------------------------------------------+-------
>>>> ----------+----------------+----------------+
>>>> |                                                     PLAN
>>>>                                         | EST_BYTES_READ  | EST_ROWS_READ
>>>> |  EST_INFO_TS   |
>>>> +-----------------------------------------------------------
>>>> ----------------------------------------------------+-------
>>>> ----------+----------------+----------------+
>>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>>                                         | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 3-WAY
>>>> FULL SCAN OVER PEOPLE                 | 14155777900 <(415)%20577-7900>
>>>>    | 12077867       | 1513754378759  |
>>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>>                                          | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> |     CLIENT MERGE SORT
>>>>                                          | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> | AND (SKIP MERGE)
>>>>                                         | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY
>>>> RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
>>>>  | 12077867       | 1513754378759  |
>>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>>> HAS_CANDIDATES = false)                   | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> |         SERVER SORTED BY [L.LOCALID]
>>>>                                             | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> |     CLIENT MERGE SORT
>>>>                                          | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>>                                         | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> +-----------------------------------------------------------
>>>> ----------------------------------------------------+-------
>>>> ----------+----------------+----------------+
>>>>
>>>>
>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>>> FROM (SELECT LOCALID FROM MYTABLE
>>>> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l
>>>> JOIN PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
>>>> +-----------------------------------------------------------
>>>> ---------------------------------------------------+--------
>>>> ---------+----------------+----------------+
>>>> |                                                     PLAN
>>>>                                        | EST_BYTES_READ  | EST_ROWS_READ
>>>> |  EST_INFO_TS   |
>>>> +-----------------------------------------------------------
>>>> ---------------------------------------------------+--------
>>>> ---------+----------------+----------------+
>>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>>                                        | 14155777900 <(415)%20577-7900>
>>>>    | 12077867       | 1513754378759  |
>>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY
>>>> RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
>>>>  | 12077867       | 1513754378759  |
>>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>>> HAS_CANDIDATES = false)                  | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> |     CLIENT MERGE SORT
>>>>                                         | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> | AND (SKIP MERGE)
>>>>                                        | 14155777900 <(415)%20577-7900>
>>>>    | 12077867       | 1513754378759  |
>>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 42-WAY
>>>> FULL SCAN OVER PEOPLE               | 14155777900 <(415)%20577-7900>
>>>>    | 12077867       | 1513754378759  |
>>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>>                                         | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> |         SERVER SORTED BY [DS.PERSON_ID]
>>>>                                       | 14155777900 <(415)%20577-7900>
>>>>    | 12077867       | 1513754378759  |
>>>> |     CLIENT MERGE SORT
>>>>                                         | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>>                                        | 14155777900 <(415)%20577-7900>
>>>>    | 12077867       | 1513754378759  |
>>>> +-----------------------------------------------------------
>>>> ---------------------------------------------------+--------
>>>> ---------+----------------+----------------+
>>>> 10 rows selected (0.041 seconds)
>>>>
>>>> What do you think? Whould I try to give more resources to HBase/Phoenix
>>>> or is the first query the best (and reliable) one?
>>>> Any insight about this is highly appreciated..
>>>>
>>>> Best,
>>>> Flavio
>>>>
>>>
>>

Re: Query optimization

Posted by Ethan <ew...@apache.org>.
I see. I think client side probably hold on to the iterators from the both sides and crawling forward to do the merge sort. in this case should be no much memory footprint either way where the filter is performed. 
On December 22, 2017 at 1:04:18 PM, James Taylor (jamestaylor@apache.org) wrote:

There’s no shipping of any tables with a sort merge join.

On Fri, Dec 22, 2017 at 1:02 PM Ethan Wang <ae...@gmail.com> wrote:
I see. Looks like it's possible the rhs (MYTABLE) is too big to ship around without get filtered first. Just for experiment, if you took out hint USE_SORT_MERGE_JOIN, what will be the plan?


On December 22, 2017 at 12:46:25 PM, James Taylor (jamestaylor@apache.org) wrote:

For sort merge join, both post-filtered table results are sorted on the server side and then a merge sort is done on the client-side.

On Fri, Dec 22, 2017 at 12:44 PM, Ethan <ew...@apache.org> wrote:
Hello Flavio,

From the plan looks like to me the second query is doing the filter at parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE (after filtered) respectively?

For sort merge join, anyone knows are the both sides get shipped to client to do the merge sort?

Thanks,


On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier (pompermaier@okkam.it) wrote:

Any help here...?

On 20 Dec 2017 17:58, "Flavio Pompermaier" <po...@okkam.it> wrote:
Hi to all,
I'm trying to find the best query for my use case but I found that one version work and the other one does not (unless that I don't apply some tuning to timeouts etc like explained in [1]).

The 2 queries extract the same data but, while the first query terminates the second does not.
PS:  without the USE_SORT_MERGE_JOIN both queries weren't working

SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*) 
FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE;
+---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                                     PLAN                                                      | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
+---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| SORT-MERGE-JOIN (INNER) TABLES                                                                                | 14155777900     | 12077867       | 1513754378759  |
|     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 3-WAY FULL SCAN OVER PEOPLE                 | 14155777900     | 12077867       | 1513754378759  |
|         SERVER FILTER BY FIRST KEY ONLY                                                                       | 14155777900     | 12077867       | 1513754378759  |
|     CLIENT MERGE SORT                                                                                         | 14155777900     | 12077867       | 1513754378759  |
| AND (SKIP MERGE)                                                                                              | 14155777900     | 12077867       | 1513754378759  |
|     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900     | 12077867       | 1513754378759  |
|         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = false)                   | 14155777900     | 12077867       | 1513754378759  |
|         SERVER SORTED BY [L.LOCALID]                                                                              | 14155777900     | 12077867       | 1513754378759  |
|     CLIENT MERGE SORT                                                                                         | 14155777900     | 12077867       | 1513754378759  |
| CLIENT AGGREGATE INTO SINGLE ROW                                                                              | 14155777900     | 12077867       | 1513754378759  |
+---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+


SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*) 
FROM (SELECT LOCALID FROM MYTABLE
WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l JOIN PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
+--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                                     PLAN                                                     | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
+--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| SORT-MERGE-JOIN (INNER) TABLES                                                                               | 14155777900     | 12077867       | 1513754378759  |
|     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900     | 12077867       | 1513754378759  |
|         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = false)                  | 14155777900     | 12077867       | 1513754378759  |
|     CLIENT MERGE SORT                                                                                        | 14155777900     | 12077867       | 1513754378759  |
| AND (SKIP MERGE)                                                                                             | 14155777900     | 12077867       | 1513754378759  |
|     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 42-WAY FULL SCAN OVER PEOPLE               | 14155777900     | 12077867       | 1513754378759  |
|         SERVER FILTER BY FIRST KEY ONLY                                                                      | 14155777900     | 12077867       | 1513754378759  |
|         SERVER SORTED BY [DS.PERSON_ID]                                                                    | 14155777900     | 12077867       | 1513754378759  |
|     CLIENT MERGE SORT                                                                                        | 14155777900     | 12077867       | 1513754378759  |
| CLIENT AGGREGATE INTO SINGLE ROW                                                                             | 14155777900     | 12077867       | 1513754378759  |
+--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
10 rows selected (0.041 seconds)

What do you think? Whould I try to give more resources to HBase/Phoenix or is the first query the best (and reliable) one?
Any insight about this is highly appreciated..

Best,
Flavio


Re: Query optimization

Posted by James Taylor <ja...@apache.org>.
There’s no shipping of any tables with a sort merge join.

On Fri, Dec 22, 2017 at 1:02 PM Ethan Wang <ae...@gmail.com> wrote:

> I see. Looks like it's possible the rhs (MYTABLE) is too big to ship
> around without get filtered first. Just for experiment, if you took out
> hint USE_SORT_MERGE_JOIN, what will be the plan?
>
>
> On December 22, 2017 at 12:46:25 PM, James Taylor (jamestaylor@apache.org)
> wrote:
>
> For sort merge join, both post-filtered table results are sorted on the
> server side and then a merge sort is done on the client-side.
>
> On Fri, Dec 22, 2017 at 12:44 PM, Ethan <ew...@apache.org> wrote:
>
>> Hello Flavio,
>>
>> From the plan looks like to me the second query is doing the filter at
>> parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE
>> (after filtered) respectively?
>>
>> For sort merge join, anyone knows are the both sides get shipped to
>> client to do the merge sort?
>>
>> Thanks,
>>
>>
>> On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier (
>> pompermaier@okkam.it) wrote:
>>
>> Any help here...?
>>
>> On 20 Dec 2017 17:58, "Flavio Pompermaier" <po...@okkam.it> wrote:
>>
>>> Hi to all,
>>> I'm trying to find the best query for my use case but I found that one
>>> version work and the other one does not (unless that I don't apply some
>>> tuning to timeouts etc like explained in [1]).
>>>
>>> The 2 queries extract the same data but, while the first query
>>> terminates the second does not.
>>> *PS*:  without the USE_SORT_MERGE_JOIN both queries weren't working
>>>
>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
>>> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE;
>>>
>>> +---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>> |                                                     PLAN
>>>                                         | EST_BYTES_READ  | EST_ROWS_READ
>>> |  EST_INFO_TS   |
>>>
>>> +---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>                                         | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 3-WAY FULL
>>> SCAN OVER PEOPLE                 | 14155777900 <(415)%20577-7900>     |
>>> 12077867       | 1513754378759  |
>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>                                        | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>> |     CLIENT MERGE SORT
>>>                                        | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>> | AND (SKIP MERGE)
>>>                                         | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY
>>> RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
>>>  | 12077867       | 1513754378759  |
>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>> HAS_CANDIDATES = false)                   | 14155777900
>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>> |         SERVER SORTED BY [L.LOCALID]
>>>                                             | 14155777900
>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>> |     CLIENT MERGE SORT
>>>                                        | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>                                         | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>>
>>> +---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>>
>>>
>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>> FROM (SELECT LOCALID FROM MYTABLE
>>> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l JOIN
>>> PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
>>>
>>> +--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>> |                                                     PLAN
>>>                                        | EST_BYTES_READ  | EST_ROWS_READ
>>> |  EST_INFO_TS   |
>>>
>>> +--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>                                        | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY RANGE
>>> SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>     |
>>> 12077867       | 1513754378759  |
>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>> HAS_CANDIDATES = false)                  | 14155777900
>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>> |     CLIENT MERGE SORT
>>>                                       | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>> | AND (SKIP MERGE)
>>>                                        | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 42-WAY
>>> FULL SCAN OVER PEOPLE               | 14155777900 <(415)%20577-7900>
>>>  | 12077867       | 1513754378759  |
>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>                                       | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>> |         SERVER SORTED BY [DS.PERSON_ID]
>>>                                     | 14155777900 <(415)%20577-7900>
>>>  | 12077867       | 1513754378759  |
>>> |     CLIENT MERGE SORT
>>>                                       | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>                                        | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>>
>>> +--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>> 10 rows selected (0.041 seconds)
>>>
>>> What do you think? Whould I try to give more resources to HBase/Phoenix
>>> or is the first query the best (and reliable) one?
>>> Any insight about this is highly appreciated..
>>>
>>> Best,
>>> Flavio
>>>
>>
>

Re: Query optimization

Posted by Ethan Wang <ae...@gmail.com>.
I see. Looks like it's possible the rhs (MYTABLE) is too big to ship around
without get filtered first. Just for experiment, if you took out hint
USE_SORT_MERGE_JOIN, what will be the plan?


On December 22, 2017 at 12:46:25 PM, James Taylor (jamestaylor@apache.org)
wrote:

For sort merge join, both post-filtered table results are sorted on the
server side and then a merge sort is done on the client-side.

On Fri, Dec 22, 2017 at 12:44 PM, Ethan <ew...@apache.org> wrote:

> Hello Flavio,
>
> From the plan looks like to me the second query is doing the filter at
> parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE
> (after filtered) respectively?
>
> For sort merge join, anyone knows are the both sides get shipped to client
> to do the merge sort?
>
> Thanks,
>
>
> On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier (
> pompermaier@okkam.it) wrote:
>
> Any help here...?
>
> On 20 Dec 2017 17:58, "Flavio Pompermaier" <po...@okkam.it> wrote:
>
>> Hi to all,
>> I'm trying to find the best query for my use case but I found that one
>> version work and the other one does not (unless that I don't apply some
>> tuning to timeouts etc like explained in [1]).
>>
>> The 2 queries extract the same data but, while the first query terminates
>> the second does not.
>> *PS*:  without the USE_SORT_MERGE_JOIN both queries weren't working
>>
>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
>> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE;
>> +-----------------------------------------------------------
>> ----------------------------------------------------+-------
>> ----------+----------------+----------------+
>> |                                                     PLAN
>>                                       | EST_BYTES_READ  | EST_ROWS_READ  |
>> EST_INFO_TS   |
>> +-----------------------------------------------------------
>> ----------------------------------------------------+-------
>> ----------+----------------+----------------+
>> | SORT-MERGE-JOIN (INNER) TABLES
>>                                       | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 3-WAY FULL
>> SCAN OVER PEOPLE                 | 14155777900 <(415)%20577-7900>     |
>> 12077867       | 1513754378759  |
>> |         SERVER FILTER BY FIRST KEY ONLY
>>                                        | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> |     CLIENT MERGE SORT
>>                                        | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> | AND (SKIP MERGE)
>>                                       | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY RANGE
>> SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>     |
>> 12077867       | 1513754378759  |
>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>> HAS_CANDIDATES = false)                   | 14155777900
>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>> |         SERVER SORTED BY [L.LOCALID]
>>                                           | 14155777900
>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>> |     CLIENT MERGE SORT
>>                                        | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> | CLIENT AGGREGATE INTO SINGLE ROW
>>                                       | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> +-----------------------------------------------------------
>> ----------------------------------------------------+-------
>> ----------+----------------+----------------+
>>
>>
>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>> FROM (SELECT LOCALID FROM MYTABLE
>> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l JOIN
>> PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
>> +-----------------------------------------------------------
>> ---------------------------------------------------+--------
>> ---------+----------------+----------------+
>> |                                                     PLAN
>>                                      | EST_BYTES_READ  | EST_ROWS_READ  |
>> EST_INFO_TS   |
>> +-----------------------------------------------------------
>> ---------------------------------------------------+--------
>> ---------+----------------+----------------+
>> | SORT-MERGE-JOIN (INNER) TABLES
>>                                      | 14155777900 <(415)%20577-7900>
>>  | 12077867       | 1513754378759  |
>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY RANGE
>> SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>     |
>> 12077867       | 1513754378759  |
>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>> HAS_CANDIDATES = false)                  | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> |     CLIENT MERGE SORT
>>                                       | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> | AND (SKIP MERGE)
>>                                      | 14155777900 <(415)%20577-7900>
>>  | 12077867       | 1513754378759  |
>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 42-WAY FULL
>> SCAN OVER PEOPLE               | 14155777900 <(415)%20577-7900>     |
>> 12077867       | 1513754378759  |
>> |         SERVER FILTER BY FIRST KEY ONLY
>>                                       | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> |         SERVER SORTED BY [DS.PERSON_ID]
>>                                     | 14155777900 <(415)%20577-7900>
>>  | 12077867       | 1513754378759  |
>> |     CLIENT MERGE SORT
>>                                       | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> | CLIENT AGGREGATE INTO SINGLE ROW
>>                                      | 14155777900 <(415)%20577-7900>
>>  | 12077867       | 1513754378759  |
>> +-----------------------------------------------------------
>> ---------------------------------------------------+--------
>> ---------+----------------+----------------+
>> 10 rows selected (0.041 seconds)
>>
>> What do you think? Whould I try to give more resources to HBase/Phoenix
>> or is the first query the best (and reliable) one?
>> Any insight about this is highly appreciated..
>>
>> Best,
>> Flavio
>>
>

Re: Query optimization

Posted by James Taylor <ja...@apache.org>.
For sort merge join, both post-filtered table results are sorted on the
server side and then a merge sort is done on the client-side.

On Fri, Dec 22, 2017 at 12:44 PM, Ethan <ew...@apache.org> wrote:

> Hello Flavio,
>
> From the plan looks like to me the second query is doing the filter at
> parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE
> (after filtered) respectively?
>
> For sort merge join, anyone knows are the both sides get shipped to client
> to do the merge sort?
>
> Thanks,
>
>
> On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier (
> pompermaier@okkam.it) wrote:
>
> Any help here...?
>
> On 20 Dec 2017 17:58, "Flavio Pompermaier" <po...@okkam.it> wrote:
>
>> Hi to all,
>> I'm trying to find the best query for my use case but I found that one
>> version work and the other one does not (unless that I don't apply some
>> tuning to timeouts etc like explained in [1]).
>>
>> The 2 queries extract the same data but, while the first query terminates
>> the second does not.
>> *PS*:  without the USE_SORT_MERGE_JOIN both queries weren't working
>>
>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
>> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE;
>> +-----------------------------------------------------------
>> ----------------------------------------------------+-------
>> ----------+----------------+----------------+
>> |                                                     PLAN
>>                                       | EST_BYTES_READ  | EST_ROWS_READ  |
>> EST_INFO_TS   |
>> +-----------------------------------------------------------
>> ----------------------------------------------------+-------
>> ----------+----------------+----------------+
>> | SORT-MERGE-JOIN (INNER) TABLES
>>                                       | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 3-WAY FULL
>> SCAN OVER PEOPLE                 | 14155777900 <(415)%20577-7900>     |
>> 12077867       | 1513754378759  |
>> |         SERVER FILTER BY FIRST KEY ONLY
>>                                        | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> |     CLIENT MERGE SORT
>>                                        | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> | AND (SKIP MERGE)
>>                                       | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY RANGE
>> SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>     |
>> 12077867       | 1513754378759  |
>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>> HAS_CANDIDATES = false)                   | 14155777900
>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>> |         SERVER SORTED BY [L.LOCALID]
>>                                           | 14155777900
>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>> |     CLIENT MERGE SORT
>>                                        | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> | CLIENT AGGREGATE INTO SINGLE ROW
>>                                       | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> +-----------------------------------------------------------
>> ----------------------------------------------------+-------
>> ----------+----------------+----------------+
>>
>>
>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>> FROM (SELECT LOCALID FROM MYTABLE
>> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l JOIN
>> PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
>> +-----------------------------------------------------------
>> ---------------------------------------------------+--------
>> ---------+----------------+----------------+
>> |                                                     PLAN
>>                                      | EST_BYTES_READ  | EST_ROWS_READ  |
>> EST_INFO_TS   |
>> +-----------------------------------------------------------
>> ---------------------------------------------------+--------
>> ---------+----------------+----------------+
>> | SORT-MERGE-JOIN (INNER) TABLES
>>                                      | 14155777900 <(415)%20577-7900>
>>  | 12077867       | 1513754378759  |
>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY RANGE
>> SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>     |
>> 12077867       | 1513754378759  |
>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>> HAS_CANDIDATES = false)                  | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> |     CLIENT MERGE SORT
>>                                       | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> | AND (SKIP MERGE)
>>                                      | 14155777900 <(415)%20577-7900>
>>  | 12077867       | 1513754378759  |
>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 42-WAY FULL
>> SCAN OVER PEOPLE               | 14155777900 <(415)%20577-7900>     |
>> 12077867       | 1513754378759  |
>> |         SERVER FILTER BY FIRST KEY ONLY
>>                                       | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> |         SERVER SORTED BY [DS.PERSON_ID]
>>                                     | 14155777900 <(415)%20577-7900>
>>  | 12077867       | 1513754378759  |
>> |     CLIENT MERGE SORT
>>                                       | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> | CLIENT AGGREGATE INTO SINGLE ROW
>>                                      | 14155777900 <(415)%20577-7900>
>>  | 12077867       | 1513754378759  |
>> +-----------------------------------------------------------
>> ---------------------------------------------------+--------
>> ---------+----------------+----------------+
>> 10 rows selected (0.041 seconds)
>>
>> What do you think? Whould I try to give more resources to HBase/Phoenix
>> or is the first query the best (and reliable) one?
>> Any insight about this is highly appreciated..
>>
>> Best,
>> Flavio
>>
>

Re: Query optimization

Posted by Ethan <ew...@apache.org>.
Hello Flavio,

From the plan looks like to me the second query is doing the filter at parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE (after filtered) respectively?

For sort merge join, anyone knows are the both sides get shipped to client to do the merge sort?

Thanks,


On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier (pompermaier@okkam.it) wrote:

Any help here...?

On 20 Dec 2017 17:58, "Flavio Pompermaier" <po...@okkam.it> wrote:
Hi to all,
I'm trying to find the best query for my use case but I found that one version work and the other one does not (unless that I don't apply some tuning to timeouts etc like explained in [1]).

The 2 queries extract the same data but, while the first query terminates the second does not.
PS:  without the USE_SORT_MERGE_JOIN both queries weren't working

SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*) 
FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE;
+---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                                     PLAN                                                      | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
+---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| SORT-MERGE-JOIN (INNER) TABLES                                                                                | 14155777900     | 12077867       | 1513754378759  |
|     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 3-WAY FULL SCAN OVER PEOPLE                 | 14155777900     | 12077867       | 1513754378759  |
|         SERVER FILTER BY FIRST KEY ONLY                                                                       | 14155777900     | 12077867       | 1513754378759  |
|     CLIENT MERGE SORT                                                                                         | 14155777900     | 12077867       | 1513754378759  |
| AND (SKIP MERGE)                                                                                              | 14155777900     | 12077867       | 1513754378759  |
|     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900     | 12077867       | 1513754378759  |
|         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = false)                   | 14155777900     | 12077867       | 1513754378759  |
|         SERVER SORTED BY [L.LOCALID]                                                                              | 14155777900     | 12077867       | 1513754378759  |
|     CLIENT MERGE SORT                                                                                         | 14155777900     | 12077867       | 1513754378759  |
| CLIENT AGGREGATE INTO SINGLE ROW                                                                              | 14155777900     | 12077867       | 1513754378759  |
+---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+


SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*) 
FROM (SELECT LOCALID FROM MYTABLE
WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l JOIN PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
+--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                                     PLAN                                                     | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
+--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| SORT-MERGE-JOIN (INNER) TABLES                                                                               | 14155777900     | 12077867       | 1513754378759  |
|     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900     | 12077867       | 1513754378759  |
|         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = false)                  | 14155777900     | 12077867       | 1513754378759  |
|     CLIENT MERGE SORT                                                                                        | 14155777900     | 12077867       | 1513754378759  |
| AND (SKIP MERGE)                                                                                             | 14155777900     | 12077867       | 1513754378759  |
|     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 42-WAY FULL SCAN OVER PEOPLE               | 14155777900     | 12077867       | 1513754378759  |
|         SERVER FILTER BY FIRST KEY ONLY                                                                      | 14155777900     | 12077867       | 1513754378759  |
|         SERVER SORTED BY [DS.PERSON_ID]                                                                    | 14155777900     | 12077867       | 1513754378759  |
|     CLIENT MERGE SORT                                                                                        | 14155777900     | 12077867       | 1513754378759  |
| CLIENT AGGREGATE INTO SINGLE ROW                                                                             | 14155777900     | 12077867       | 1513754378759  |
+--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
10 rows selected (0.041 seconds)

What do you think? Whould I try to give more resources to HBase/Phoenix or is the first query the best (and reliable) one?
Any insight about this is highly appreciated..

Best,
Flavio

Re: Query optimization

Posted by Flavio Pompermaier <po...@okkam.it>.
Any help here...?

On 20 Dec 2017 17:58, "Flavio Pompermaier" <po...@okkam.it> wrote:

> Hi to all,
> I'm trying to find the best query for my use case but I found that one
> version work and the other one does not (unless that I don't apply some
> tuning to timeouts etc like explained in [1]).
>
> The 2 queries extract the same data but, while the first query terminates
> the second does not.
> *PS*:  without the USE_SORT_MERGE_JOIN both queries weren't working
>
> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE;
> +-----------------------------------------------------------
> ----------------------------------------------------+-------
> ----------+----------------+----------------+
> |                                                     PLAN
>                                       | EST_BYTES_READ  | EST_ROWS_READ  |
> EST_INFO_TS   |
> +-----------------------------------------------------------
> ----------------------------------------------------+-------
> ----------+----------------+----------------+
> | SORT-MERGE-JOIN (INNER) TABLES
>                                       | 14155777900 <(415)%20577-7900>
>  | 12077867       | 1513754378759  |
> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 3-WAY FULL
> SCAN OVER PEOPLE                 | 14155777900 <(415)%20577-7900>     |
> 12077867       | 1513754378759  |
> |         SERVER FILTER BY FIRST KEY ONLY
>                                      | 14155777900 <(415)%20577-7900>
>  | 12077867       | 1513754378759  |
> |     CLIENT MERGE SORT
>                                      | 14155777900 <(415)%20577-7900>
>  | 12077867       | 1513754378759  |
> | AND (SKIP MERGE)
>                                       | 14155777900 <(415)%20577-7900>
>  | 12077867       | 1513754378759  |
> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY RANGE
> SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>     |
> 12077867       | 1513754378759  |
> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
> HAS_CANDIDATES = false)                   | 14155777900 <(415)%20577-7900>
>    | 12077867       | 1513754378759  |
> |         SERVER SORTED BY [L.LOCALID]
>                                           | 14155777900 <(415)%20577-7900>
>    | 12077867       | 1513754378759  |
> |     CLIENT MERGE SORT
>                                      | 14155777900 <(415)%20577-7900>
>  | 12077867       | 1513754378759  |
> | CLIENT AGGREGATE INTO SINGLE ROW
>                                       | 14155777900 <(415)%20577-7900>
>  | 12077867       | 1513754378759  |
> +-----------------------------------------------------------
> ----------------------------------------------------+-------
> ----------+----------------+----------------+
>
>
> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
> FROM (SELECT LOCALID FROM MYTABLE
> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l JOIN
> PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
> +-----------------------------------------------------------
> ---------------------------------------------------+--------
> ---------+----------------+----------------+
> |                                                     PLAN
>                                      | EST_BYTES_READ  | EST_ROWS_READ  |
> EST_INFO_TS   |
> +-----------------------------------------------------------
> ---------------------------------------------------+--------
> ---------+----------------+----------------+
> | SORT-MERGE-JOIN (INNER) TABLES
>                                      | 14155777900 <(415)%20577-7900>
>  | 12077867       | 1513754378759  |
> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY RANGE
> SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>     |
> 12077867       | 1513754378759  |
> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
> HAS_CANDIDATES = false)                  | 14155777900 <(415)%20577-7900>
>    | 12077867       | 1513754378759  |
> |     CLIENT MERGE SORT
>                                     | 14155777900 <(415)%20577-7900>
>  | 12077867       | 1513754378759  |
> | AND (SKIP MERGE)
>                                      | 14155777900 <(415)%20577-7900>
>  | 12077867       | 1513754378759  |
> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 42-WAY FULL
> SCAN OVER PEOPLE               | 14155777900 <(415)%20577-7900>     |
> 12077867       | 1513754378759  |
> |         SERVER FILTER BY FIRST KEY ONLY
>                                     | 14155777900 <(415)%20577-7900>
>  | 12077867       | 1513754378759  |
> |         SERVER SORTED BY [DS.PERSON_ID]
>                                   | 14155777900 <(415)%20577-7900>     |
> 12077867       | 1513754378759  |
> |     CLIENT MERGE SORT
>                                     | 14155777900 <(415)%20577-7900>
>  | 12077867       | 1513754378759  |
> | CLIENT AGGREGATE INTO SINGLE ROW
>                                      | 14155777900 <(415)%20577-7900>
>  | 12077867       | 1513754378759  |
> +-----------------------------------------------------------
> ---------------------------------------------------+--------
> ---------+----------------+----------------+
> 10 rows selected (0.041 seconds)
>
> What do you think? Whould I try to give more resources to HBase/Phoenix or
> is the first query the best (and reliable) one?
> Any insight about this is highly appreciated..
>
> Best,
> Flavio
>