You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@phoenix.apache.org by "chenglei (Jira)" <ji...@apache.org> on 2022/09/21 12:55:00 UTC

[jira] [Commented] (PHOENIX-6791) WHERE optimizer redesign

    [ https://issues.apache.org/jira/browse/PHOENIX-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607734#comment-17607734 ] 

chenglei commented on PHOENIX-6791:
-----------------------------------

[~kadir], I very agree that we should refactor the {{WHERE optimizer}}, especially after PHOENIX-3383, which is  hard to understand and maintain. But I have a confusion about your description:
??For example, consider a row key composed of three integer columns, PK1, PK2, and PK3, and the expression (PK1,  PK2) > (100, 25) AND PK3 = 5. The result would be a very large number of key slots and each key slot represents a point in the three dimensional space, including (100, 26, 5), (100, 27, 5), …, (100, 2147483647, 5), (101, 1, 5), (101, 2, 5), … .??

How the result could be (100, 26, 5), (100, 27, 5), …, (100, 2147483647, 5), (101, 1, 5), (101, 2, 5), … ? I have write a test to verify it the startRowKey in scan is (100,25,5) and the filter is just pk3 =5 .

> WHERE optimizer redesign
> ------------------------
>
>                 Key: PHOENIX-6791
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6791
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Kadir Ozdemir
>            Priority: Major
>             Fix For: 5.3.0
>
>
> The WHERE optimizer in Phoenix derives the information about which row key ranges to be scanned from the primary key (PK) column expressions in a where clause. These key ranges are then used to determine the table regions to scan and generate a SkipScanFilter for each of these scans if applicable. 
> The WHERE expression may include non-PK column (sub) expressions. After identifying the key ranges, the WHERE optimizer removes the nodes for PK columns from the expression tree if these nodes are fully used to determine the key ranges.
> Since the values in the WHERE expression are expressed by byte arrays, the key ranges are also expressed using byte arrays. KeyRange represents a range for a row key or any sub part of a row key key. A key range is composed of two pairs, one for each end of the range, lower and upper. The pair is formed from a byte array and a boolean value. The boolean value indicates if the end of the range specified by the byte array is inclusive or not. If the byte array is empty, it means that the corresponding end of the range is unbounded. 
> KeySlot represents a key part and the list of key ranges for this key part where a key part can be any sub part of a PK, including leading, trailing, or middle part of the key. The number of columns in a key part is called span. For the terminal nodes (i..e, constant values) in the expression tree, KeySlot objects are created with a single key range. When KeySlot objects are rolled up in the expression tree, they can have multiple ranges. For example, a KeySlot object representing an IN expression will have a separate range for each member of the IN expression. Similarly the KeySlot object for an OR expression can have multiple ranges similarly. Please note an IN operator can be replaced by an equivalent OR expression. 
> When the WHERE optimizer visits the nodes of the expression tree, it generates a KeySlots object. KeySlots is essentially a list of KeySlot objects (please note the difference between KeySlots vs KeySlot). There are two types of KeySlots: SingleKeySlot and MultiKeySlot. SingleKeySlot represents a single key slot whereas MultiKeySlot is a list of key slots the results of AND expression on SingleKeySlot or MultiKeySlot objects. 
> The key slots are rolled into a MultiKeySlot object when processing an AND expression. The AND operation on two key slots starting their spans with the same PK columns is equivalent to taking intersection of their ranges. The OR operation implementation is limited and rather simple compared to the AND operation. The OR operation attempts to coalesce key slots if all of the key slots have the same starting PK column. If not, it generates a null KeySlots. When an expression node is used fully in generating a key slot, this expression node is removed from the expression tree.
> A row key for a given table can be composed of several PK columns. Without any restrictions imposed by predefined rules, intersection of key slots can lead to a large number of key slots, i.e., key ranges.  For example, consider a row key composed of three integer columns, PK1, PK2, and PK3, and the expression (PK1,  PK2) > (100, 25) AND PK3 = 5. The result would be a very large number of key slots and each key slot represents a point in the three dimensional space, including (100, 26, 5), (100, 27, 5), …, (100, 2147483647, 5), (101, 1, 5), (101, 2, 5), … .
> A simple expression (like the one given above) with a relatively small number of PK columns and a simple data type, e.g., integer, is sufficient to show that finding key ranges for an arbitrary expression is an intractable problem. Attempting to optimize the queries by enumerating the key ranges can lead to excessive memory allocation and long computation times and the optimization can defeat its purpose. 
> The current implementation attempts to enumerate all possible key ranges in general. Because of this, the WHERE optimizer has caused out of memory issues, and query timeouts due to high CPU usage. The very recent bug fixes attempts to catch these cases and prevent them. However, these fixes do not attempt to cover all cases and are formulated based on known cases.
> In addition to inefficient resource utilization, there are known types of expressions, the current implementation still returns wrong results for them.  For example, please see PHOENIX-6669 where if degenerate queries are caused by some conditions on non-leading PK columns, then Phoenix cannot catch this and can return wrong results.
> An example to show inconsistencies in the implementation is as follows. An RVC expression can be converted to an equivalent AND/OR expression. For example, (PK1, PK2) > (A, B) is equivalent to (PK1 > A) OR (PK1 = A AND PK2 > B). The implementation converts the first expression into a single key range and thus a scan with the start and stop rows keys without a filter. However, the implementation cannot do the same for the second expression and instead it generates a scan with a filter for the expression without generating a key range.
> Due to tens of possibly conflicting bug fixes over the years and not having a document that clearly describes the design, the current implementation has become hard to understand and maintain. 
> The WHERE optimizer redesign will be formulated based on the following observations:
>  # As described in the previous section, attempting to enumerate the PK ranges over arbitrary expression is an intractable problem due to key range explosion. Since identifying key ranges is just for the optimization but not for the correctness of queries, the cost of optimization should justify the gain from the optimization.
>  # The optimization gain comes from first skipping table regions and then skipping rows within table regions. In practice, the most gain comes from the most significant leading PK columns. The optimization is not useful if the first leading PK column is not included in a WHERE expression. The value of the optimization decreases with the subsequent PK columns. 
> The objectives of the redesign are as follows:
>  # The space and time complexity of the WHERE optimizer should not be more than O(N2).
>  # The redesign should be provably correct. This requires constructing a mathematical system with well defined elements and operations. 
>  # The redesign should generate the same result for the expressions that are logically equivalent. 
>  # The redesign should lead to significantly simpler implementation. This can be achieved using well defined and clearly separated operations and concepts.
>  # The scope of the redesign will be limited to the WHERE optimizer and so the changes will mostly be limited to where optimizer and Expression classes. For example, this redesign does not attempt to change the skip scan filter design or the WHERE compiler. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)