Posted to dev@phoenix.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2016/08/06 00:32:20 UTC

[jira] [Commented] (PHOENIX-3156) Bug in DistinctPrefixFilter

    [ https://issues.apache.org/jira/browse/PHOENIX-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410328#comment-15410328 ] 

Lars Hofhansl commented on PHOENIX-3156:
----------------------------------------

Actually, [~samarthjain] saw a test failing while trying out his column encoding changes. Looking at that in detail, I discovered this problem.

> Bug in DistinctPrefixFilter
> ---------------------------
>
>                 Key: PHOENIX-3156
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3156
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Blocker
>             Fix For: 4.8.0
>
>
> There's a corner case I found where a DISTINCT and GROUP BY query along a prefix of a compound row key might return incorrect results.
> The filter relies on seeing the _0 column absolutely last, and not seeing all Cells that should be filtered. That breaks in two scenarios:
> # we have a table with key (key1, key2, key3) and columns (c1 and c2). Now construct a WHERE <a clause that always matches c1>, <a clause that filters by c2> GROUP BY key1, key2. Now the filter would mis-skip when it sees the Cell for c1.
> # we force lower-case column names. In that case those would sort after the _0 column. The DistinctPrefixFilter would see the _0 column first and skip.
> I can fix #1 (by ignoring all Cells other than the _0 one). I do not know how to fix case #2.
> I think this is a blocker and we may have to undo the entire DISTINCT and GROUP BY prefix optimization.
> [~ankit@apache.org], [~giacomotaylor], [~samarthjain].
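
Scenario #2 in the quoted description hinges on where column qualifiers sort relative to _0. Here is a minimal sketch of that ordering, assuming "lower" there refers to lower-case column names, that the qualifier bytes are simply the column name, and that all columns live in one column family. The class name QualifierOrder and the column names C1, C2, c1, c2 are made up for illustration and are not part of Phoenix. HBase orders the Cells of a row by unsigned byte-wise comparison of the qualifier, which is what compareBytes mimics here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class QualifierOrder {
    // Unsigned lexicographic byte comparison, mimicking how qualifiers
    // are ordered within a row (illustrative helper, not a Phoenix API).
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Returns a sorted copy of the given qualifier names.
    static String[] sortQualifiers(String... names) {
        String[] sorted = names.clone();
        Arrays.sort(sorted, (p, q) -> compareBytes(
                p.getBytes(StandardCharsets.UTF_8),
                q.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    public static void main(String[] args) {
        // Upper-case names sort before "_0" ('C' = 0x43 < '_' = 0x5F),
        // so "_0" is the last qualifier in the row.
        System.out.println(Arrays.toString(sortQualifiers("_0", "C1", "C2")));
        // prints [C1, C2, _0]

        // Lower-case names sort after "_0" ('c' = 0x63 > '_' = 0x5F),
        // so "_0" is the first qualifier in the row.
        System.out.println(Arrays.toString(sortQualifiers("_0", "c1", "c2")));
        // prints [_0, c1, c2]
    }
}
```

With upper-case names the _0 Cell comes last, as the filter expects. With lower-case names it comes first, which matches the premature skip described in scenario #2.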


