Posted to dev@phoenix.apache.org by "James Taylor (JIRA)" <ji...@apache.org> on 2014/05/15 06:34:24 UTC
[jira] [Updated] (PHOENIX-654) Minimize projection into scan for VIEW
[ https://issues.apache.org/jira/browse/PHOENIX-654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Taylor updated PHOENIX-654:
---------------------------------
Description:
When you create a TABLE, we insert an empty key value into the first column family, so every row has at least one key value we can count on being there. For a VIEW, we don't do that, so we just fall back on projecting everything into a scan. If there are lots of columns (for example, 60,000 in [this](https://groups.google.com/forum/_!topic/phoenix-hbase-user/JgQjlqC4-uw) case), the scan is very slow.
Instead, we should only project everything when absolutely necessary, in these cases:
* IS NULL expression
* CASE WHEN with an ELSE expression
* Usages of row value constructor
* When a column in the primary key is used
* When there is no where clause
* When there is a group by of a nullable expression
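To make the rules above concrete, here is a minimal Python sketch of the projection decision. It assumes a hypothetical `QueryInfo` summary of the parsed query and a hypothetical `columns_to_project` helper; none of these names exist in Phoenix. One plausible rationale, as we read it: an HBase scan restricted to certain columns only returns rows that contain at least one of them, so projecting just the referenced columns is safe when the query can only match rows that actually have those columns, and unsafe otherwise (for example, `IS NULL` matches rows where the column is absent).

```python
from dataclasses import dataclass, field

# Hypothetical summary of a query against a VIEW. These flags only model
# the cases listed in the issue; they are not Phoenix classes or fields.
@dataclass
class QueryInfo:
    has_where_clause: bool = True
    uses_is_null: bool = False
    uses_case_when_else: bool = False
    uses_row_value_constructor: bool = False
    uses_pk_column: bool = False
    groups_by_nullable_expression: bool = False
    referenced_columns: set = field(default_factory=set)


def columns_to_project(q: QueryInfo, all_columns: set) -> set:
    """Return the set of columns a VIEW scan should project.

    Falls back to every column only in the cases listed in the issue;
    otherwise projects just the columns the query references.
    """
    needs_everything = (
        not q.has_where_clause
        or q.uses_is_null
        or q.uses_case_when_else
        or q.uses_row_value_constructor
        or q.uses_pk_column
        or q.groups_by_nullable_expression
    )
    return set(all_columns) if needs_everything else set(q.referenced_columns)


# Usage: a view with three columns, queried with a plain equality filter on "a".
print(columns_to_project(QueryInfo(referenced_columns={"a"}), {"a", "b", "c"}))
```

With 60,000 columns, the common case (a where clause that only matches rows having the referenced columns) then projects a handful of columns instead of all of them.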
We could potentially do the same for a TABLE, but the empty key value seems like a better trade-off as far as performance goes. In addition, we need the empty key value because a row cannot exist without at least one key value; without it, we could not support use cases that define only a primary key.
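The role of the empty key value can be illustrated with a toy in-memory model of HBase rows. This is a minimal sketch under stated assumptions: rows are modeled as dicts of `(family, qualifier)` cells, a column-restricted scan returns only rows holding at least one projected cell, and `EMPTY_KV`, `put`, and `scan` are hypothetical names, not Phoenix or HBase APIs.

```python
# Toy model: table = {row_key: {(family, qualifier): value}}.
# EMPTY_KV is a hypothetical marker cell in the first column family.
EMPTY_KV = ("cf", "_empty")


def put(table, row_key, cells, with_empty_kv):
    """Write a row; a row with no cells at all is never stored."""
    cells = dict(cells)
    if with_empty_kv:
        cells[EMPTY_KV] = b""
    if cells:  # a row needs at least one key value to exist
        table.setdefault(row_key, {}).update(cells)


def scan(table, projected):
    """Return sorted row keys of rows holding at least one projected cell,
    mimicking a scan restricted to the given columns."""
    return sorted(k for k, cells in table.items() if any(c in cells for c in projected))


# TABLE-style writes: every row, even a PK-only one, carries EMPTY_KV,
# so projecting that single cell is enough to find every row.
table = {}
put(table, "r1", {("cf", "a"): 1}, with_empty_kv=True)
put(table, "r2", {("cf", "b"): 2}, with_empty_kv=True)
put(table, "r3", {}, with_empty_kv=True)  # primary key only
print(scan(table, [EMPTY_KV]))

# VIEW-style data: no marker cell, so the PK-only row never exists, and
# projecting only column "a" silently misses row r2.
view = {}
put(view, "r1", {("cf", "a"): 1}, with_empty_kv=False)
put(view, "r2", {("cf", "b"): 2}, with_empty_kv=False)
put(view, "r3", {}, with_empty_kv=False)
print(scan(view, [("cf", "a")]), scan(view, [("cf", "a"), ("cf", "b")]))
```

In this model the TABLE scan finds all three rows through one projected cell, while the VIEW has to project every column to be sure of finding every existing row.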
was:
When you create a TABLE, we insert an empty key value into the first column family, so every row has at least one key value we can count on being there. For a VIEW, we don't do that, so we just fall back on projecting everything into a scan. If there are lots of columns (for example, 60,000 in [this](https://groups.google.com/forum/_!topic/phoenix-hbase-user/JgQjlqC4-uw) case), the scan is very slow.
Instead, we should only project everything when absolutely necessary, in these cases:
* When the EvaluateOnCompletionVisitor, run over the where clause expression, returns true for visitor.evaluateOnCompletion(). This captures cases such as:
** IS NULL check
** CASE WHEN ELSE
** Usages of row value constructor
* When there is no where clause
* When there is a group by of a nullable expression
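A visitor-style check like the one described above might look roughly like this Python sketch. The expression classes and `needs_evaluate_on_completion` are hypothetical stand-ins, not Phoenix's EvaluateOnCompletionVisitor; they only show the idea of walking the where clause and flagging any node (IS NULL, CASE WHEN with ELSE, row value constructor) that presumably can be true for a row missing the referenced columns.

```python
# Minimal, hypothetical expression tree; these classes are not Phoenix's.
class Expr:
    def __init__(self, *children):
        self.children = children


class Column(Expr):
    def __init__(self, name):
        super().__init__()
        self.name = name


class Literal(Expr):
    def __init__(self, value):
        super().__init__()
        self.value = value


class Equals(Expr): pass
class And(Expr): pass
class IsNull(Expr): pass
class RowValueConstructor(Expr): pass


class CaseWhen(Expr):
    def __init__(self, when, then, else_=None):
        children = (when, then) if else_ is None else (when, then, else_)
        super().__init__(*children)
        self.has_else = else_ is not None


def needs_evaluate_on_completion(expr):
    """Recursively walk the tree; True if any node is one of the cases above."""
    if isinstance(expr, (IsNull, RowValueConstructor)):
        return True
    if isinstance(expr, CaseWhen) and expr.has_else:
        return True
    return any(needs_evaluate_on_completion(c) for c in expr.children)


# WHERE a = 5 AND b IS NULL -> the IS NULL branch forces full projection.
where = And(Equals(Column("a"), Literal(5)), IsNull(Column("b")))
print(needs_evaluate_on_completion(where))
```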
We could potentially do the same for a TABLE, but the empty key value seems like a better trade-off as far as performance goes. In addition, we need the empty key value because a row cannot exist without at least one key value; without it, we could not support use cases that define only a primary key.
> Minimize projection into scan for VIEW
> --------------------------------------
>
> Key: PHOENIX-654
> URL: https://issues.apache.org/jira/browse/PHOENIX-654
> Project: Phoenix
> Issue Type: Task
> Reporter: James Taylor
>
--
This message was sent by Atlassian JIRA
(v6.2#6252)