You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@phoenix.apache.org by anirudha <gi...@git.apache.org> on 2016/08/18 21:19:00 UTC

[GitHub] phoenix pull request #192: Phoenix Cursors implementation

GitHub user anirudha opened a pull request:

    https://github.com/apache/phoenix/pull/192

    Phoenix Cursors implementation

    //todo

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bloomberg/phoenix cursors_snapshot

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/phoenix/pull/192.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #192
    
----
commit 1c39e4b109cbddaf2c5d4d841adcc98f8d6ab401
Author: Gabriel Jimenez <gj...@bloomberg.com>
Date:   2016-08-16T17:59:52Z

    Parsing module for the Cursor feature.
    Also added a new test CursorParserTest

commit d6d4b180c6cc3dc111aad8b3f98d95f22d3b1ef8
Author: Gabriel Jimenez <gj...@bloomberg.com>
Date:   2016-08-16T19:20:13Z

    Compiler module for the Cursor Feature.
    
    Main changes were to PhoenixStatement and compilers, but there needed to be changes to the subclasses of QueryPlan.

commit d691cf2e750c7bbbfaad758dfa9df816eb8f3bb0
Author: Gabriel Jimenez <gj...@bloomberg.com>
Date:   2016-08-16T19:44:07Z

    Complete functionality of the Cursor feature.
    
    New end to end test cases added for Cursors in 'CursorWithRowValueConstructorIT.java'.

commit 445c029f8c97616c2b39001ce1fee17a73a54016
Author: Gabriel Jimenez <gj...@bloomberg.com>
Date:   2016-08-17T21:33:13Z

    New STATIC option for the cursor feature, 'STATIC' keyword. Mirrors standard SQL definition of cursor behavior for STATIC option. This means results fetched by the cursor will not reflect changes made to the data after opening the cursor. Current implementation does not make a copy of the data, but rather uses the ROW_TIMESTAMP pseudo column. As a result, to use the STATIC option the target table must have designated a column as ROW_TIMESTAMP.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] phoenix issue #192: Phoenix Cursors implementation

Posted by JamesRTaylor <gi...@git.apache.org>.

Github user JamesRTaylor commented on the issue:

    https://github.com/apache/phoenix/pull/192
  
    Thanks for the patch, @anirudha. I like your suggestions, @ankitsinghal. The RVC approach will be limited (as Ankit mentioned it will not work in all the cases) and it will be much harder to implement than the approach that Ankit suggested (i.e. just using a ResultSet).
    
    To simplify things, we can initially not support the previous() operation (I think it's optional anyway). Also, cursors will not see newly inserted inserted data, but I think that's ok in an initial implementation too as you can specify in the metadata that it functions this way. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] phoenix pull request #192: Phoenix Cursors implementation

Posted by anirudha <gi...@git.apache.org>.

Github user anirudha closed the pull request at:

    https://github.com/apache/phoenix/pull/192


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] phoenix pull request #192: Phoenix Cursors implementation

Posted by ankitsinghal <gi...@git.apache.org>.

Github user ankitsinghal commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/192#discussion_r77612036
  
    --- Diff: phoenix-core/src/main/antlr3/PhoenixSQL.g ---
    @@ -726,8 +727,11 @@ upsert_column_refs returns [Pair<List<ColumnDef>,List<ColumnName>> ret]
     
     // Parse a full declare cursor expression structure.
     declare_cursor_node returns [DeclareCursorStatement ret]
    -    :    DECLARE c=cursor_name CURSOR FOR s=select_node
    -        {ret = factory.declareCursor(c, s); }
    +@init{boolean isStatic = false;}
    +    :   DECLARE c=cursor_name CURSOR
    +        (STATIC {isStatic = true;})?
    --- End diff --
    
    here it can be replaced with, (static=STATIC)? and use factory.declareCursor(c,s static!=null)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] phoenix issue #192: Phoenix Cursors implementation

Posted by ankitsinghal <gi...@git.apache.org>.

Github user ankitsinghal commented on the issue:

https://github.com/apache/phoenix/pull/192

@anirudha , thanks for the pull request. Is it possible for you to change the JIRA subject to include PHOENIX-2606 or link somehow, so that we can see the comment logged on JIRA as well.

With current implementation, every query is modified to include primary key columns and tries to use RVC, which may produce improper results in following cases and also sometimes there is no advantage in using it as they don\u2019t limit scan.

- duplicate values for a column
- order by on non-primary key axis.
- Primary key columns having null value.
- Aggregate queries

@samarthjain can you please confirm as per your observation for what all queries RVC cannot be used?

@anirudha
To abstract the query complexities and for initial support of Cursors, we should not modify any query but instead we can keep the ResultSet object open for corresponding cursor(with timeout) and start caching rows as we proceed further with next() calls(FETCH NEXT FROM cursor) , the cache will be used for previous() calls(FETCH PRIOR FROM cusror) on resultSet. (cache will also be used for next() calls if we go previous() in the cache).\u200b

Pros/Cons of above approach:-

Pros:-

- Highly abstracted, We don\u2019t need to understand each and every query and develop logic separately for them.
- As per the current implementation, We don\u2019t need to expend ORDER BY(on non-primary key axis) all the time to include primary key column for uniqueness. As this will cause problem at the server because we will have more keys(almost all keys) to sort every time. (RVC will not restrict in this case).
- Cache size can also be limited by the user and if we exhaust the cache , cache can be updated by using re-runing the query with LIMIT+OFFSET only
- An optimization can be done for flat queries(including INDEX queries) using last and peeked SCAN keys(instead of RVC to handle null and duplicate properly) for updating the cache instead of LIMIT+OFFSET.
- Snapshot/Static queries can be provided by storing the compile time of OPEN CURSOR (and can be used to limit the scan upper bound for timestamp with it).

Cons:-

- As previous() is not supported on the server(Hbase), so cache overhead is there to maintain the results to support previous() at client.
- Re-calculation of results after we reach cache limit or scanner timeout.

@JamesRTaylor , WDYT? any suggestions

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] phoenix pull request #192: Phoenix Cursors implementation

Posted by ankitsinghal <gi...@git.apache.org>.

Github user ankitsinghal commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/192#discussion_r77612514
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/util/CursorUtil.java ---
    @@ -62,24 +64,44 @@
             private boolean selectHasPKCol = false;
     
             private CursorWrapper(String cursorName, String selectSQL, List<OrderByNode> orderBy){
    +            this(cursorName, selectSQL, orderBy, false);
    +        }
    +
    +        private CursorWrapper(String cursorName, String selectSQL, List<OrderByNode> orderBy, boolean isCursorStatic){
                 this.cursorName = cursorName;
                 this.orderBy = orderBy;
                 this.selectSQL = selectSQL;
                 this.listSelectColNames = new ArrayList<String>(java.util.Arrays.asList(selectSQL
                         .substring(selectSQL.indexOf("SELECT") + 7, selectSQL.indexOf("FROM")).trim()
                         .replaceAll("[()]", "").toUpperCase().split(",")));
    +            this.isCursorStatic = isCursorStatic;
             }
     
             private synchronized void openCursor(Connection conn) throws SQLException {
                 if(isOpen){
                     return;
                 }
    -            QueryPlan plan = conn.prepareStatement(selectSQL).unwrap(PhoenixPreparedStatement.class).compileQuery();
    -            List<String> listPKColNames = new ArrayList<String>(Arrays.asList(plan.getTableRef().getTable()
    -                    .getPKColumns().toString().replaceAll("[\\[ \\]]", "").toUpperCase().split(",")));
                 StringBuilder whereBuilder = new StringBuilder(" WHERE (");
                 StringBuilder orderByBuilder = new StringBuilder(" ORDER BY ");
                 StringBuilder selectBuilder = new StringBuilder(listSelectColNames.toString().replaceAll("[\\[ \\]]", ""));
    +
    +            QueryPlan plan = conn.prepareStatement(selectSQL).unwrap(PhoenixPreparedStatement.class).compileQuery();
    +            PTable table = plan.getTableRef().getTable();
    +            //Process static cursor option
    +            if(isCursorStatic){
    +                int columnIndex = table.getRowTimestampColPos();
    +                if(columnIndex == -1) throw new SQLException("Cursor " + cursorName + " declared as STATIC, " +
    +                        "but target table does not contain a ROW_TIMESTAMP column!");
    +
    +                String columnName = table.getColumns().get(columnIndex).getName().getString();
    +                whereBuilder.append(columnName).append(") <= (");
    +                String timestampFilter = "TO_DATE('"+DateUtil.getDateFormatter(DateUtil.DEFAULT_DATE_FORMAT).format(new java.sql.Date(System.currentTimeMillis()))+"')";
    +                whereBuilder.append(timestampFilter).append(") AND (");
    +            }
    +
    +            List<String> listPKColNames = new ArrayList<String>(Arrays.asList(table.getPKColumns().toString().replaceAll("[\\[ \\]]", "").toUpperCase().split(",")));
    +
    +            //Process ORDER BY expressions in the internal select statement
    --- End diff --
    
    By static cursor you mean that subsequent fetches on the cursor should see the new data upserted after the cursor is open? 
    If that so then we don't need to have timestamp filter for it, we can use scan.setTimeRange(0, CURRENT_SCN!=null ?  CURRENT_SCN, compileTimeORCursorOpenTime));


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] phoenix pull request #192: Phoenix Cursors implementation

Posted by ankitsinghal <gi...@git.apache.org>.

Github user ankitsinghal commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/192#discussion_r77613518
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/util/CursorUtil.java ---
    @@ -247,11 +269,11 @@ private void updateSelectValues(Tuple currentRow) throws SQLException {
                         PDataType type = projector.getExpression().getDataType();
                         Object value = projector.getValue(currentRow, type, new ImmutableBytesPtr());
                         if(value == null){
    -                        whereExpressionNext = whereExpressionNext.replaceFirst(":"+Integer.toString(colBinding),"NULL");
    -                        whereExpressionPrior = whereExpressionPrior.replaceFirst(":"+Integer.toString(colBinding),"NULL");
    +                        whereExpressionNext = whereExpressionNext.replaceFirst("::"+Integer.toString(colBinding),"NULL");
    +                        whereExpressionPrior = whereExpressionPrior.replaceFirst("::"+Integer.toString(colBinding),"NULL");
                         } else{
    -                        whereExpressionNext = whereExpressionNext.replaceFirst(":"+Integer.toString(colBinding),formatString(value, type));
    -                        whereExpressionPrior = whereExpressionPrior.replaceFirst(":"+Integer.toString(colBinding),formatString(value, type));
    +                        whereExpressionNext = whereExpressionNext.replaceFirst("::"+Integer.toString(colBinding),formatString(value, type));
    +                        whereExpressionPrior = whereExpressionPrior.replaceFirst("::"+Integer.toString(colBinding),formatString(value, type));
    --- End diff --
    
    I don't think , we need to modify queries for the this support.   You may need to remove this code.(And instead of replacing why we can't use prepared statements directly).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---