You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@asterixdb.apache.org by abdullah alamoudi <ba...@gmail.com> on 2017/11/30 23:42:47 UTC

The IIndexCursor interface

Dear devs,
The IIndexCursor interface is one of the critical interfaces inside asteridxb. It is used to access tuples inside indexes, we have many implementations for it and it is used differently in a different places. We are trying to specify a contract for the interface that all implementors/users of the a cursor have to follow to ensure consistent state and no leaked resources under any circumstances. The scope of this email focuses on the lifecycle of cursors and on the following existing methods:

-- void open(ICursorInitialState initialState, ISearchPredicate searchPred) throws HyracksDataException;
-- boolean hasNext() throws HyracksDataException;
-- void next() throws HyracksDataException;
-- void close() throws HyracksDataException;
-- void reset() throws HyracksDataException;

Currently, these calls are "mostly" used as follows in our code:

- If there are multiple search predicates:
cursor = new cursor();
while (more predicates){
  cursor.reset()
  cursor.open(predicate);
  while (cursor.hasNext()){
    cursor.next()
  }
}
cursor.close();

- If there is a single search predicate:
cursor = new cursor();
cursor.open(predicate);
while (cursor.hasNext()){
  cursor.next()
}
cursor.close();

There are two problems with this: 

1. There is no enforcement of any type of contract. For example, one can open a cursor and reset it and then continue to read tuples from the cursor as follows:

cursor.open(predicate);
cursor.hasNext()
cursor.next()
cursor.reset()
cursor.hasNext()
cursor.next()

and continue to read tuples. This is bug prone and can cause hidden bugs to linger for a long time.

2. Naming and symmetry: open calls don't have corresponding close calls "unless we know the cursor will be used with exactly one search predicate"
With this, the implementation of the cursor lead to either duplicate code or having close() call reset() or the other way around and handling of special cases.
Moreover, when there are slight differences, often it is easy to make a change in one and forget about the other.

==========================================
To deal with these issues, we are proposing the following:

1. change the methods to:

-- void open(ICursorInitialState initialState, ISearchPredicate searchPred) throws HyracksDataException;
-- boolean hasNext() throws HyracksDataException;
-- void next() throws HyracksDataException;
-- void close(); // used to be reset()
-- void destroy(); // used to be close()


The call cycle becomes:
- If there are multiple search predicates:
cursor = new cursor();
while (more predicates){
  cursor.open(predicate);
  while (cursor.hasNext()){
    cursor.next()
  }
  cursor.close(); // used to be reset()
}
cursor.destroy(); // used to be close()

- If there is a single search predicate:
cursor = new cursor();
cursor.open(predicate);
while (cursor.hasNext()){
  cursor.next()
}
cursor.close(); // used to be reset()
cursor.destroy(); // used to be close()

This way, we have a symmetry and we know that:
-- A created cursor will always have cursor.destroy() called.
-- An open cursor will always have cursor.close() called.


2. Enforce the cursor state machine as follows:
The states are:
CLOSED
OPEN
DESTROYED
When a cursor object is created, it is in the CLOSED state.

- CLOSED: The only legal calls are open() --> OPEN, or destroy() --> DESTROYED
- OPEN: The only legal calls are hasNext(), next(), or close() --> CLOSED.
- DESTROYED: All calls are illegal.

We can then add tests to ensure that each of the cursors is enforcing the contract.

Thoughts?

Re: The IIndexCursor interface

Posted by Chen Luo <cl...@uci.edu>.
A typo in previous email "internal objects should only be created during
the call of open" -> "internal objects should only be created during the
first call of open"

On Thu, Nov 30, 2017 at 4:37 PM, Chen Luo <cl...@uci.edu> wrote:

> +1.
>
> It seems to me the main issue for cursor is that a cursor sometime needs
> to be re-used for performance reason (e.g., during primary key lookups
> after secondary index search). One thing to note is that when make these
> changes, or implement new cursors, one has to be very careful that a cursor
> might be reused. As a requirement, internal objects should only be created
> during the call of open, and close method must not clean up these objects
> (unfortunately, the previous implementation of LSMBTreePointSearchCursor[1]
> mistakenly clears its internal objects during reset, which results in tons
> of objects are created during primary key lookups.)
>
> [1] https://github.com/apache/asterixdb/blob/
> 89e6a93277205a9dbc76c18e249919a745d224d2/hyracks-fullstack/
> hyracks/hyracks-storage-am-lsm-btree/src/main/java/org/
> apache/hyracks/storage/am/lsm/btree/impls/LSMBTreePointSearchCursor.
> java#L142
>
> On Thu, Nov 30, 2017 at 3:42 PM, abdullah alamoudi <ba...@gmail.com>
> wrote:
>
>> Dear devs,
>> The IIndexCursor interface is one of the critical interfaces inside
>> asteridxb. It is used to access tuples inside indexes, we have many
>> implementations for it and it is used differently in a different places. We
>> are trying to specify a contract for the interface that all
>> implementors/users of the a cursor have to follow to ensure consistent
>> state and no leaked resources under any circumstances. The scope of this
>> email focuses on the lifecycle of cursors and on the following existing
>> methods:
>>
>> -- void open(ICursorInitialState initialState, ISearchPredicate
>> searchPred) throws HyracksDataException;
>> -- boolean hasNext() throws HyracksDataException;
>> -- void next() throws HyracksDataException;
>> -- void close() throws HyracksDataException;
>> -- void reset() throws HyracksDataException;
>>
>> Currently, these calls are "mostly" used as follows in our code:
>>
>> - If there are multiple search predicates:
>> cursor = new cursor();
>> while (more predicates){
>>   cursor.reset()
>>   cursor.open(predicate);
>>   while (cursor.hasNext()){
>>     cursor.next()
>>   }
>> }
>> cursor.close();
>>
>> - If there is a single search predicate:
>> cursor = new cursor();
>> cursor.open(predicate);
>> while (cursor.hasNext()){
>>   cursor.next()
>> }
>> cursor.close();
>>
>> There are two problems with this:
>>
>> 1. There is no enforcement of any type of contract. For example, one can
>> open a cursor and reset it and then continue to read tuples from the cursor
>> as follows:
>>
>> cursor.open(predicate);
>> cursor.hasNext()
>> cursor.next()
>> cursor.reset()
>> cursor.hasNext()
>> cursor.next()
>>
>> and continue to read tuples. This is bug prone and can cause hidden bugs
>> to linger for a long time.
>>
>> 2. Naming and symmetry: open calls don't have corresponding close calls
>> "unless we know the cursor will be used with exactly one search predicate"
>> With this, the implementation of the cursor lead to either duplicate code
>> or having close() call reset() or the other way around and handling of
>> special cases.
>> Moreover, when there are slight differences, often it is easy to make a
>> change in one and forget about the other.
>>
>> ==========================================
>> To deal with these issues, we are proposing the following:
>>
>> 1. change the methods to:
>>
>> -- void open(ICursorInitialState initialState, ISearchPredicate
>> searchPred) throws HyracksDataException;
>> -- boolean hasNext() throws HyracksDataException;
>> -- void next() throws HyracksDataException;
>> -- void close(); // used to be reset()
>> -- void destroy(); // used to be close()
>>
>>
>> The call cycle becomes:
>> - If there are multiple search predicates:
>> cursor = new cursor();
>> while (more predicates){
>>   cursor.open(predicate);
>>   while (cursor.hasNext()){
>>     cursor.next()
>>   }
>>   cursor.close(); // used to be reset()
>> }
>> cursor.destroy(); // used to be close()
>>
>> - If there is a single search predicate:
>> cursor = new cursor();
>> cursor.open(predicate);
>> while (cursor.hasNext()){
>>   cursor.next()
>> }
>> cursor.close(); // used to be reset()
>> cursor.destroy(); // used to be close()
>>
>> This way, we have a symmetry and we know that:
>> -- A created cursor will always have cursor.destroy() called.
>> -- An open cursor will always have cursor.close() called.
>>
>>
>> 2. Enforce the cursor state machine as follows:
>> The states are:
>> CLOSED
>> OPEN
>> DESTROYED
>> When a cursor object is created, it is in the CLOSED state.
>>
>> - CLOSED: The only legal calls are open() --> OPEN, or destroy() -->
>> DESTROYED
>> - OPEN: The only legal calls are hasNext(), next(), or close() --> CLOSED.
>> - DESTROYED: All calls are illegal.
>>
>> We can then add tests to ensure that each of the cursors is enforcing the
>> contract.
>>
>> Thoughts?
>
>
>

Re: The IIndexCursor interface

Posted by Chen Luo <cl...@uci.edu>.
+1.

It seems to me the main issue for cursor is that a cursor sometime needs to
be re-used for performance reason (e.g., during primary key lookups after
secondary index search). One thing to note is that when make these changes,
or implement new cursors, one has to be very careful that a cursor might be
reused. As a requirement, internal objects should only be created during
the call of open, and close method must not clean up these objects
(unfortunately, the previous implementation of LSMBTreePointSearchCursor[1]
mistakenly clears its internal objects during reset, which results in tons
of objects are created during primary key lookups.)

[1]
https://github.com/apache/asterixdb/blob/89e6a93277205a9dbc76c18e249919a745d224d2/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-btree/src/main/java/org/apache/hyracks/storage/am/lsm/btree/impls/LSMBTreePointSearchCursor.java#L142

On Thu, Nov 30, 2017 at 3:42 PM, abdullah alamoudi <ba...@gmail.com>
wrote:

> Dear devs,
> The IIndexCursor interface is one of the critical interfaces inside
> asteridxb. It is used to access tuples inside indexes, we have many
> implementations for it and it is used differently in a different places. We
> are trying to specify a contract for the interface that all
> implementors/users of the a cursor have to follow to ensure consistent
> state and no leaked resources under any circumstances. The scope of this
> email focuses on the lifecycle of cursors and on the following existing
> methods:
>
> -- void open(ICursorInitialState initialState, ISearchPredicate
> searchPred) throws HyracksDataException;
> -- boolean hasNext() throws HyracksDataException;
> -- void next() throws HyracksDataException;
> -- void close() throws HyracksDataException;
> -- void reset() throws HyracksDataException;
>
> Currently, these calls are "mostly" used as follows in our code:
>
> - If there are multiple search predicates:
> cursor = new cursor();
> while (more predicates){
>   cursor.reset()
>   cursor.open(predicate);
>   while (cursor.hasNext()){
>     cursor.next()
>   }
> }
> cursor.close();
>
> - If there is a single search predicate:
> cursor = new cursor();
> cursor.open(predicate);
> while (cursor.hasNext()){
>   cursor.next()
> }
> cursor.close();
>
> There are two problems with this:
>
> 1. There is no enforcement of any type of contract. For example, one can
> open a cursor and reset it and then continue to read tuples from the cursor
> as follows:
>
> cursor.open(predicate);
> cursor.hasNext()
> cursor.next()
> cursor.reset()
> cursor.hasNext()
> cursor.next()
>
> and continue to read tuples. This is bug prone and can cause hidden bugs
> to linger for a long time.
>
> 2. Naming and symmetry: open calls don't have corresponding close calls
> "unless we know the cursor will be used with exactly one search predicate"
> With this, the implementation of the cursor lead to either duplicate code
> or having close() call reset() or the other way around and handling of
> special cases.
> Moreover, when there are slight differences, often it is easy to make a
> change in one and forget about the other.
>
> ==========================================
> To deal with these issues, we are proposing the following:
>
> 1. change the methods to:
>
> -- void open(ICursorInitialState initialState, ISearchPredicate
> searchPred) throws HyracksDataException;
> -- boolean hasNext() throws HyracksDataException;
> -- void next() throws HyracksDataException;
> -- void close(); // used to be reset()
> -- void destroy(); // used to be close()
>
>
> The call cycle becomes:
> - If there are multiple search predicates:
> cursor = new cursor();
> while (more predicates){
>   cursor.open(predicate);
>   while (cursor.hasNext()){
>     cursor.next()
>   }
>   cursor.close(); // used to be reset()
> }
> cursor.destroy(); // used to be close()
>
> - If there is a single search predicate:
> cursor = new cursor();
> cursor.open(predicate);
> while (cursor.hasNext()){
>   cursor.next()
> }
> cursor.close(); // used to be reset()
> cursor.destroy(); // used to be close()
>
> This way, we have a symmetry and we know that:
> -- A created cursor will always have cursor.destroy() called.
> -- An open cursor will always have cursor.close() called.
>
>
> 2. Enforce the cursor state machine as follows:
> The states are:
> CLOSED
> OPEN
> DESTROYED
> When a cursor object is created, it is in the CLOSED state.
>
> - CLOSED: The only legal calls are open() --> OPEN, or destroy() -->
> DESTROYED
> - OPEN: The only legal calls are hasNext(), next(), or close() --> CLOSED.
> - DESTROYED: All calls are illegal.
>
> We can then add tests to ensure that each of the cursors is enforcing the
> contract.
>
> Thoughts?

Re: The IIndexCursor interface

Posted by Mike Carey <dt...@gmail.com>.
+1 to the proposed changes...!  (Seems clean/understandable...)


On 11/30/17 3:42 PM, abdullah alamoudi wrote:
> Dear devs,
> The IIndexCursor interface is one of the critical interfaces inside asteridxb. It is used to access tuples inside indexes, we have many implementations for it and it is used differently in a different places. We are trying to specify a contract for the interface that all implementors/users of the a cursor have to follow to ensure consistent state and no leaked resources under any circumstances. The scope of this email focuses on the lifecycle of cursors and on the following existing methods:
>
> -- void open(ICursorInitialState initialState, ISearchPredicate searchPred) throws HyracksDataException;
> -- boolean hasNext() throws HyracksDataException;
> -- void next() throws HyracksDataException;
> -- void close() throws HyracksDataException;
> -- void reset() throws HyracksDataException;
>
> Currently, these calls are "mostly" used as follows in our code:
>
> - If there are multiple search predicates:
> cursor = new cursor();
> while (more predicates){
>    cursor.reset()
>    cursor.open(predicate);
>    while (cursor.hasNext()){
>      cursor.next()
>    }
> }
> cursor.close();
>
> - If there is a single search predicate:
> cursor = new cursor();
> cursor.open(predicate);
> while (cursor.hasNext()){
>    cursor.next()
> }
> cursor.close();
>
> There are two problems with this:
>
> 1. There is no enforcement of any type of contract. For example, one can open a cursor and reset it and then continue to read tuples from the cursor as follows:
>
> cursor.open(predicate);
> cursor.hasNext()
> cursor.next()
> cursor.reset()
> cursor.hasNext()
> cursor.next()
>
> and continue to read tuples. This is bug prone and can cause hidden bugs to linger for a long time.
>
> 2. Naming and symmetry: open calls don't have corresponding close calls "unless we know the cursor will be used with exactly one search predicate"
> With this, the implementation of the cursor lead to either duplicate code or having close() call reset() or the other way around and handling of special cases.
> Moreover, when there are slight differences, often it is easy to make a change in one and forget about the other.
>
> ==========================================
> To deal with these issues, we are proposing the following:
>
> 1. change the methods to:
>
> -- void open(ICursorInitialState initialState, ISearchPredicate searchPred) throws HyracksDataException;
> -- boolean hasNext() throws HyracksDataException;
> -- void next() throws HyracksDataException;
> -- void close(); // used to be reset()
> -- void destroy(); // used to be close()
>
>
> The call cycle becomes:
> - If there are multiple search predicates:
> cursor = new cursor();
> while (more predicates){
>    cursor.open(predicate);
>    while (cursor.hasNext()){
>      cursor.next()
>    }
>    cursor.close(); // used to be reset()
> }
> cursor.destroy(); // used to be close()
>
> - If there is a single search predicate:
> cursor = new cursor();
> cursor.open(predicate);
> while (cursor.hasNext()){
>    cursor.next()
> }
> cursor.close(); // used to be reset()
> cursor.destroy(); // used to be close()
>
> This way, we have a symmetry and we know that:
> -- A created cursor will always have cursor.destroy() called.
> -- An open cursor will always have cursor.close() called.
>
>
> 2. Enforce the cursor state machine as follows:
> The states are:
> CLOSED
> OPEN
> DESTROYED
> When a cursor object is created, it is in the CLOSED state.
>
> - CLOSED: The only legal calls are open() --> OPEN, or destroy() --> DESTROYED
> - OPEN: The only legal calls are hasNext(), next(), or close() --> CLOSED.
> - DESTROYED: All calls are illegal.
>
> We can then add tests to ensure that each of the cursors is enforcing the contract.
>
> Thoughts?


Re: The IIndexCursor interface

Posted by Murtadha Hubail <hu...@gmail.com>.
+1 to the proposed changes.

Cheers,
Murtadha

On 12/01/2017, 2:42 AM, "abdullah alamoudi" <ba...@gmail.com> wrote:

    Dear devs,
    The IIndexCursor interface is one of the critical interfaces inside asteridxb. It is used to access tuples inside indexes, we have many implementations for it and it is used differently in a different places. We are trying to specify a contract for the interface that all implementors/users of the a cursor have to follow to ensure consistent state and no leaked resources under any circumstances. The scope of this email focuses on the lifecycle of cursors and on the following existing methods:
    
    -- void open(ICursorInitialState initialState, ISearchPredicate searchPred) throws HyracksDataException;
    -- boolean hasNext() throws HyracksDataException;
    -- void next() throws HyracksDataException;
    -- void close() throws HyracksDataException;
    -- void reset() throws HyracksDataException;
    
    Currently, these calls are "mostly" used as follows in our code:
    
    - If there are multiple search predicates:
    cursor = new cursor();
    while (more predicates){
      cursor.reset()
      cursor.open(predicate);
      while (cursor.hasNext()){
        cursor.next()
      }
    }
    cursor.close();
    
    - If there is a single search predicate:
    cursor = new cursor();
    cursor.open(predicate);
    while (cursor.hasNext()){
      cursor.next()
    }
    cursor.close();
    
    There are two problems with this: 
    
    1. There is no enforcement of any type of contract. For example, one can open a cursor and reset it and then continue to read tuples from the cursor as follows:
    
    cursor.open(predicate);
    cursor.hasNext()
    cursor.next()
    cursor.reset()
    cursor.hasNext()
    cursor.next()
    
    and continue to read tuples. This is bug prone and can cause hidden bugs to linger for a long time.
    
    2. Naming and symmetry: open calls don't have corresponding close calls "unless we know the cursor will be used with exactly one search predicate"
    With this, the implementation of the cursor lead to either duplicate code or having close() call reset() or the other way around and handling of special cases.
    Moreover, when there are slight differences, often it is easy to make a change in one and forget about the other.
    
    ==========================================
    To deal with these issues, we are proposing the following:
    
    1. change the methods to:
    
    -- void open(ICursorInitialState initialState, ISearchPredicate searchPred) throws HyracksDataException;
    -- boolean hasNext() throws HyracksDataException;
    -- void next() throws HyracksDataException;
    -- void close(); // used to be reset()
    -- void destroy(); // used to be close()
    
    
    The call cycle becomes:
    - If there are multiple search predicates:
    cursor = new cursor();
    while (more predicates){
      cursor.open(predicate);
      while (cursor.hasNext()){
        cursor.next()
      }
      cursor.close(); // used to be reset()
    }
    cursor.destroy(); // used to be close()
    
    - If there is a single search predicate:
    cursor = new cursor();
    cursor.open(predicate);
    while (cursor.hasNext()){
      cursor.next()
    }
    cursor.close(); // used to be reset()
    cursor.destroy(); // used to be close()
    
    This way, we have a symmetry and we know that:
    -- A created cursor will always have cursor.destroy() called.
    -- An open cursor will always have cursor.close() called.
    
    
    2. Enforce the cursor state machine as follows:
    The states are:
    CLOSED
    OPEN
    DESTROYED
    When a cursor object is created, it is in the CLOSED state.
    
    - CLOSED: The only legal calls are open() --> OPEN, or destroy() --> DESTROYED
    - OPEN: The only legal calls are hasNext(), next(), or close() --> CLOSED.
    - DESTROYED: All calls are illegal.
    
    We can then add tests to ensure that each of the cursors is enforcing the contract.
    
    Thoughts?