Posted to dev@phoenix.apache.org by "James Taylor (JIRA)" <ji...@apache.org> on 2015/07/01 01:12:04 UTC

[jira] [Comment Edited] (PHOENIX-1954) Reserve chunks of numbers for a sequence

    [ https://issues.apache.org/jira/browse/PHOENIX-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609077#comment-14609077 ] 

James Taylor edited comment on PHOENIX-1954 at 6/30/15 11:11 PM:
-----------------------------------------------------------------

[~jfernando_sfdc] - here's a patch that compiles. I think it mostly threads everything in where needed.

For the case you brought up about referencing the same sequence with different allocations, I think it's going to be best to treat these as the same sequence, but keep the max allocation we see (this is what this patch does). Putting the allocation size into the SequenceKey instead would be problematic, as we don't have this information when the sequence is created and dropped (which is one way our client-side cache gets cleared out).

For example, the following query would return the same value for both NEXT VALUE expressions (the next value of the sequence, with a batch of 1000 consecutive values allocated):
{code}
SELECT NEXT VALUE FOR seq, NEXT 1000 VALUES FOR seq FROM T LIMIT 1;
{code}
just as it would today if both expressions were NEXT VALUE FOR calls.
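
To make the "keep the max allocation we see" rule concrete, here is a minimal sketch in Java. It is not the code from the patch: the class and method names ({{MaxAllocationSketch}}, {{reference}}, {{allocationFor}}) are made up for illustration, and it keys on the bare sequence name rather than a real SequenceKey.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Phoenix's actual classes: every reference to a
// sequence within one statement maps to the same entry, and only the largest
// requested allocation is kept.
public class MaxAllocationSketch {
    // Sequence name -> largest number of values requested in this statement.
    private final Map<String, Long> maxAllocationBySequence = new HashMap<>();

    // Record one NEXT VALUE FOR (numToAllocate = 1) or NEXT n VALUES FOR reference.
    public void reference(String sequenceName, long numToAllocate) {
        if (numToAllocate < 1) {
            throw new IllegalArgumentException("numToAllocate must be >= 1");
        }
        maxAllocationBySequence.merge(sequenceName, numToAllocate, Long::max);
    }

    // The allocation that would be requested for the sequence (0 if never referenced).
    public long allocationFor(String sequenceName) {
        return maxAllocationBySequence.getOrDefault(sequenceName, 0L);
    }

    public static void main(String[] args) {
        MaxAllocationSketch statement = new MaxAllocationSketch();
        statement.reference("seq", 1);     // NEXT VALUE FOR seq
        statement.reference("seq", 1000);  // NEXT 1000 VALUES FOR seq
        System.out.println(statement.allocationFor("seq")); // prints 1000
    }
}
```

Because the allocation size is not part of the key, creating or dropping the sequence can still find and clear the cached entry without knowing which allocation sizes were used.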

Also, we should allocate numToAllocate * incrementByAmount on the server side. Your sequences will likely be incrementing by 1, but an increment of more than 1 is allowed too.
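
As a rough illustration of that arithmetic (an in-memory stand-in, not the server-side code): the sketch below assumes a hypothetical {{ReserveSketch}} class backed by an {{AtomicLong}}, where a reservation advances the sequence by numToAllocate * incrementBy in one atomic step and returns the value it started from.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for the server-side sequence state.
public class ReserveSketch {
    private final AtomicLong nextValue;
    private final long incrementBy;

    public ReserveSketch(long startWith, long incrementBy) {
        this.nextValue = new AtomicLong(startWith);
        this.incrementBy = incrementBy;
    }

    // Atomically reserves numToAllocate values and returns the first of them.
    // The sequence advances by numToAllocate * incrementBy, not by numToAllocate.
    public long reserve(long numToAllocate) {
        // multiplyExact throws on overflow of the product; overflow of the
        // sequence value itself is not handled in this sketch.
        long delta = Math.multiplyExact(numToAllocate, incrementBy);
        return nextValue.getAndAdd(delta);
    }

    public static void main(String[] args) {
        ReserveSketch seq = new ReserveSketch(1, 2); // START WITH 1, INCREMENT BY 2
        long first = seq.reserve(1000);              // reserves 1, 3, ..., 1999
        long next = seq.reserve(1);
        System.out.println(first + " " + next);      // prints "1 2001"
    }
}
```

With an increment of 1 this reduces to advancing by numToAllocate, which is the common case mentioned above.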


> Reserve chunks of numbers for a sequence
> ----------------------------------------
>
>                 Key: PHOENIX-1954
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1954
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Lars Hofhansl
>            Assignee: Jan Fernando
>         Attachments: PHOENIX-1954-wip.patch
>
>
> In order to be able to generate many ids in bulk (for example in map reduce jobs) we need a way to generate or reserve large sets of ids. We also need to mix ids reserved with incrementally generated ids from other clients. 
> For this we need to atomically increment the sequence and return the value it had when the increment happened.
> If we're OK with throwing the current cached set of values away we can do
> {{NEXT VALUE FOR <seq>(,<N>)}}, which needs to increment the value and return the value it incremented from (i.e. it has to throw the current cache away, and return the next value it found at the server).
> Or we can invent a new syntax {{RESERVE VALUES FOR <seq>, <N>}} that does the same, but does not invalidate the cache.
> Note that in either case we won't retrieve the reserved set of values via {{NEXT VALUE FOR}}, because we'd need to be idempotent; in our case, all we need to guarantee is that after a call to {{RESERVE VALUES FOR <seq>, <N>}} that returns a value <M>, the range [M, M+N) won't be used by any other user of the sequence. We might need to reserve 1bn ids this way ahead of a map reduce run.
> Any better ideas?


