You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2020/01/18 00:15:50 UTC

[GitHub] [couchdb] chewbranca edited a comment on issue #2456: FDB Builtin reduce

chewbranca edited a comment on issue #2456: FDB Builtin reduce
URL: https://github.com/apache/couchdb/pull/2456#issuecomment-575836265
 
 
   So I've been perusing the RFC, `couch_views_skiplist.erl` and the original whitepaper, and one major difference jumps out to me: the skip list implementation here is _not_ a linked list based approach, as per traditional skip lists, but rather embeds the levels into the keys themselves to allow for walking different levels of values. The other thing that jumps out to me is that the querying algorithm utilizes a bottom up approach starting at level zero rather than a top down approach starting at max level, which seems backwards and potentially less performant, especially since right now it seems to spend a lot of time fetching individual keys.
   
   I think storing the levels in the key affords us an opportunity to heavily lean on range queries, and potentially ignore direct key lookups entirely. Essentially I was thinking you could start at max level, take the baseline supplied startkey and endkey, and then do a range query on `{...,...,$MAX_LEVEL,$start_key}..{...,...,$MAX_LEVEL,$end_key}`. Then you take the results and divide and conquer the lower and upper key spaces with similar queries at one level lower, eg lower query of `{...,...,$MAX_LEVEL - 1,$start_key}..{...,...,$MAX_LEVEL - 1,$results[0]}` and upper query of `{...,...,$MAX_LEVEL - 1,$results[-1]}..{...,...,$MAX_LEVEL - 1,$end_key}`. And then you recurse through the lower levels until you've bottomed out or `$start_key == $results[0] and $end_key == $results[-1]`. In theory this algorithm could fulfill the requests without ever needing to do individual key lookups, while maximizing the range of results queried at the upper levels.
   
   Here's a proof of concept query algorithm:
   
   ```erlang
   query(View, StartKey, EndKey) ->
       query(View, ?MAX_LEVEL, StartKey, EndKey).
   
   
   query(_, ?MIN_LEVEL, _, _) ->
       [];
   query(View, Level, StartKey, EndKey) ->
       LevelDown = Level - 1,
       Start = make_key(View, Level, StartKey),
       End = make_key(View, Level, EndKey),
   
       [First | _] = Results = fdb_range_query(Start, End),
       Last = lists:last(Results),
   
       case {StartKey < First, Last < EndKey} of
           {true, true} ->
               query(View, LevelDown, StartKey, First) ++ Results ++ query(View, LevelDown, Last, EndKey);
           {true, false} when Last =:= EndKey ->
               query(View, LevelDown, StartKey, First) ++ Results;
           {false, true} when First =:= StartKey ->
               Results ++ query(View, LevelDown, Last, EndKey);
           {false, false} when First =:= StartKey andalso Last =:= EndKey ->
               Results;
           {_, _} ->
               throw({error, shouldnt_be_here})
       end.
   
   
   make_key(View, Level, Key) ->
       {View#view.db, ?DB_VIEWS, View#view.sig, ?VIEW_REDUCE_SK_RANGE, View#view.id, Level, Key}.
   ```
   
   It's quite possible this approach isn't practical for any number of reasons, but it jumped out to me as a potentially viable algorithm that is simple and utilizes the range query support provided by FoundationDB.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services