You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@couchdb.apache.org by Tom van Neerijnen <to...@tomvn.com> on 2013/10/11 17:09:58 UTC

Sorting a view by a reduce result (or a better document structure or M/R)

Hi all

I'm developing a ranking service I'd like to throw scores for multiple
users at and later be able to retrieve a list of users ranked/ordered by
their scores. Problem is if I make it time series data and accumulate the
total score in a reduce I can't put the score in the key to sort by.

Does anyone have suggestions as to how I could re-jig my documents and/or
M/R to accommodate this kind of output or should I simply consider storing
a total document along side all my time series documents? My concern with
storing a total document is I can never be sure the total in the document
exactly matches my M/R over my time series data because I'd have to have an
app query the view and, using the result of the view query, update the
total document, so that's a last resort.

I considered sorting in a list function but we're potentially looking at
millions of users so that's not really an option.

I'm imagining time series documents like so:
"abcdefghi" : {
  "userId": "jim",
  "scoreChange": 10,
  "country": "GB",
  "game": "solitaire",
  "date": "2013-10-08T13:50:19"
}

"jklmnopqr" : {
  "userId": "jim",
  "scoreChange": -5,
  "country": "GB",
  "game": "solitaire",
  "date": "2013-10-08T13:51:19"
}

"jklmnopqr" : {
  "userId": "james",
  "scoreChange": 4,
  "country": "GB",
  "game": "freecell",
  "date": "2013-10-08T13:52:19"
}

I could then have this M/R for, as an example, ranks by game then user:
"by_game": {
    "map": "function(doc) { emit([doc.game, doc.userId], doc.scoreChange)}",
    "reduce": "function(keys, values, rereduce) { return sum(values); }"
}

and get this list from it:
{"key":["freecell","james"],"value":4}
{"key":["solitaire","jim"],"value":5},

Thanks

Re: Sorting a view by a reduce result (or a better document structure or M/R)

Posted by Filippo Fadda <fi...@programmazione.it>.
So, the short answer is: you can't. The long answer is that I was figuring the same problem and I have found the following solution.

Every time a user vote, the application saves a new document having doc.type == 'vote'. The 'vote' document has a property called 'recorded', that is really important.

The application uses a design document called 'votes', where is stored a view called 'notRecorded', that contains just a map function like this one:

              function($doc) {
                if ((doc.type == 'vote') and (!doc.recorded))
                  emit(doc.timestamp, doc);
              }

As you can see, there is not a reduce function, since we need to mark those documents as recorded, we'll reduce them later on our own.

There is a cron job that runs, in my case, a PHP script. You can choose to run the script every minute, every five minutes, etc. The script queries the view defined above to get all the 'not recorded' votes.
Once we have all the 'not recorded' votes, we use a simple function to reduce them. In my case the function reduces them by postId. The function returns an associative array with the sum of the 'not recorded' votes, using as keys the post identifiers.
Finally the same script queries another view of another design document scores/perPost, using as keys the same post identifiers. This is the map function:

              function(doc)) {
                if (doc.type == 'score')
                  emit(doc.postId, doc);
              }

In the end, the script updates every single document of type 'score' retrieved before, adding to the score.total the total of new voted counted. Then the script updates every single 'vote' document marking it as recorded. So we'll never count them anymore. There is no race condition here, no conflicts at all.

To get the posts ordered by score you just need to query another view, let's call it posts/byScore (ddoc/viewName) like this:

              function(doc)) {
                if (doc.type == 'score')
                  emit(score.total, doc.postId);
              }

I suggest to use _bulk_docs API, within all_or_nothing option, to be sure to change all the documents as an atomic operation.

I don't know if you are using PHP, but my library has some interesting features. First of all, you can process a query result chunk by chunk (I use this approach to reduce votes), and you can also returns null rows in case some post doesn't still have a score, because you never registered votes. This simplify a lot your job.

This solution counts just the new votes. In a new vote is generated in the meantime, nothing bad is gonna happen, no conflict is generated. You just operate on a strict set of posts, just the one have new votes. You'll just change the score related to those documents. That's all. :-)

So 3 views and 3 document types: post, vote, score. In your case you'll have user instead of post, but the mechanic remains the same.

Bye

-Filippo

On Oct 11, 2013, at 5:09 PM, Tom van Neerijnen wrote:

> Hi all
> 
> I'm developing a ranking service I'd like to throw scores for multiple
> users at and later be able to retrieve a list of users ranked/ordered by
> their scores. Problem is if I make it time series data and accumulate the
> total score in a reduce I can't put the score in the key to sort by.
> 
> Does anyone have suggestions as to how I could re-jig my documents and/or
> M/R to accommodate this kind of output or should I simply consider storing
> a total document along side all my time series documents? My concern with
> storing a total document is I can never be sure the total in the document
> exactly matches my M/R over my time series data because I'd have to have an
> app query the view and, using the result of the view query, update the
> total document, so that's a last resort.
> 
> I considered sorting in a list function but we're potentially looking at
> millions of users so that's not really an option.
> 
> I'm imagining time series documents like so:
> "abcdefghi" : {
>  "userId": "jim",
>  "scoreChange": 10,
>  "country": "GB",
>  "game": "solitaire",
>  "date": "2013-10-08T13:50:19"
> }
> 
> "jklmnopqr" : {
>  "userId": "jim",
>  "scoreChange": -5,
>  "country": "GB",
>  "game": "solitaire",
>  "date": "2013-10-08T13:51:19"
> }
> 
> "jklmnopqr" : {
>  "userId": "james",
>  "scoreChange": 4,
>  "country": "GB",
>  "game": "freecell",
>  "date": "2013-10-08T13:52:19"
> }
> 
> I could then have this M/R for, as an example, ranks by game then user:
> "by_game": {
>    "map": "function(doc) { emit([doc.game, doc.userId], doc.scoreChange)}",
>    "reduce": "function(keys, values, rereduce) { return sum(values); }"
> }
> 
> and get this list from it:
> {"key":["freecell","james"],"value":4}
> {"key":["solitaire","jim"],"value":5},
> 
> Thanks