Posted to user@couchdb.apache.org by Brian Candler <B....@pobox.com> on 2009/06/02 11:10:13 UTC

Re: find all unique field names

On Thu, May 28, 2009 at 04:20:06PM -0500, Douglas Fils wrote:
> It's not too hard to generate a map function that emits an array of the  
> field names in a particular record....
> (please note this is about as much JS as I have ever written)  :)
> function(doc) {
>   var i = 0;
>   var keyNames = new Array();
>   for (var key in doc) {
>     keyNames[i] = key
>     i++;
>   }
>   emit(null,keyNames);
> }
>
> However, once I pass that over to the reduce (assuming this is even the  
> way to do it) I don't see an easy way to get the unique intersection of  
> the various field names.

Try just emitting the field names, like this:

function(doc) {
  for (var key in doc) {
    emit(key,null);
  }
}

Then the following reduce function will build a map of {fieldname: count}:

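function(ks, vs, co) {
  // ks: [key, docid] pairs; vs: emitted values; co: true on rereduce
  if (co) {
    // rereduce: vs holds earlier {fieldname: count} maps, so merge them
    var result = vs.shift();
    for (var i in vs) {
      for (var j in vs[i]) {
        result[j] = (result[j] || 0) + vs[i][j];
      }
    }
    return result;
  } else {
    // first pass: add one to the count for each emitted row
    var result = {};
    for (var i in ks) {
      var key = ks[i];  // [fieldname, docid]
      result[key[0]] = (result[key[0]] || 0) + 1;
    }
    return result;
  }
}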

Then the client just asks for the reduce value, and looks at the distinct
keys.
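
To see how the pieces fit together, here is a rough local simulation of that
map/reduce pair outside CouchDB. It is only a sketch: the emit() stub, the
names mapFields, reduceFields and fieldCounts, and the sample documents are
all made up for illustration, and CouchDB itself decides how rows are
batched into reduce and rereduce calls.

```javascript
// Local stand-in for CouchDB's emit(); hypothetical, for simulation only.
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

// The map function from above, given a name so it can be called directly.
function mapFields(doc) {
  for (var key in doc) {
    emit(key, null);
  }
}

// The reduce function from above, also given a name.
function reduceFields(ks, vs, co) {
  var result, i, j;
  if (co) {
    result = vs.shift();
    for (i in vs) {
      for (j in vs[i]) {
        result[j] = (result[j] || 0) + vs[i][j];
      }
    }
    return result;
  }
  result = {};
  for (i in ks) {
    result[ks[i][0]] = (result[ks[i][0]] || 0) + 1;
  }
  return result;
}

// Reduce each document's rows separately, then merge the partial
// results with one rereduce call (one possible batching among many).
function fieldCounts(docs) {
  var partials = docs.map(function (doc) {
    rows = [];
    mapFields(doc);
    var ks = rows.map(function (r) { return [r.key, doc._id]; });
    var vs = rows.map(function (r) { return r.value; });
    return reduceFields(ks, vs, false);
  });
  return reduceFields(null, partials, true);
}

// Hypothetical sample documents.
var docs = [
  { _id: "a", name: "x", depth: 1 },
  { _id: "b", name: "y", age: 2 }
];

var counts = fieldCounts(docs);
// The client side: the distinct field names are the keys of the result.
var distinct = Object.keys(counts).sort();
console.log(distinct);
```

The client-side step is just the Object.keys() call at the end; the counts
come along for free.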

Alternatively, you can use a simple counter reduce function and a group=true
query.

The former approach is more efficient if the number of distinct values is
relatively small, since a single disk access will get all the keys. The
latter approach involves walking the btree index, but avoids the problems
with building a large reduce object if the number of distinct values is
large.
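
For the counter approach, something along these lines should do as the
reduce function, paired with the same emit(key,null) map. This is a sketch,
not tested against a live server: the names countReduce and distinctKeys
are invented here, and the database, design document and view names in the
URL comment are placeholders.

```javascript
// Counter reduce: returns the number of rows for the key(s) passed in.
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    // values are partial counts from earlier reduce calls; add them up
    var total = 0;
    for (var i = 0; i < values.length; i++) {
      total += values[i];
    }
    return total;
  }
  // first pass: one emitted row per entry in values
  return values.length;
}

// Querying the view with group=true, e.g. (placeholder names)
//   GET /mydb/_design/fields/_view/names?group=true
// gives one row per distinct key, shaped like this made-up response:
var response = {
  rows: [
    { key: "_id", value: 2 },
    { key: "age", value: 1 },
    { key: "name", value: 2 }
  ]
};

// Client side: the distinct field names are the row keys.
function distinctKeys(resp) {
  return resp.rows.map(function (row) { return row.key; });
}

console.log(distinctKeys(response));
```

Here the reduce value stays a single number per key, so it never grows
with the number of distinct field names.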

HTH,

Brian.