Posted to user@couchdb.apache.org by Douglas Fils <fi...@iastate.edu> on 2009/05/28 23:20:06 UTC

find all unique field names

Forgive the noob question..  but I've not been able to easily locate an 
approach today to getting a return that gives all the unique field names 
in a couch database.

It's not too hard to generate a map function that emits an array of the 
field names in a particular record....
(please note this is about as much JS as I have ever written)  :)
function(doc) {
   var i = 0;
   var keyNames = new Array();
   for (var key in doc) {
     keyNames[i] = key
     i++;
   }
   emit(null,keyNames);
}

However, once I pass that over to the reduce (assuming this is even the 
way to do it) I don't see an easy way to get the unique intersection of 
the various field names.

Any help would be appreciated...
Thanks
Doug


Re: find all unique field names

Posted by Brian Candler <B....@pobox.com>.
On Thu, May 28, 2009 at 04:20:06PM -0500, Douglas Fils wrote:
> It's not too hard to generate a map function that emits an array of the  
> field names in a particular record....
> (please note this is about as much JS as I have ever written)  :)
> function(doc) {
>   var i = 0;
>   var keyNames = new Array();
>   for (var key in doc) {
>     keyNames[i] = key
>     i++;
>   }
>   emit(null,keyNames);
> }
>
> However, once I pass that over to the reduce (assuming this is even the  
> way to do it) I don't see an easy way to get the unique intersection of  
> the various field names.

Try just emitting the field names, like this:

function(doc) {
  for (var key in doc) {
    emit(key,null);
  }
}

Then the following reduce function will build a map of {fieldname: count}

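function(ks, vs, co) {
  if (co) {
    // Rereduce: vs holds {fieldname: count} objects from earlier passes.
    // Start from the first one and fold the others into it.
    var result = vs.shift();
    for (var i in vs) {
      for (var j in vs[i]) {
        result[j] = (result[j] || 0) + vs[i][j];
      }
    }
    return result;
  } else {
    // First pass: each entry of ks is a [key, docid] pair, so key[0]
    // is the emitted field name.
    var result = {};
    for (var i in ks) {
      var key = ks[i];
      result[key[0]] = (result[key[0]] || 0) + 1;
    }
    return result;
  }
}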

Then the client just asks for the reduce value, and looks at the distinct
keys.
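
To see how the two passes fit together, here is a small local sketch in plain
JavaScript, run outside CouchDB. The reduce has the same shape as the one
above. The sample rows, the split into two batches, and the name
reduceFieldCounts are assumptions made for illustration; CouchDB chooses the
real batching itself. Note that a map looping over every key of a document
also emits _id and _rev.

```javascript
// A reduce in the same shape as the one above: it builds {fieldname: count}.
function reduceFieldCounts(ks, vs, co) {
  var result = {};
  var i, j;
  if (co) {
    // Rereduce: vs is an array of {fieldname: count} objects to merge.
    for (i = 0; i < vs.length; i++) {
      for (j in vs[i]) {
        result[j] = (result[j] || 0) + vs[i][j];
      }
    }
  } else {
    // First pass: each entry of ks is a [key, docid] pair.
    for (i = 0; i < ks.length; i++) {
      var name = ks[i][0];
      result[name] = (result[name] || 0) + 1;
    }
  }
  return result;
}

// Made-up map output for three documents, as [key, docid] pairs.
var batch1 = [["_id", "a"], ["_rev", "a"], ["name", "a"], ["_id", "b"]];
var batch2 = [["_rev", "b"], ["city", "b"], ["_id", "c"], ["_rev", "c"], ["name", "c"]];

// The map emitted null for every value.
function nullValues(batch) {
  return batch.map(function () { return null; });
}

// Two first-pass reductions, then one rereduce over their outputs.
var partial1 = reduceFieldCounts(batch1, nullValues(batch1), false);
var partial2 = reduceFieldCounts(batch2, nullValues(batch2), false);
var combined = reduceFieldCounts(null, [partial1, partial2], true);

// The client only needs the distinct keys of the reduce value.
var fieldNames = Object.keys(combined).sort();
console.log(fieldNames.join(", ")); // _id, _rev, city, name
```

The counts come along for free; a client that only wants the names can ignore
them.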

Alternatively, you can use a simple counter reduce function and a group=true
query.

The former approach is more efficient if the number of distinct values is
relatively small, since a single disk access will get all the keys. The
latter approach involves walking the btree index, but avoids the problems
with building a large reduce object if the number of distinct values is
large.
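
The "simple counter" alternative is not spelled out above, so here is one
possible sketch. It assumes the map still emits null values, so the reduce
counts rows on the first pass and adds partial counts on rereduce. The local
sum(), the groupedReduce helper and the sample rows are stand-ins so the
sketch runs outside CouchDB; groupedReduce only approximates what a
group=true query does.

```javascript
// Stand-in for the sum() helper that CouchDB provides to view functions.
function sum(values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

// Counter reduce: count rows on the first pass, add partial counts on rereduce.
function countReduce(keys, values, rereduce) {
  return rereduce ? sum(values) : values.length;
}

// Hypothetical helper approximating group=true: one reduce call per distinct key.
function groupedReduce(rows, reduceFn) {
  var groups = {};
  for (var i = 0; i < rows.length; i++) {
    var k = rows[i].key;
    groups[k] = groups[k] || [];
    groups[k].push(rows[i]);
  }
  return Object.keys(groups).sort().map(function (k) {
    var keys = groups[k].map(function (r) { return [r.key, r.id]; });
    var values = groups[k].map(function (r) { return r.value; });
    return { key: k, value: reduceFn(keys, values, false) };
  });
}

// Made-up map output: one row per (field name, document).
var rows = [
  { key: "name", id: "a", value: null },
  { key: "city", id: "b", value: null },
  { key: "name", id: "c", value: null },
  { key: "city", id: "c", value: null },
  { key: "address", id: "c", value: null }
];

var grouped = groupedReduce(rows, countReduce);
grouped.forEach(function (row) {
  console.log(row.key + " : " + row.value);
});
// address : 1
// city : 2
// name : 2
```

Each reduce value here is a single number, so nothing grows with the number
of distinct field names; the distinct names are the row keys.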

HTH,

Brian.

Re: find all unique field names

Posted by Blair Nilsson <bl...@gmail.com>.
On Fri, May 29, 2009 at 1:09 PM, Chris Anderson <jc...@apache.org> wrote:
> On Thu, May 28, 2009 at 4:26 PM, Blair Nilsson <bl...@gmail.com> wrote:
>> On Fri, May 29, 2009 at 10:25 AM, Blair Nilsson <bl...@gmail.com> wrote:
>>> On Fri, May 29, 2009 at 9:20 AM, Douglas Fils <fi...@iastate.edu> wrote:
>>>> Forgive the noob question..  but I've not been able to easily locate an
>>>> approach today to getting a return that gives all the unique field names in
>>>> a couch database.
>>>>
>>>> It's not too hard to generate a map function that emits an array of the
>>>> field names in a particular record....
>>>> (please note this is about as much JS as I have ever written)  :)
>>>> function(doc) {
>>>>  var i = 0;
>>>>  var keyNames = new Array();
>>>>  for (var key in doc) {
>>>>    keyNames[i] = key
>>>>    i++;
>>>>  }
>>>>  emit(null,keyNames);
>>>> }
>>>>
>>>> However, once I pass that over to the reduce (assuming this is even the way
>>>> to do it) I don't see an easy way to get the unique intersection of the
>>>> various field names.
>>>>
>>>> Any help would be appreciated...
>>>> Thanks
>>>> Doug
>>>>
>>>>
>>>
>>> maybe the map should be
>>>
>>> function(doc) {
>>>  for (var key in doc) {
>>>   emit(key,"")
>>>  }
>>> }
>>>
>>> and the reduce
>>> function(keys,values) {
>>>  return null;
>>> }
>>>
>>> and just use the returned keys as the field names.
>>>
>>> --- Blair
>>>
>>
>> Actually this can be a good demonstration on the reduce function.
>>
>> Say we were trying to solve a slightly more complicated version of this,
>> one where we were trying to get the number of times the field name is
>> used.
>>
>> We want the results by field name, so we use that as our key, and we
>> use 1 as our value, since for each emit, we have 1 use of that field
>> name.
>>
>> function(doc) {
>>  for (var key in doc) {
>>   emit(key,1)
>>  }
>> }
>>
>> which would give us
>>
>> address : 1
>> city : 1
>> city : 1
>> city : 1
>> name : 1
>> name : 1
>>
>> etc...
>>
>> by putting in a reduce function, even if it didn't really do anything,
>> the results will get stacked together by key...
>>
>> function(keys, values) {
>>  return values;
>> }
>>
>> would give us....
>>
>> address : [1]
>> city : [1,1,1]
>> name : [1,1]
>>
>
> Good examples overall, Blair. Thanks for the explanation. The one
> nitpick I'm compelled to point out is that one should never have a
> reduce function that just returns the values. The above list will end
> up with:
>
> address : [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> ...
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
>
> and that's with only 100 documents. You can imagine how hard it will be
> to manage this array when you are dealing with thousands or millions
> of rows.
>
> But on the whole, you are correct, and the sum() helper you point out
> is the way to do the advanced query.
>
> I should note that these examples assume you query the reduce with
> group=true (which is the default query option used by Futon).
>
>> The values for each key are stacked together in an array, all good, we
>> can work with that...
>> Since we want to add them all together, we could step through each one
>> adding them up...
>>
>> function(keys, values) {
>>  var total = 0
>>  for (var k in values) {
>>    total = total + values[k]
>>  }
>>  return total;
>> }
>>
>> or more simply...
>>
>> function(keys, values) {
>>  return sum(values)
>> }
>>
>
>
>
> --
> Chris Anderson
> http://jchrisa.net
> http://couch.io
>

Agreed, it was an intermediate step to show how the reduce function
worked. I should do an example showing re-reduce. Maybe the mailing
list isn't the right place though. Maybe it's time I actually started
blogging :)

BTW, the couch.io hosting service is going to be so very useful, I'll
be a paying customer for that soon enough.

Re: find all unique field names

Posted by Chris Anderson <jc...@apache.org>.
On Thu, May 28, 2009 at 4:26 PM, Blair Nilsson <bl...@gmail.com> wrote:
> On Fri, May 29, 2009 at 10:25 AM, Blair Nilsson <bl...@gmail.com> wrote:
>> On Fri, May 29, 2009 at 9:20 AM, Douglas Fils <fi...@iastate.edu> wrote:
>>> Forgive the noob question..  but I've not been able to easily locate an
>>> approach today to getting a return that gives all the unique field names in
>>> a couch database.
>>>
>>> It's not too hard to generate a map function that emits an array of the
>>> field names in a particular record....
>>> (please note this is about as much JS as I have ever written)  :)
>>> function(doc) {
>>>  var i = 0;
>>>  var keyNames = new Array();
>>>  for (var key in doc) {
>>>    keyNames[i] = key
>>>    i++;
>>>  }
>>>  emit(null,keyNames);
>>> }
>>>
>>> However, once I pass that over to the reduce (assuming this is even the way
>>> to do it) I don't see an easy way to get the unique intersection of the
>>> various field names.
>>>
>>> Any help would be appreciated...
>>> Thanks
>>> Doug
>>>
>>>
>>
>> maybe the map should be
>>
>> function(doc) {
>>  for (var key in doc) {
>>   emit(key,"")
>>  }
>> }
>>
>> and the reduce
>> function(keys,values) {
>>  return null;
>> }
>>
>> and just use the returned keys as the field names.
>>
>> --- Blair
>>
>
> Actually this can be a good demonstration on the reduce function.
>
> Say we were trying to solve a slightly more complicated version of this,
> one where we were trying to get the number of times the field name is
> used.
>
> We want the results by field name, so we use that as our key, and we
> use 1 as our value, since for each emit, we have 1 use of that field
> name.
>
> function(doc) {
>  for (var key in doc) {
>   emit(key,1)
>  }
> }
>
> which would give us
>
> address : 1
> city : 1
> city : 1
> city : 1
> name : 1
> name : 1
>
> etc...
>
> by putting in a reduce function, even if it didn't really do anything,
> the results will get stacked together by key...
>
> function(keys, values) {
>  return values;
> }
>
> would give us....
>
> address : [1]
> city : [1,1,1]
> name : [1,1]
>

Good examples overall, Blair. Thanks for the explanation. The one
nitpick I'm compelled to point out is that one should never have a
reduce function that just returns the values. The above list will end
up with:

address : [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
...
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

and that's with only 100 documents. You can imagine how hard it will be
to manage this array when you are dealing with thousands or millions
of rows.

But on the whole, you are correct, and the sum() helper you point out
is the way to do the advanced query.

I should note that these examples assume you query the reduce with
group=true (which is the default query option used by Futon).

> The values for each key are stacked together in an array, all good, we
> can work with that...
> Since we want to add them all together, we could step through each one
> adding them up...
>
> function(keys, values) {
>  var total = 0
>  for (var k in values) {
>    total = total + values[k]
>  }
>  return total;
> }
>
> or more simply...
>
> function(keys, values) {
>  return sum(values)
> }
>



-- 
Chris Anderson
http://jchrisa.net
http://couch.io

Re: find all unique field names

Posted by Blair Nilsson <bl...@gmail.com>.
On Fri, May 29, 2009 at 10:25 AM, Blair Nilsson <bl...@gmail.com> wrote:
> On Fri, May 29, 2009 at 9:20 AM, Douglas Fils <fi...@iastate.edu> wrote:
>> Forgive the noob question..  but I've not been able to easily locate an
>> approach today to getting a return that gives all the unique field names in
>> a couch database.
>>
>> It's not too hard to generate a map function that emits an array of the
>> field names in a particular record....
>> (please note this is about as much JS as I have ever written)  :)
>> function(doc) {
>>  var i = 0;
>>  var keyNames = new Array();
>>  for (var key in doc) {
>>    keyNames[i] = key
>>    i++;
>>  }
>>  emit(null,keyNames);
>> }
>>
>> However, once I pass that over to the reduce (assuming this is even the way
>> to do it) I don't see an easy way to get the unique intersection of the
>> various field names.
>>
>> Any help would be appreciated...
>> Thanks
>> Doug
>>
>>
>
> maybe the map should be
>
> function(doc) {
>  for (var key in doc) {
>   emit(key,"")
>  }
> }
>
> and the reduce
> function(keys,values) {
>  return null;
> }
>
> and just use the returned keys as the field names.
>
> --- Blair
>

Actually, this can be a good demonstration of the reduce function.

Say we were trying to solve a slightly more complicated version of this,
one where we were trying to get the number of times the field name is
used.

We want the results by field name, so we use that as our key, and we
use 1 as our value, since for each emit, we have 1 use of that field
name.

function(doc) {
 for (var key in doc) {
   emit(key,1)
 }
}

which would give us

address : 1
city : 1
city : 1
city : 1
name : 1
name : 1

etc...

by putting in a reduce function, even if it didn't really do anything,
the results will get stacked together by key...

function(keys, values) {
  return values;
}

would give us....

address : [1]
city : [1,1,1]
name : [1,1]

The values for each key are stacked together in an array, all good, we
can work with that...
Since we want to add them all together, we could step through each one
adding them up...

function(keys, values) {
  var total = 0
  for (var k in values) {
    total = total + values[k]
  }
  return total;
}

or more simply...

function(keys, values) {
  return sum(values)
}
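
One thing the sum() version handles without extra work is rereduce: CouchDB
may call the reduce again on its own earlier outputs, with a third argument
set to true. A small local sketch of why that matters follows. The sum()
here is a stand-in for the helper CouchDB provides, and the two batches and
the lengthReduce counter-example are made up for illustration.

```javascript
// Stand-in for the sum() helper that CouchDB provides to view functions.
function sum(values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

// Adding the values is correct on the first pass and on rereduce alike.
function sumReduce(keys, values, rereduce) {
  return sum(values);
}

// Counting the values is only correct on the first pass.
function lengthReduce(keys, values, rereduce) {
  return values.length;
}

// Made-up batches of emitted 1s for the key "city".
// Neither reduce looks at its keys argument, so null is passed for it.
var batchA = [1, 1, 1];
var batchB = [1, 1];

var good = sumReduce(null, [
  sumReduce(null, batchA, false),
  sumReduce(null, batchB, false)
], true);

var bad = lengthReduce(null, [
  lengthReduce(null, batchA, false),
  lengthReduce(null, batchB, false)
], true);

console.log(good); // 5, the real number of rows
console.log(bad);  // 2, the number of batches, not rows
```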

Re: find all unique field names

Posted by Blair Nilsson <bl...@gmail.com>.
On Fri, May 29, 2009 at 9:20 AM, Douglas Fils <fi...@iastate.edu> wrote:
> Forgive the noob question..  but I've not been able to easily locate an
> approach today to getting a return that gives all the unique field names in
> a couch database.
>
> It's not too hard to generate a map function that emits an array of the
> field names in a particular record....
> (please note this is about as much JS as I have ever written)  :)
> function(doc) {
>  var i = 0;
>  var keyNames = new Array();
>  for (var key in doc) {
>    keyNames[i] = key
>    i++;
>  }
>  emit(null,keyNames);
> }
>
> However, once I pass that over to the reduce (assuming this is even the way
> to do it) I don't see an easy way to get the unique intersection of the
> various field names.
>
> Any help would be appreciated...
> Thanks
> Doug
>
>

maybe the map should be

function(doc) {
 for (var key in doc) {
   emit(key,"")
 }
}

and the reduce
function(keys,values) {
  return null;
}

and just use the returned keys as the field names.
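
On the client side, that might look like the sketch below. It assumes the
view was queried with group=true and that the response body has already been
parsed from JSON. The sample response and the fieldNamesFrom helper are made
up for illustration.

```javascript
// Made-up parsed response from a reduce view queried with group=true:
// one row per distinct key, each with the reduce value (null here).
var response = {
  rows: [
    { key: "_id", value: null },
    { key: "_rev", value: null },
    { key: "address", value: null },
    { key: "city", value: null },
    { key: "name", value: null }
  ]
};

// Hypothetical helper: the distinct field names are simply the row keys.
function fieldNamesFrom(resp) {
  return resp.rows.map(function (row) {
    return row.key;
  });
}

var names = fieldNamesFrom(response);
console.log(names.join(", ")); // _id, _rev, address, city, name
```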

--- Blair