You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@couchdb.apache.org by Chris Stockton <ch...@gmail.com> on 2010/04/13 19:21:59 UTC

Views - Finding the column count for a given "table"

Hello,

We have created a data-store for users, one big difference with this
table is that a specific document may not have the column defined in
its "data" object. This means as users add new columns we do not need
to back fill all the rows of the "table" with empty data. Couch is
working very well for us for this reason but I have become stumbled
with getting the counts of rows which contain columns for a given
"table".

Given a set of documents that look like:
{"table":"table1","data":{"a":1,"b":1,"c":1}} -> Table row, with a
column A, B and C. This is the tables defined full schema.
{"table":"table1","data":{"a":1,"c":1}} -> Here the user never put data into B.
{"table":"table1","data":{"a":1}}
{"table":"table1","data":{"c":1}}  -> Table row, with only one column,
C, the other 2 columns are NULL.
{"table":"table2","data":{"i":1}}
{"table":"table2","data":{"i":1,"o":1,"p":1}}
{"table":"table2","data":{"i":1,"o":1,"p":1}}

I need to turn them into a view which gives counts of occurrence
grouped by table and each data key, So I may look them up giving them
the table & column. Here is what I have currently:
{"rows":[
{"key":["table1","a"],"value":3},
{"key":["table1","b"],"value":1},
{"key":["table1","c"],"value":3},
{"key":["table2","i"],"value":3},
{"key":["table2","o"],"value":2},
{"key":["table2","p"],"value":2}
]}

The following MapReduce works fine on the above data set:
function(doc) {
  for(key in doc.data) {
    emit([doc.table, key], null);
  }
}

function(keys, values, rereduce) {
  if(rereduce) {
    return sum(values);
  } else {
    return values.length;
  }
}

However, when I put this view into tables with hundreds of rows I
start getting unexpected (well maybe expected, but not understood)
results. Mostly I just end up with many columns (I.E. a, b, c) being
the MAX count of all rows in the table. Any help getting my head
around thinking about this in the correct way would be much
appreciated.

-Chris

Re: Views - Finding the column count for a given "table"

Posted by Chris Stockton <ch...@gmail.com>.
Hello,

On Tue, Apr 13, 2010 at 10:28 AM, J Chris Anderson <jc...@gmail.com> wrote:
>
> This looks right to me.
>
> You might try replacing the reduce function with the simple text:
>
> _count
>
> which will do the equivalent reduce in Erlang (which will be much faster). Can you see if that helps?

Thank you much for your response, your confirmation that my reduce
looked correct led me to check other avenues and solve my issue.

The getCountByTableColumn view works hand and hand with
getByTableColumnValues for sorting. Turns out my substr call was
sometimes being called on NULL in the getByTableColumnValues. This
would raise a exception and fail to index those rows. This happens
very rarely as we do not usually have null rows, since
getByTableColumnValues returned a different number of rows then my
getCountByTableColumn call it made me think the counting was
incorrect. Fun times!

Kind Regards,

-Chris

Re: Views - Finding the column count for a given "table"

Posted by J Chris Anderson <jc...@gmail.com>.
On Apr 13, 2010, at 10:21 AM, Chris Stockton wrote:

> Hello,
> 
> We have created a data-store for users, one big difference with this
> table is that a specific document may not have the column defined in
> its "data" object. This means as users add new columns we do not need
> to back fill all the rows of the "table" with empty data. Couch is
> working very well for us for this reason but I have become stumbled
> with getting the counts of rows which contain columns for a given
> "table".
> 
> Given a set of documents that look like:
> {"table":"table1","data":{"a":1,"b":1,"c":1}} -> Table row, with a
> column A, B and C. This is the tables defined full schema.
> {"table":"table1","data":{"a":1,"c":1}} -> Here the user never put data into B.
> {"table":"table1","data":{"a":1}}
> {"table":"table1","data":{"c":1}}  -> Table row, with only one column,
> C, the other 2 columns are NULL.
> {"table":"table2","data":{"i":1}}
> {"table":"table2","data":{"i":1,"o":1,"p":1}}
> {"table":"table2","data":{"i":1,"o":1,"p":1}}
> 
> I need to turn them into a view which gives counts of occurrence
> grouped by table and each data key, So I may look them up giving them
> the table & column. Here is what I have currently:
> {"rows":[
> {"key":["table1","a"],"value":3},
> {"key":["table1","b"],"value":1},
> {"key":["table1","c"],"value":3},
> {"key":["table2","i"],"value":3},
> {"key":["table2","o"],"value":2},
> {"key":["table2","p"],"value":2}
> ]}
> 
> The following MapReduce works fine on the above data set:
> function(doc) {
>  for(key in doc.data) {
>    emit([doc.table, key], null);
>  }
> }
> 
> function(keys, values, rereduce) {
>  if(rereduce) {
>    return sum(values);
>  } else {
>    return values.length;
>  }
> }

This looks right to me.

You might try replacing the reduce function with the simple text:

_count

which will do the equivalent reduce in Erlang (which will be much faster). Can you see if that helps?

> 
> However, when I put this view into tables with hundreds of rows I
> start getting unexpected (well maybe expected, but not understood)
> results. Mostly I just end up with many columns (I.E. a, b, c) being
> the MAX count of all rows in the table. Any help getting my head
> around thinking about this in the correct way would be much
> appreciated.
> 
> -Chris