Posted to user@couchdb.apache.org by Ryan Richins <ri...@mac.com> on 2009/10/26 21:24:39 UTC

Size of view file

I am working on a project where I have 12k documents and the size of
the db is 11MB, but the view file is over 4GB.  Obviously I am doing
something wrong with my views to make the file so large.  I was hoping
to get some input as to where my problem might be.

Running couchdb 0.90

Each document has 3 attributes, one of which is 'User Agent'.  For each
attribute I have the following views defined:
"by_<attribute>_total_date" and "by_<attribute>_created_at".  Below
is the code for the 2 views that deal with User Agent.  The same code
is used to define the views for the other 2 attributes, except
doc.user_agent is replaced by doc.<attribute>.

My guess is the problem lies somewhere in the  
"by_<attribute>_total_date" since every other view I have returns NULL  
for the value.

#1 by_ua_total_date
-----------------
MAP:
function(doc) {
         var val = {};
         var datetime = doc.created_at;
         // Pass radix 10 so values such as "09" are never read as octal.
         var year = parseInt(datetime.substr(0, 4), 10);
         var month = parseInt(datetime.substr(5, 2), 10);
         var day = parseInt(datetime.substr(8, 2), 10);
         val[doc.user_agent] = 1;
         emit([year, month, day], val);
       }

REDUCE:
function (keys, values, rereduce) {
         var rv = {};
         for (var i in values) {
           var value = values[i];
           // Merge the per-user-agent counts into a single object.
           for (var k in value) {
             rv[k] = (rv[k] || 0) + value[k];
           }
         }
         return rv;
       }


EXAMPLE OUTPUT (Key, Value)
[2009, 9, 6], {Mozilla/5.0 (iPod; U; CPU iPhone OS 2_2_1 like Mac OS  
X; en-us) AppleWebKit/525.18.1 (KHTML, like Gecko) Version/3.1.1  
Mobile/5H11a Safari/525.20: 5, Mozilla/4.0 (compatible; MSIE 6.0;  
Windows NT 5.1; SV1; FunWebProducts; InfoPath.2; .NET CLR 2.0.50727;  
OfficeLiveConnector.1.3; OfficeLivePatch.0.0): 2, Mozilla/5.0 (iPod;  
U; CPU iPhone OS 2_2 like Mac OS X; en-us) AppleWebKit/525.18.1  
(KHTML, like Gecko) Version/3.1.1 Mobile/5G77a Safari/525.20: 2,  
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; YPC  
3.2.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR  
3.0.04506.30; .NET CLR 3.0.04506.648; InfoPath.2): 1 }
-----------------

#2 by_ua_created_at
----------------
MAP:
function(doc) {
     emit([doc['user_agent'], doc['created_at']], null);
}
EXAMPLE OUTPUT (Key, Value)
["8900a/1.2 Mozilla/4.0 (compatible; MSIE 6.0; Windows CE; IEMobile  
7.6)", "2009/10/11 13:02:46 +0000"], NULL
----------------



Going through Futon to view my data, it does not seem like it should be
4GB worth, but I am missing something.  Any insight would be very much
appreciated.

Thanks,

Ryan




RE: Size of view file

Posted by Kevin Ferguson <ke...@meebo-inc.com>.
Also I meant:
emit([year, month, day, doc.user_agent], 1);
of course ;)

Kevin

Re: Size of view file

Posted by Paul Davis <pa...@gmail.com>.
Ryan,

Kevin's got it right here. Try his view to see what size you get. My
guess is that the UAs are different enough that you would be getting
reduciness errors on newer versions.

Paul Davis

This might be helpful:

http://mail-archives.apache.org/mod_mbox/couchdb-user/200904.mbox/%3Ce2111bbb0904090817j7ba10b34pe53ce2fd3c5c1590@mail.gmail.com%3E
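To illustrate the point about the reduce not actually reducing, here is a minimal sketch that runs outside CouchDB. It compares the size of the value returned by the object-merging reduce from the original post with a plain counting reduce. The user-agent strings and the helper names (mergeReduce, sumReduce, makeUserAgents, reducedSizes) are made up for illustration, and serialized JSON length is only a rough proxy for what the view engine stores.

```javascript
// Object-merging reduce, as in the original post: one entry per distinct UA.
function mergeReduce(keys, values, rereduce) {
  var rv = {};
  for (var i = 0; i < values.length; i++) {
    var value = values[i];
    for (var k in value) {
      rv[k] = (rv[k] || 0) + value[k];
    }
  }
  return rv;
}

// Counting reduce: always returns a single number.
function sumReduce(keys, values, rereduce) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

// Hypothetical data: n distinct, made-up user-agent strings.
function makeUserAgents(n) {
  var uas = [];
  for (var i = 0; i < n; i++) {
    uas.push("ExampleBrowser/1.0 (build " + i + ")");
  }
  return uas;
}

// Serialized size of each reduce result for n distinct user agents.
function reducedSizes(n) {
  var uas = makeUserAgents(n);
  var objectValues = uas.map(function (ua) {
    var v = {};
    v[ua] = 1;
    return v;
  });
  var numberValues = uas.map(function () { return 1; });
  return {
    merged: JSON.stringify(mergeReduce(null, objectValues, false)).length,
    summed: JSON.stringify(sumReduce(null, numberValues, false)).length
  };
}

var small = reducedSizes(10);
var large = reducedSizes(1000);
console.log(small, large);
```

The merged result grows roughly in proportion to the number of distinct user agents, while the summed result stays a few characters long. If the view index keeps intermediate reduce values, as I believe it does, that growth would be repeated throughout the file.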

RE: Size of view file

Posted by Kevin Ferguson <ke...@meebo-inc.com>.
For #1, have you considered a view like:

MAP:
function(doc) {
         var datetime = doc.created_at;
         var year = parseInt(datetime.substr(0, 4), 10);
         var month = parseInt(datetime.substr(5, 2), 10);
         var day = parseInt(datetime.substr(8, 2), 10);
         // Emit 1 (per the follow-up correction) so the reduce can sum the counts.
         emit([year, month, day, doc.user_agent], 1);
       }
REDUCE:
function(k,v,r) { return sum(v); }

Then you can query with startkey=[y,m,d], endkey=[y,m,d,{}], group=true and get the count for each user-agent on that day.  I think the output will be smaller too, but I don't know a whole lot about the view engine internals.
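As a rough sketch of what that query should return, the following runs the map and a sum reduce outside CouchDB and groups by the full key, which is approximately what group=true does. The sample documents, the emit callback parameter, and the helpers countsForDay and dayQuery are assumptions for illustration only. Real CouchDB orders keys by its own collation rules rather than by the string sort used here, and the view URL is left out because it depends on the CouchDB version.

```javascript
// Map from the suggestion above, emitting 1 per document.
// `emit` is passed in as a parameter so the sketch runs outside CouchDB.
function map(doc, emit) {
  var datetime = doc.created_at;
  var year = parseInt(datetime.substr(0, 4), 10);
  var month = parseInt(datetime.substr(5, 2), 10);
  var day = parseInt(datetime.substr(8, 2), 10);
  emit([year, month, day, doc.user_agent], 1);
}

// Stand-in for the built-in sum(values) helper.
function reduce(keys, values, rereduce) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

// Approximates startkey=[y,m,d], endkey=[y,m,d,{}], group=true:
// keep rows for one day and reduce each distinct key separately.
function countsForDay(docs, y, m, d) {
  var groups = {};
  docs.forEach(function (doc) {
    map(doc, function (key, value) {
      if (key[0] !== y || key[1] !== m || key[2] !== d) return;
      var groupKey = JSON.stringify(key);
      (groups[groupKey] = groups[groupKey] || []).push(value);
    });
  });
  return Object.keys(groups).sort().map(function (groupKey) {
    return {
      key: JSON.parse(groupKey),
      value: reduce(null, groups[groupKey], false)
    };
  });
}

// Key parameters are JSON-encoded, then URL-encoded, in the query string.
function dayQuery(y, m, d) {
  return "startkey=" + encodeURIComponent(JSON.stringify([y, m, d])) +
         "&endkey=" + encodeURIComponent(JSON.stringify([y, m, d, {}])) +
         "&group=true";
}

// Made-up documents using the created_at format from the original post.
var docs = [
  { created_at: "2009/10/11 13:02:46 +0000", user_agent: "ExampleBrowser/1.0" },
  { created_at: "2009/10/11 14:10:00 +0000", user_agent: "ExampleBrowser/1.0" },
  { created_at: "2009/10/11 15:30:12 +0000", user_agent: "OtherBrowser/2.0" },
  { created_at: "2009/10/12 09:00:00 +0000", user_agent: "ExampleBrowser/1.0" }
];

var rows = countsForDay(docs, 2009, 10, 11);
console.log(rows);
console.log(dayQuery(2009, 10, 11));
```

Each row carries one user agent in the key and a plain number as the value, so no single reduce value has to hold every user-agent string for the day.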

Kevin
