Posted to user@couchdb.apache.org by Mike Leddy <mi...@loop.com.br> on 2009/10/07 23:05:32 UTC

Documents with large numbers of fields

Hi all,

I have a database with approx 100 million records that I would like to port
to a CouchDB cluster. The documents to be inserted will
have between 150 and 250 fields each.

(I will be doing some cool/complex stuff with views for recently 
inserted data in the 24-48 hour range)

A series of historical reports is generated from this data, which is
selected using a simple criterion, i.e. an interval in time
together with multiple locations, BUT the fields that are used vary
significantly depending on the reports (and combination of reports)
that are generated simultaneously.

If I were to use a view for each report, wouldn't I be building an
index plus the resulting field values for each possible report
on disk?

If I use only one view I will save disk space, but returning the full
documents via HTTP will be impractically slow :-(

Is there any way I can have one index being used but filter the
fields being returned in the documents? I guess conceptually it's a
single 'view' combined with different 'shows' that return JSON.
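
To make the "one index" idea concrete, here is a minimal sketch of a map
function keyed on location and time. It emits null as the value, so the
index should hold only keys and document ids rather than copies of the
150-250 fields. The field names `location` and `timestamp` and the function
name are assumptions for illustration; `emit` is provided by the CouchDB
view server.

```javascript
// Map function sketch: one index row per document, keyed by [location, time].
// Emitting null as the value keeps field values out of the view index.
// "location" and "timestamp" are hypothetical field names.
function mapByLocationAndTime(doc) {
  if (doc.location !== undefined && doc.timestamp !== undefined) {
    emit([doc.location, doc.timestamp], null);
  }
}
```

In a design document the function would be stored anonymously as a string.
With this key layout, a time interval for one location is a startkey/endkey
range, so several locations would likely need one range query each.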

Have I missed some obvious way to do this ?

TIA

Mike


Re: Documents with large numbers of fields

Posted by Mike Leddy <mi...@loop.com.br>.
On Wed, 2009-10-07 at 22:26 -0400, Adam Kocoloski wrote:
> Hi Mike, I think you're right, the way to accomplish this is to write  
> multiple 'list' functions that send different subsets of the value  
> stored in your view.  The API for _list is quite different in 0.9 and  
> 0.10, see
> 
> http://wiki.apache.org/couchdb/Formatting_with_Show_and_List
> 
> Best, Adam
> 

Thanks Adam, I didn't know about the _list API. I upgraded to 0.10 and
wrote one 'list' function that receives the field list in the request
query string and picks out the fields from the doc (assuming that
include_docs=true is also used).

Now I can use this one 'list' function with the 'view' and pick out
any document fields I like.
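
For anyone finding this later, a minimal sketch of that kind of 'list'
function follows. It is not my exact code: the query parameter name
`fields`, the helper `pickFields` and the name `listFields` are made up for
illustration. It assumes the 0.10-style API, where the view server supplies
`getRow()` and `send()`, and it assumes `JSON.stringify` is available (older
view servers may only offer a `toJSON()` helper instead).

```javascript
// Hypothetical helper: copy only the named fields from a document.
function pickFields(doc, fields) {
  var out = {};
  for (var i = 0; i < fields.length; i++) {
    var name = fields[i];
    if (doc && Object.prototype.hasOwnProperty.call(doc, name)) {
      out[name] = doc[name];
    }
  }
  return out;
}

// List function sketch in the 0.10 style: function(head, req).
// getRow() and send() are provided by the CouchDB view server.
// "fields" is a hypothetical query parameter, e.g. ?fields=temp,humidity
function listFields(head, req) {
  var fields = (req.query.fields || "").split(",");
  var row;
  var first = true;
  send("[");
  while ((row = getRow())) {
    // row.doc is only present when the request uses include_docs=true
    send((first ? "" : ",") + JSON.stringify(pickFields(row.doc, fields)));
    first = false;
  }
  send("]");
}
```

In a design document the list function would be stored anonymously as a
string, with the helper inlined. It would then be requested through the
view it formats, passing include_docs=true together with the field list in
the query string.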

Problem solved :-)

Thanks,

Mike


Re: Documents with large numbers of fields

Posted by Adam Kocoloski <ko...@apache.org>.

Hi Mike, I think you're right, the way to accomplish this is to write  
multiple 'list' functions that send different subsets of the value  
stored in your view.  The API for _list is quite different in 0.9 and  
0.10, see

http://wiki.apache.org/couchdb/Formatting_with_Show_and_List

Best, Adam