Posted to user@couchdb.apache.org by Ajay Pawaskar <AP...@genesisinfo.com> on 2017/06/14 10:13:21 UTC

Store meta data of millions of images for search purpose

Hi,
I have a question about storing millions/billions of documents in CouchDB and using a view to retrieve the required documents. I would like to know about the performance/scalability of views/CouchDB in the following case.

I have an application where I need to store metadata for millions/billions of images for search purposes. Images will be added/updated/deleted/retrieved on a regular basis (10,000-20,000 per day).

We are thinking of storing these documents in the following format, where "_id" is generated by our application and every other top-level key (e.g. "12398712397129") is an image file name:

{
   "_id": "201700000000002",
   "_rev": "1-b85e805bdd293a5f727517beea9512b3",
   "12398712397129": {"bCurrent": true, "bCanView": true},
   "98127397192319": {"bCurrent": false, "bCanView": false}
}

{
   "_id": "201700000000003",
   "_rev": "1-b85e83432d293a5f727517beea9512b3",
   "89723979823929": {"bCurrent": true, "bCanView": true},
   "92347324667324": {"bCurrent": false, "bCanView": false},
   "72832532467217": {"bCurrent": true, "bCanView": false}
}


So, if a user wants to get the current image for record 201700000000002, we will have the following view:

function(doc) {
  for (var prop in doc) {
    // every top-level key other than _id and _rev is an image file name
    if (prop !== "_id" && prop !== "_rev") {
      var img = doc[prop];
      if (img && typeof img === "object" && img.bCurrent === true) {
        emit(doc._id, {RecordID: doc._id, ImageID: prop, bCurrent: img.bCurrent, bCanView: img.bCanView});
      }
    }
  }
}

This view will be queried with key "201700000000002".
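A minimal local sketch for checking the map function outside CouchDB, using a corrected version of it (the original object literal lacked a key for bCanView). The stub emit(), the rows array and queryByKey() are hypothetical helpers: the real view engine supplies emit() itself and keeps a persistent, incrementally updated index, so queryByKey() only approximates a ?key= lookup over the two example documents.

```javascript
// Hypothetical test harness: CouchDB supplies emit() to map functions,
// so a stub that collects rows stands in for it here.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Corrected map function from the question.
var map = function (doc) {
  for (var prop in doc) {
    // every top-level key other than _id and _rev is an image file name
    if (prop !== "_id" && prop !== "_rev") {
      var img = doc[prop];
      if (img && typeof img === "object" && img.bCurrent === true) {
        emit(doc._id, { RecordID: doc._id, ImageID: prop, bCurrent: img.bCurrent, bCanView: img.bCanView });
      }
    }
  }
};

// The two example documents from the question (_rev omitted).
var docs = [
  {
    _id: "201700000000002",
    "12398712397129": { bCurrent: true, bCanView: true },
    "98127397192319": { bCurrent: false, bCanView: false }
  },
  {
    _id: "201700000000003",
    "89723979823929": { bCurrent: true, bCanView: true },
    "92347324667324": { bCurrent: false, bCanView: false },
    "72832532467217": { bCurrent: true, bCanView: false }
  }
];

// "Index" every document once, as the view engine would on a first build.
docs.forEach(function (doc) { map(doc); });

// Rough stand-in for querying the view with ?key="...": exact key match only.
function queryByKey(key) {
  return rows.filter(function (row) { return row.key === key; });
}

console.log(queryByKey("201700000000002"));
```

Because this view is built from whole record documents, an edit to a single image entry makes CouchDB re-run the function over every image in that record document.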

But, as mentioned earlier, images will be added/updated/deleted/retrieved on a regular basis (10,000-20,000 per day). How is this going to affect view performance?

Regards,
Ajay.

Re: Store meta data of millions of images for search purpose

Posted by Adam Kocoloski <ko...@apache.org>.
Hi Ajay, the view engine will happily keep up with 10k - 20k updates per day. If you’re using CouchDB 2.0 you can distribute this database across several underlying physical shards. You won’t need to do that just to keep up with your designed update rate, but an index with a billion entries will be easier to manage operationally if it’s sharded. Compaction in particular can be an unwieldy operation on an index that large.

Cheers,

Adam
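A minimal sketch of the two operations Adam mentions, assuming CouchDB 2.x, Node 18+ (for the global fetch) and admin credentials. The base URL, the database name "images", the design document name and the credentials are hypothetical. The q query parameter on database creation sets the number of shards, and POST /{db}/_compact/{ddoc} asks CouchDB to compact the view indexes of one design document.

```javascript
// Sketch only: CouchDB 2.x over HTTP with Node 18+ global fetch.
// baseUrl, database and design-document names are hypothetical.

// Shared helper: send a request and fail loudly on non-2xx responses.
async function couchRequest(method, url, headers, fetchImpl) {
  const res = await fetchImpl(url, { method, headers });
  if (!res.ok) {
    throw new Error(`${method} ${url} failed: HTTP ${res.status}`);
  }
  return res.json();
}

// Create a database split into q shards.
function createShardedDb(baseUrl, dbName, q, headers = {}, fetchImpl = fetch) {
  const url = `${baseUrl}/${encodeURIComponent(dbName)}?q=${q}`;
  return couchRequest("PUT", url, headers, fetchImpl);
}

// Ask CouchDB to compact the view indexes of one design document.
// The endpoint expects a JSON Content-Type even though no body is sent.
function compactViews(baseUrl, dbName, ddocName, headers = {}, fetchImpl = fetch) {
  const url = `${baseUrl}/${encodeURIComponent(dbName)}/_compact/${encodeURIComponent(ddocName)}`;
  return couchRequest("POST", url, { ...headers, "Content-Type": "application/json" }, fetchImpl);
}

// Basic-auth header for an admin user (credentials are placeholders).
function basicAuth(user, password) {
  return { Authorization: "Basic " + Buffer.from(`${user}:${password}`).toString("base64") };
}
```

For example, createShardedDb("http://localhost:5984", "images", 16, basicAuth("admin", "secret")) creates the database, and compactViews("http://localhost:5984", "images", "images", basicAuth("admin", "secret")) later compacts its views. The value 16 is only an illustration; as far as I know, CouchDB 2.x fixes the shard count when the database is created (shard splitting arrived in 3.0), so q is worth choosing up front with the expected index size in mind.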

> On Jun 14, 2017, at 8:47 AM, Ajay Pawaskar <AP...@genesisinfo.com> wrote:
> 
> As per the application, there will be multiple images per record [201700000000002...]. Images can be of different types [current, viewable]: for example, one image is marked bCurrent=true and another bCurrent=false. There will also be a search where I need to find the images related to a record that have bCurrent=true/false. If I make one document per image, the number of documents will increase [to more than billions].


RE: Store meta data of millions of images for search purpose

Posted by Ajay Pawaskar <AP...@genesisinfo.com>.
As per the application, there will be multiple images per record [201700000000002...]. Images can be of different types [current, viewable]: for example, one image is marked bCurrent=true and another bCurrent=false. There will also be a search where I need to find the images related to a record that have bCurrent=true/false. If I make one document per image, the number of documents will increase [to more than billions].
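If each image becomes its own document, the "images of a record with bCurrent=true/false" search can still be a single view lookup by emitting a compound [record, bCurrent] key. The sketch assumes hypothetical per-image documents with recordId, bCurrent and bCanView fields and a hypothetical view name "by_record_current". runView() is a local stand-in for CouchDB's view server: design documents store map functions as source strings, so it compiles the string with emit() in scope.

```javascript
// Map source as it might appear under "views" in a design document.
// Assumes hypothetical per-image docs: { _id: <file name>, recordId, bCurrent, bCanView }.
const mapSource = `function (doc) {
  if (typeof doc.recordId === "string" && typeof doc.bCurrent === "boolean") {
    emit([doc.recordId, doc.bCurrent], { bCanView: doc.bCanView });
  }
}`;

// Local stand-in for the view server: compile the source with emit() in scope
// and tag every emitted row with the _id of the document that produced it.
function runView(source, docs) {
  const rows = [];
  let currentId = null;
  const emit = (key, value) => rows.push({ id: currentId, key, value });
  const mapFn = new Function("emit", `return (${source});`)(emit);
  for (const doc of docs) {
    currentId = doc._id;
    mapFn(doc);
  }
  return rows;
}

// Exact-match lookup, roughly what ?key=[...] does on a real view.
function queryKey(rows, key) {
  const wanted = JSON.stringify(key);
  return rows.filter((row) => JSON.stringify(row.key) === wanted);
}

// Path for the equivalent HTTP query; the key is JSON-encoded, then URL-encoded.
function viewQueryPath(db, ddoc, view, key) {
  return `/${db}/_design/${ddoc}/_view/${view}?key=${encodeURIComponent(JSON.stringify(key))}`;
}

const imageDocs = [
  { _id: "12398712397129", recordId: "201700000000002", bCurrent: true, bCanView: true },
  { _id: "98127397192319", recordId: "201700000000002", bCurrent: false, bCanView: false },
  { _id: "89723979823929", recordId: "201700000000003", bCurrent: true, bCanView: true },
  { _id: "92347324667324", recordId: "201700000000003", bCurrent: false, bCanView: false },
  { _id: "72832532467217", recordId: "201700000000003", bCurrent: true, bCanView: false },
];

const rows = runView(mapSource, imageDocs);
console.log(queryKey(rows, ["201700000000002", true]).map((r) => r.id));
console.log(viewQueryPath("images", "images", "by_record_current", ["201700000000002", true]));
```

To fetch all images of a record regardless of the flag, a range query such as startkey=["201700000000002"]&endkey=["201700000000002",{}] should work, since CouchDB's view collation sorts an empty object after true.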

-----Original Message-----
From: aa mm [mailto:assaf.morami@gmail.com] 
Sent: Wednesday, June 14, 2017 6:11 PM
To: user@couchdb.apache.org
Subject: Re: Store meta data of millions of images for search purpose

What does each document represent? Why do you need to generate IDs? Is this a requirement?

If not, and the image file name is unique, then you can make each document represent one image. The _id will be the image file name, so you won't need a view to access an image; you'll need only the image name.

Assaf.



Re: Store meta data of millions of images for search purpose

Posted by aa mm <as...@gmail.com>.
What does each document represent? Why do you need to generate IDs? Is this a
requirement?

If not, and the image file name is unique, then you can make each document
represent one image. The _id will be the image file name, so you won't need
a view to access an image; you'll need only the image name.

Assaf.
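Assaf's suggestion can be sketched as a one-off conversion: each record document from the question becomes one document per image, with the image file name as _id. The recordId field name is a hypothetical choice for keeping the link to the record, the sketch assumes image file names are unique across the database (the condition Assaf states), and it only builds request bodies and paths rather than calling a server.

```javascript
// Sketch: convert the question's record documents into one document per image.
// Assumes image file names are unique; "recordId" is a hypothetical field name.
function splitRecord(recordDoc) {
  const imageDocs = [];
  for (const [key, value] of Object.entries(recordDoc)) {
    // Skip _id, _rev and any other underscore field, plus non-object values.
    if (key.startsWith("_") || value === null || typeof value !== "object") continue;
    imageDocs.push({
      _id: key, // the image file name becomes the document ID
      recordId: recordDoc._id,
      bCurrent: value.bCurrent === true,
      bCanView: value.bCanView === true,
    });
  }
  return imageDocs;
}

// Request body for POST /{db}/_bulk_docs, which accepts {"docs": [...]}.
function bulkDocsBody(recordDocs) {
  return JSON.stringify({ docs: recordDocs.flatMap((doc) => splitRecord(doc)) });
}

// With _id equal to the file name, one image is a plain GET /{db}/{docid}; no view needed.
function imageDocPath(dbName, imageName) {
  return `/${encodeURIComponent(dbName)}/${encodeURIComponent(imageName)}`;
}

const exampleRecord = {
  _id: "201700000000002",
  _rev: "1-b85e805bdd293a5f727517beea9512b3",
  "12398712397129": { bCurrent: true, bCanView: true },
  "98127397192319": { bCurrent: false, bCanView: false },
};

console.log(splitRecord(exampleRecord));
console.log(imageDocPath("images", "12398712397129"));
```

In a real migration the _bulk_docs body would likely be sent in batches rather than all at once (batch size is a tuning assumption), and re-running the conversion against documents that already exist would need each document's current _rev to avoid update conflicts.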

