Posted to user@couchdb.apache.org by Manjunath Somashekhar <ma...@yahoo.com> on 2008/12/30 07:07:11 UTC

Question on performance of getting document by id

hi All,

I have been evaluating CouchDB for a project and was running some performance tests; one of them tests the performance of getting a document by id.

I wrote a small Python script that loads about a million documents (very simple: {_id, value}). For the test I assigned the ids myself instead of using the uuids assigned by CouchDB; ids start from 0 and go up to a million.

After the loading was done, I ran another Python script that tries to get each of the million documents; it ran for a few hours and then I killed it.
I then tried running the same script simultaneously with different key ranges (4 to be precise): it completed in about 3 hours at a minimum, across multiple runs.

This means 1000000/(3*60*60) ~ 93 gets per second. Is this the current performance benchmark, or is there something stupid that I am doing? BTW, this is way too slow for the application I was exploring CouchDB for.
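As a quick sanity check on that figure (not part of the original scripts), the arithmetic works out like this:

```python
# Back-of-the-envelope check of the observed fetch rate.
total_docs = 1000000
elapsed_seconds = 3 * 60 * 60   # the roughly 3-hour run
rate = total_docs / float(elapsed_seconds)
print(round(rate))  # roughly 93 gets per second
```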

Please let me know if there are any suggestions.

Environment:
python-couchDB lib - latest version 0.5.x
python - 2.5.3
ubuntu - 8.04
laptop with 4G of RAM, a dual-core CPU, and about 80G of HDD.
python-couchDB - bulk docs - insertion.
python-couchDB - get by id - multiple options tried - like db[doc_id], db.get(doc_id)
Tried creating a view on id (I was just experimenting; AFAIU an index should already exist on _id) - it took hours and hours and I killed it.

Sample code:
### insertion ###
import sys

# `db` is a couchdb.Database; `absIndex` and `sizesLength` describe the
# fixed-width input columns and are set up earlier in the script.
count = 0
doc_id = 0  # renamed to avoid shadowing the builtin `id`
lineCount = 0
batch = []
# input comes from STDIN (standard input)
for line in sys.stdin:
    lineCount += 1
    # remove leading and trailing whitespace
    line = line.strip()
    values = []
    # slice the fixed-width input line into its fields
    for size in range(sizesLength):
        value = line[absIndex[size]:absIndex[size + 1]].strip()
        values.append(value)
        if size == sizesLength - 2:
            break

    batch.append({"partnerCode": values[1],
                  "_id": '%s' % doc_id})
    count += 1
    doc_id += 1

    if count % 10000 == 0:
        db.update(batch)
        batch = []  # reset, otherwise earlier docs get re-submitted

# flush the final partial batch
if batch:
    db.update(batch)
### insertion ###

### fetch ###
# fetch each document by id, one GET per document
for i in range(1000000):
    doc = db['%s' % i]
### fetch ###
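One thing worth trying (a hypothetical sketch, not from the original post): fetch documents in batches through the built-in `_all_docs` view instead of issuing one GET per id, since `_id` is already indexed there. The server URL, the database name `perftest`, the batch size, and python-couchdb accepting a `keys` option on `view()` are all assumptions here; adjust to your setup.

```python
def chunked(seq, size):
    """Yield successive size-sized slices of seq."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

if __name__ == "__main__":
    import couchdb  # python-couchdb package, not stdlib

    server = couchdb.Server("http://localhost:5984/")
    db = server["perftest"]  # hypothetical database name
    ids = ['%s' % i for i in range(1000000)]
    for batch in chunked(ids, 1000):
        # one request to _all_docs returns up to 1000 docs at a time,
        # instead of 1000 separate GETs
        for row in db.view("_all_docs", keys=batch, include_docs=True):
            doc = row.doc
```

Cutting a million round trips down to a thousand should dominate any other tuning at this scale.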

Thanks
Manju



Re: Question on performance of getting document by id

Posted by Chris Anderson <jc...@gmail.com>.
On Mon, Dec 29, 2008 at 10:07 PM, Manjunath Somashekhar
<ma...@yahoo.com> wrote:
> This means 1000000/(3*60*60) ~ 93 gets per second. Is this the current performance benchmark, or is there something stupid that I am doing? BTW, this is way too slow for the application I was exploring CouchDB for.

This seems slow. What is your Erlang version? It will say at the top
of the `erl` command line interface. You should be newer than 5.5.5
at least. I'm not sure what the exact cutoff for usable Erlang is, but
we're working on adding an error message at build time if your Erlang
is not up to date.

-- 
Chris Anderson
http://jchris.mfdz.com