Posted to user@couchdb.apache.org by Scott Zhang <ge...@gmail.com> on 2009/02/26 05:02:27 UTC

What's the speed(performance) of couchdb?

Hi, all. I am somewhat tired of SQL Server's insert performance. I have
lots of records to insert into a database, but with plain "INSERT INTO"
statements SQL Server only manages a hit of ~2,000 records at a time, which
is slow in my view.
When I try bcp, it can insert more than 10K records at a time.

I have heard about CouchDB many times on the Erlang mailing list. Today I
ran my own test: a very simple program that inserts records into CouchDB.
Because one instance is slow, I ran 10+ programs concurrently to insert
records into CouchDB.

        static void Main(string[] args)
        {
            DB mDB = new DB();
            string sServer = "http://localhost:5984";
            string sDb = "test";

            // List the databases that already exist on the server.
            string[] dbList = mDB.GetDatabases(sServer);
            foreach (string db in dbList)
            {
                Console.WriteLine(db);
            }

            // Insert ten million documents, one HTTP request per document.
            for (int i = 0; i < 10000000; i++)
            {
                Console.WriteLine("Processing " + i.ToString());
                Person p = new Person();
                p.name = "zhang" + i.ToString();
                p.age = 12 + i;
                string json = JsonMapper.ToJson(p);
                mDB.CreateDocument(sServer, sDb, json);
            }
        }


But the performance was as bad as I could have imagined. After several
minutes of running, I had only inserted 120K records; the speed was ~20
records per second. With only one copy running it seemed higher, ~80. With
10 programs running the speed was very slow, and in the end CouchDB crashed
itself. :L
Anyway, CouchDB didn't impress me this time.

Is there an option I am missing?
I am using the CouchDB binary release on Windows 2003.

Regards.
Scott Zhang

Re: What's the speed(performance) of couchdb?

Posted by kowsik <ko...@gmail.com>.
Nothing is going to be "free" automatically. I was playing with SQLite
over the last couple of days, and by turning off journaling and file-system
synchronization (because I was doing one-time writes to build up indexes) I
went from the run taking over 5 minutes to 40 seconds.

I guess what I'm saying is that performance and optimization depend on what
you are trying to do. I do think that bulk requests should give you a big
boost. I would imagine that document size also matters for how many
inserts/sec you get.
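
The SQLite tuning described above can be sketched in Python like this (a
minimal illustration; the table name and data are made up):

```python
import os
import sqlite3
import tempfile
import time

# Trade durability for one-time bulk-load speed: no rollback journal and no
# fsync per commit. Only safe when the database can be rebuilt from scratch
# if the load crashes.
path = os.path.join(tempfile.mkdtemp(), "kw.db")
conn = sqlite3.connect(path)
conn.execute("PRAGMA journal_mode = OFF")
conn.execute("PRAGMA synchronous = OFF")
conn.execute("CREATE TABLE kw (id INTEGER PRIMARY KEY, word TEXT)")

start = time.time()
with conn:  # one transaction for the whole batch, not one per row
    conn.executemany(
        "INSERT INTO kw (word) VALUES (?)",
        (("word%d" % i,) for i in range(100000)),
    )
rows = conn.execute("SELECT COUNT(*) FROM kw").fetchone()[0]
print("inserted %d rows in %.2fs" % (rows, time.time() - start))
```

Batching all rows into one transaction matters as much as the PRAGMAs:
with the defaults, every single-row INSERT is its own journaled, synced
commit.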

K.

On Wed, Feb 25, 2009 at 8:30 PM, Scott Zhang <ge...@gmail.com> wrote:
> Hi. Thanks for replying.
> But what is a database for if it is slow? Every database has clustering
> features to improve speed and capacity (don't mention "Access"-type things).
>
>
> I was expecting CouchDB to be as fast as SqlServer or mysql. At least I
> know mnesia is much faster than SqlServer, though mnesia always throws a
> harmless "overload" message.
>
> I will try bulk insert now. But to be fair, I was inserting into sqlserver
> one insert at a time.
>
> Regards.
>
>
>
>
> On Thu, Feb 26, 2009 at 12:18 PM, Jens Alfke <je...@mooseyard.com> wrote:
>
>>
>> On Feb 25, 2009, at 8:02 PM, Scott Zhang wrote:
>>
>>  But the performance is as bad as I can image, After several minutes run, I
>>> only inserted into 120K records. I saw the speed is ~20 records each
>>> second.
>>>
>>
>> Use the bulk-insert API to improve speed. The way you're doing it, every
>> record being added is a separate transaction, which requires a separate HTTP
>> request and flushing the file.
>>
>> (I'm a CouchDB newbie, but I don't think the point of CouchDB is speed.
>> What's exciting about it is the flexibility and the ability to build
>> distributed systems. If you're looking for a traditional database with
>> speed, have you tried MySQL?)
>>
>> —Jens
>

Re: What's the speed(performance) of couchdb?

Posted by lasizoillo <la...@gmail.com>.
2009/2/26 Scott Zhang <ge...@gmail.com>:
> Thanks. Jan.
> I am using 0.8.1 beta installer download from couchDB wiki.
>
> ---------------------------------------------------
> The question is what you need your system to look like eventually. If this
> is an initial data import and after that you get mostly read requests, the
> longer insertion time will amortize over time.
> ---------------------------------------------------
> Yes. I am trying to move the keyword-index database from SqlServer to
> another database, to balance the load on the SqlServer database. Then I
> will do searches using the keyword-index database. So the first import
> process is very important to me. After the initial data import, I can
> slowly add new keywords.
>
> My candidates are Mnesia (my first preference), couchDB, postgresql, mysql.
>

Maybe a Berkeley DB-style key-value store is a good option, for example
Tokyo Tyrant:
http://tokyocabinet.sourceforge.net/benchmark.pdf

> Mnesia's insert performance is good, but there is a weird issue with the
> http_client in the Erlang Windows release. After reporting that bug on the
> Erlang mailing list and getting no response, I finally had to give up on
> mnesia.
>
> CouchDB is my second try, but the problem is as I showed in the mail.
>
> Now I am working with postgresql. At least I see it works as good as
> SqlServer.
>
>
> I will check CouchDB again after your 1.0 release. By the time I look at
> couchDB 1.0 I will be playing with 1 billion records, so speed is the most
> important thing I care about.
>

If you want to do a very heavy migration, it is better to develop a script
that generates the db file directly. The network layers consume a lot of
time; an application that writes bytes directly to disk can save you a lot
of it.

I don't know how to do profiling in Erlang to estimate the time saved.


Regards,

Javi

PS: Excuse my bad English

> Cheers.
> Thanks for your hard work.
>
>
> Regards.
> Scott
>
>
>
>
>
>
>
> On Thu, Feb 26, 2009 at 6:04 PM, Jan Lehnardt <ja...@apache.org> wrote:
>
>> Hi Scott,
>>
>> thanks for your feedback. As a general note, you can't expect any magic
>> from CouchDB. It is bound by the same constraint all other programmes
>> are. To get the most out of CouchDB or SqlServer or MySQL, you need
>> to understand how it works.
>>
>>
>> On 26 Feb 2009, at 05:30, Scott Zhang wrote:
>>
>>  Hi. Thanks for replying.
>>> But what a database is for if it is slow? Every database has the feature
>>> to
>>> make cluster to improve speed and capacity (Don't metion "access" things).
>>>
>>
>> The point of CouchDB is allowing high numbers of concurrent requests. This
>> gives you more throughput for a single machine but not necessarily faster
>> single query execution speed.
>>
>>
>>  I was expecting couchDB is as fast as SqlServer or mysql. At least I know,
>>> mnesia is much faster than SqlServer. But mnesia always throw harmless
>>> "overload" message.
>>>
>>
>> CouchDB is not nearly as old as either of them. Did you really expect a
>> software in alpha stages to be faster than fine-tuned systems that have
>> been used in production for a decade or longer?
>>
>>
>>  I will try bulk insert now. But be  fair, I was inserting  into sqlserver
>>> one insert one time.
>>>
>>
>> Insert speed can be speed up in numerous ways:
>>
>>  - Use sequential descending document ids on insert.
>>  - Use bulk insert.
>>  - Bypass the HTTP API and insert native Erlang terms and skip JSON
>> conversion.
>>
>> The question is what you need you system to look like eventually. If this
>> is
>> an initial data-import and after that you get mostly read requests, the
>> longer
>> insertion time will amortize over time.
>>
>> What version is the Windows binary you are using? If it is still 0.8, you
>> should
>> try trunk (which most likely means switching to some UNIXy system).
>>
>> Cheers
>> Jan
>> --
>>
>>
>>
>>
>>
>>
>>
>>> Regards.
>>>
>>>
>>>
>>>
>>> On Thu, Feb 26, 2009 at 12:18 PM, Jens Alfke <je...@mooseyard.com> wrote:
>>>
>>>
>>>> On Feb 25, 2009, at 8:02 PM, Scott Zhang wrote:
>>>>
>>>> But the performance is as bad as I can image, After several minutes run,
>>>> I
>>>>
>>>>> only inserted into 120K records. I saw the speed is ~20 records each
>>>>> second.
>>>>>
>>>>>
>>>> Use the bulk-insert API to improve speed. The way you're doing it, every
>>>> record being added is a separate transaction, which requires a separate
>>>> HTTP
>>>> request and flushing the file.
>>>>
>>>> (I'm a CouchDB newbie, but I don't think the point of CouchDB is speed.
>>>> What's exciting about it is the flexibility and the ability to build
>>>> distributed systems. If you're looking for a traditional database with
>>>> speed, have you tried MySQL?)
>>>>
>>>> —Jens
>>>>
>>>
>>
>

Re: [user] Re: What's the speed(performance) of couchdb?

Posted by Scott Zhang <ge...@gmail.com>.
Hi. Wout.
Thanks for asking.

--I think what we're asking is, is it ok for your initial import to take a
longer time?

If the longer time is caused by a technical restriction, it is acceptable.
But if SqlServer needs 5 days, CouchDB 8 days, mnesia 3 days, and
postgresql 4 days, then, within my technical limitations, I will choose
postgresql.

If I have no choice, a longer initial import is OK.



On Thu, Feb 26, 2009 at 9:55 PM, Wout Mertens <wm...@cisco.com> wrote:

> On Feb 26, 2009, at 2:34 PM, Scott Zhang wrote:
>
>  I will check couchDB back soon later after your 1.0 release. But as I see,
>> for user play with huge records, I can see when I saw couchDB 1.0. I will
>> be
>> playing with 1 billion records. So speed is the most important thing I
>> care.
>>
>
> I think what we're asking is, is it ok for your initial import to take a
> longer time?
>
> Due to the way views are stored, a database with many queries but few
> updates after the initial import will be extremely fast.
>
> Wout.
>

Re: [user] Re: What's the speed(performance) of couchdb?

Posted by Wout Mertens <wm...@cisco.com>.
On Feb 26, 2009, at 2:34 PM, Scott Zhang wrote:

> I will check CouchDB again after your 1.0 release. By the time I look at
> couchDB 1.0 I will be playing with 1 billion records, so speed is the most
> important thing I care about.

I think what we're asking is: is it OK for your initial import to take a
longer time?

Due to the way views are stored, a database with many queries but few
updates after the initial import will be extremely fast.

Wout.

Re: What's the speed(performance) of couchdb?

Posted by Scott Zhang <ge...@gmail.com>.
Thanks, Jan.
I am using the 0.8.1 beta installer downloaded from the couchDB wiki.

---------------------------------------------------
The question is what you need your system to look like eventually. If this
is an initial data import and after that you get mostly read requests, the
longer insertion time will amortize over time.
---------------------------------------------------
Yes. I am trying to move the keyword-index database from SqlServer to
another database, to balance the load on the SqlServer database. Then I
will do searches using the keyword-index database. So the first import
process is very important to me. After the initial data import, I can
slowly add new keywords.

My candidates are Mnesia (my first preference), couchDB, postgresql, mysql.

Mnesia's insert performance is good, but there is a weird issue with the
http_client in the Erlang Windows release. After reporting that bug on the
Erlang mailing list and getting no response, I finally had to give up on
mnesia.

CouchDB is my second try, but the problem is as I showed in the mail.

Now I am working with postgresql. At least I see it works as well as
SqlServer.


I will check CouchDB again after your 1.0 release. By the time I look at
couchDB 1.0 I will be playing with 1 billion records, so speed is the most
important thing I care about.

Cheers.
Thanks for your hard work.


Regards.
Scott







On Thu, Feb 26, 2009 at 6:04 PM, Jan Lehnardt <ja...@apache.org> wrote:

> Hi Scott,
>
> thanks for your feedback. As a general note, you can't expect any magic
> from CouchDB. It is bound by the same constraints as all other programs.
> To get the most out of CouchDB or SqlServer or MySQL, you need to
> understand how it works.
>
>
> On 26 Feb 2009, at 05:30, Scott Zhang wrote:
>
>  Hi. Thanks for replying.
>> But what a database is for if it is slow? Every database has the feature
>> to
>> make cluster to improve speed and capacity (Don't metion "access" things).
>>
>
> The point of CouchDB is allowing high numbers of concurrent requests. This
> gives you more throughput for a single machine but not necessarily faster
> single query execution speed.
>
>
>  I was expecting couchDB is as fast as SqlServer or mysql. At least I know,
>> mnesia is much faster than SqlServer. But mnesia always throw harmless
>> "overload" message.
>>
>
> CouchDB is not nearly as old as either of them. Did you really expect a
> software in alpha stages to be faster than fine-tuned systems that have
> been used in production for a decade or longer?
>
>
>  I will try bulk insert now. But be  fair, I was inserting  into sqlserver
>> one insert one time.
>>
>
> Insert speed can be sped up in numerous ways:
>
>  - Use sequential descending document ids on insert.
>  - Use bulk insert.
>  - Bypass the HTTP API and insert native Erlang terms and skip JSON
> conversion.
>
> The question is what you need your system to look like eventually. If this
> is an initial data import and after that you get mostly read requests, the
> longer insertion time will amortize over time.
>
> What version is the Windows binary you are using? If it is still 0.8, you
> should
> try trunk (which most likely means switching to some UNIXy system).
>
> Cheers
> Jan
> --
>
>
>
>
>
>
>
>> Regards.
>>
>>
>>
>>
>> On Thu, Feb 26, 2009 at 12:18 PM, Jens Alfke <je...@mooseyard.com> wrote:
>>
>>
>>> On Feb 25, 2009, at 8:02 PM, Scott Zhang wrote:
>>>
>>> But the performance is as bad as I can image, After several minutes run,
>>> I
>>>
>>>> only inserted into 120K records. I saw the speed is ~20 records each
>>>> second.
>>>>
>>>>
>>> Use the bulk-insert API to improve speed. The way you're doing it, every
>>> record being added is a separate transaction, which requires a separate
>>> HTTP
>>> request and flushing the file.
>>>
>>> (I'm a CouchDB newbie, but I don't think the point of CouchDB is speed.
>>> What's exciting about it is the flexibility and the ability to build
>>> distributed systems. If you're looking for a traditional database with
>>> speed, have you tried MySQL?)
>>>
>>> —Jens
>>>
>>
>

Re: What's the speed(performance) of couchdb?

Posted by Chris Anderson <jc...@apache.org>.
On Thu, Feb 26, 2009 at 2:04 AM, Jan Lehnardt <ja...@apache.org> wrote:
> Hi Scott,
>
> thanks for your feedback. As a general note, you can't expect any magic
> from CouchDB. It is bound by the same constraints as all other programs.
> To get the most out of CouchDB or SqlServer or MySQL, you need to
> understand how it works.
>
>
> On 26 Feb 2009, at 05:30, Scott Zhang wrote:
>
>> Hi. Thanks for replying.
>> But what a database is for if it is slow? Every database has the feature
>> to
>> make cluster to improve speed and capacity (Don't metion "access" things).
>
> The point of CouchDB is allowing high numbers of concurrent requests. This
> gives you more throughput for a single machine but not necessarily faster
> single query execution speed.
>
>
>> I was expecting couchDB is as fast as SqlServer or mysql. At least I know,
>> mnesia is much faster than SqlServer. But mnesia always throw harmless
>> "overload" message.
>
> CouchDB is not nearly as old as either of them. Did you really expect a
> software in alpha stages to be faster than fine-tuned systems that have
> been used in production for a decade or longer?
>
>
>> I will try bulk insert now. But be  fair, I was inserting  into sqlserver
>> one insert one time.
>
> Insert speed can be sped up in numerous ways:
>
>  - Use sequential descending document ids on insert.

or ascending...

>  - Use bulk insert.

with ascending keys and bulk inserts of 1000 docs at a time I was able
to write 3k docs per second. Here is the benchmark script:
http://friendpaste.com/5g0kOEPonxdXMKibNRzetJ
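
In case that paste link rots, the same idea can be sketched in Python; the
helper names are made up, and the endpoint is CouchDB's standard _bulk_docs:

```python
import json
from urllib.request import Request, urlopen

COUCH_DB = "http://localhost:5984/test"  # hypothetical database URL

def batches(n_docs, batch_size=1000):
    """Yield _bulk_docs payloads of up to batch_size docs, with ascending
    zero-padded ids so consecutive inserts hit adjacent B-tree positions."""
    for start in range(0, n_docs, batch_size):
        docs = [{"_id": "%010d" % i, "name": "zhang%d" % i, "age": 12 + i}
                for i in range(start, min(start + batch_size, n_docs))]
        yield {"docs": docs}

def post_bulk(payload):
    """POST one batch to the _bulk_docs endpoint (needs a running server)."""
    req = Request(COUCH_DB + "/_bulk_docs",
                  data=json.dumps(payload).encode("utf-8"),
                  headers={"Content-Type": "application/json"})
    return urlopen(req).read()

# Against a running CouchDB (not executed here):
#   for payload in batches(10000000):
#       post_bulk(payload)  # one HTTP request per 1000 docs, not per doc
```

This replaces ten million HTTP round trips and commits with ten thousand.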


>  - Bypass the HTTP API and insert native Erlang terms and skip JSON
> conversion.

doing this I was able to get 6k docs / sec

In a separate test using attachments of 250k and an Erlang API (no
HTTP), I was able to write to my disk at 80% of the speed it can accept
when streaming raw bytes to disk (roughly 20 MB/sec).

>
> The question is what you need your system to look like eventually. If this
> is an initial data import and after that you get mostly read requests, the
> longer insertion time will amortize over time.
>
> What version is the Windows binary you are using? If it is still 0.8, you
> should
> try trunk (which most likely means switching to some UNIXy system).
>
> Cheers
> Jan
> --
>
>
>
>
>
>>
>> Regards.
>>
>>
>>
>>
>> On Thu, Feb 26, 2009 at 12:18 PM, Jens Alfke <je...@mooseyard.com> wrote:
>>
>>>
>>> On Feb 25, 2009, at 8:02 PM, Scott Zhang wrote:
>>>
>>> But the performance is as bad as I can image, After several minutes run,
>>> I
>>>>
>>>> only inserted into 120K records. I saw the speed is ~20 records each
>>>> second.
>>>>
>>>
>>> Use the bulk-insert API to improve speed. The way you're doing it, every
>>> record being added is a separate transaction, which requires a separate
>>> HTTP
>>> request and flushing the file.
>>>
>>> (I'm a CouchDB newbie, but I don't think the point of CouchDB is speed.
>>> What's exciting about it is the flexibility and the ability to build
>>> distributed systems. If you're looking for a traditional database with
>>> speed, have you tried MySQL?)
>>>
>>> —Jens
>
>



-- 
Chris Anderson
http://jchris.mfdz.com

Re: What's the speed(performance) of couchdb?

Posted by Jan Lehnardt <ja...@apache.org>.
Hi Scott,

thanks for your feedback. As a general note, you can't expect any magic
from CouchDB. It is bound by the same constraints as all other programs.
To get the most out of CouchDB or SqlServer or MySQL, you need to
understand how it works.


On 26 Feb 2009, at 05:30, Scott Zhang wrote:

> Hi. Thanks for replying.
> But what is a database for if it is slow? Every database has clustering
> features to improve speed and capacity (don't mention "Access"-type
> things).

The point of CouchDB is allowing high numbers of concurrent requests. This
gives you more throughput for a single machine, but not necessarily faster
single-query execution speed.


> I was expecting couchDB to be as fast as SqlServer or mysql. At least I
> know mnesia is much faster than SqlServer, though mnesia always throws a
> harmless "overload" message.

CouchDB is not nearly as old as either of them. Did you really expect
software in its alpha stage to be faster than fine-tuned systems that have
been used in production for a decade or longer?


> I will try bulk insert now. But to be fair, I was inserting into
> sqlserver one insert at a time.

Insert speed can be sped up in numerous ways:

  - Use sequential descending document ids on insert.
  - Use bulk insert.
  - Bypass the HTTP API and insert native Erlang terms and skip JSON  
conversion.
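
The first of these can be sketched as follows, assuming "sequential
descending" means ids that sort monotonically rather than random ids (the
bound and helper names below are hypothetical):

```python
# One reading of "sequential descending document ids": zero-padded counters
# that sort in reverse insertion order, so consecutive inserts land at
# adjacent B-tree positions instead of random ones (as UUID-style ids would).
MAX_DOCS = 10**10  # hypothetical upper bound on the id space

def ascending_id(i):
    return "%010d" % i  # "0000000000", "0000000001", ...

def descending_id(i):
    return "%010d" % (MAX_DOCS - 1 - i)  # "9999999999", "9999999998", ...

ids = [descending_id(i) for i in range(5)]
assert ids == sorted(ids, reverse=True)  # each new id sorts before the last
```

Whether ascending or descending wins depends on the on-disk B-tree layout;
the point is simply that monotonic ids avoid scattered inserts.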

The question is what you need your system to look like eventually. If this
is an initial data import and after that you get mostly read requests, the
longer insertion time will amortize over time.

What version is the Windows binary you are using? If it is still 0.8, you
should try trunk (which most likely means switching to some UNIXy system).

Cheers
Jan
--





>
> Regards.
>
>
>
>
> On Thu, Feb 26, 2009 at 12:18 PM, Jens Alfke <je...@mooseyard.com>  
> wrote:
>
>>
>> On Feb 25, 2009, at 8:02 PM, Scott Zhang wrote:
>>
>> But the performance is as bad as I can image, After several minutes  
>> run, I
>>> only inserted into 120K records. I saw the speed is ~20 records each
>>> second.
>>>
>>
>> Use the bulk-insert API to improve speed. The way you're doing it,  
>> every
>> record being added is a separate transaction, which requires a  
>> separate HTTP
>> request and flushing the file.
>>
>> (I'm a CouchDB newbie, but I don't think the point of CouchDB is  
>> speed.
>> What's exciting about it is the flexibility and the ability to build
>> distributed systems. If you're looking for a traditional database  
>> with
>> speed, have you tried MySQL?)
>>
>> —Jens


Re: What's the speed(performance) of couchdb?

Posted by Jens Alfke <je...@mooseyard.com>.
On Feb 25, 2009, at 8:30 PM, Scott Zhang wrote:

> But what a database is for if it is slow?

To store data, of course. And storing it quickly is just one feature.  
As an example, a tape backup system has terrible access time, but is  
still valuable for its vast capacity.

CouchDB has other advantages, like flexibility of storage (no schema  
needed) and its distributed nature. Sure, most DB servers support  
replication, but that's not the same thing. As an example (of the kind  
of thing that interests me) CouchDB could be used to implement a  
database with millions of server nodes all around the world. The nodes  
don't have to have any central organization, or necessarily even trust  
each other. (This is somewhat like the organization of P2P distributed  
hash table systems like Chord, Pastry, etc.) In some ways performance  
would be terrible — data that isn't cached at a local node might take  
seconds or minutes to be fetched from another, and it might take  
minutes or hours for changes made at one node to propagate to all the  
others — but other benefits would compensate.

—Jens

Re: What's the speed(performance) of couchdb?

Posted by Scott Zhang <ge...@gmail.com>.
Hi. Thanks for replying.
But what is a database for if it is slow? Every database has clustering
features to improve speed and capacity (don't mention "Access"-type things).


I was expecting couchDB to be as fast as SqlServer or mysql. At least I
know mnesia is much faster than SqlServer, though mnesia always throws a
harmless "overload" message.

I will try bulk insert now. But to be fair, I was inserting into sqlserver
one insert at a time.

Regards.




On Thu, Feb 26, 2009 at 12:18 PM, Jens Alfke <je...@mooseyard.com> wrote:

>
> On Feb 25, 2009, at 8:02 PM, Scott Zhang wrote:
>
>  But the performance is as bad as I could have imagined. After several
>> minutes of running, I only inserted 120K records; I saw a speed of ~20
>> records per second.
>>
>
> Use the bulk-insert API to improve speed. The way you're doing it, every
> record being added is a separate transaction, which requires a separate HTTP
> request and flushing the file.
>
> (I'm a CouchDB newbie, but I don't think the point of CouchDB is speed.
> What's exciting about it is the flexibility and the ability to build
> distributed systems. If you're looking for a traditional database with
> speed, have you tried MySQL?)
>
> —Jens

Re: What's the speed(performance) of couchdb?

Posted by Jens Alfke <je...@mooseyard.com>.
On Feb 25, 2009, at 8:02 PM, Scott Zhang wrote:

> But the performance is as bad as I could have imagined. After several
> minutes of running, I only inserted 120K records; I saw a speed of ~20
> records per second.

Use the bulk-insert API to improve speed. The way you're doing it,  
every record being added is a separate transaction, which requires a  
separate HTTP request and flushing the file.
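
To make the contrast concrete, here is a rough Python sketch of the two
request shapes (one POST per document versus one _bulk_docs POST per batch);
the database URL is hypothetical and nothing is sent over the network:

```python
import json

DB_URL = "http://localhost:5984/test"  # hypothetical database URL

def single_doc_requests(docs):
    """One HTTP POST per document: each one is its own commit on the server."""
    return [(DB_URL + "/", json.dumps(d)) for d in docs]

def bulk_request(docs):
    """One HTTP POST (and one commit) for the entire batch via _bulk_docs."""
    return (DB_URL + "/_bulk_docs", json.dumps({"docs": docs}))

docs = [{"name": "zhang%d" % i, "age": 12 + i} for i in range(1000)]
slow = single_doc_requests(docs)  # 1000 requests, 1000 commits
url, body = bulk_request(docs)    # 1 request, 1 commit
print(len(slow), "requests the slow way; 1 request to", url)
```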

(I'm a CouchDB newbie, but I don't think the point of CouchDB is  
speed. What's exciting about it is the flexibility and the ability to  
build distributed systems. If you're looking for a traditional  
database with speed, have you tried MySQL?)

—Jens