Posted to user@couchdb.apache.org by Talib Sharif <ts...@mymedify.com> on 2010/08/06 22:38:13 UTC

Scalability of _changes api?

Hey All,

Do people have experience with the scalability and performance of the
_changes API in general, and especially when used with filters?

How many connections can be kept open?

And is _changes performance a function of size, number of updates, or total number of documents?

Thanks,
Talib

Re: Scalability of _changes api?

Posted by Mikeal Rogers <mi...@gmail.com>.
Actually, it's probably better to use no server-side filter and just filter
on the client, as long as your update frequency isn't so high that it would
overload a single client.
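A minimal sketch of the client-side approach in Python, assuming the feed is opened with feed=continuous&include_docs=true so that each change line carries the full document. The base URL, the database name, the heartbeat value, and any predicate you pass (for example one testing a hypothetical `type` field) are placeholders; only the standard library is used.

```python
import json
import urllib.request
from typing import Callable, Iterable, Iterator


def parse_continuous_feed(lines: Iterable[str]) -> Iterator[dict]:
    """Yield change rows from a feed=continuous _changes response.

    Each non-blank line is one JSON object. Blank lines are heartbeats,
    and a final {"last_seq": ...} line is sent when the feed closes.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # heartbeat newline
        row = json.loads(line)
        if "last_seq" in row:
            break  # the server closed the feed
        yield row


def filter_on_client(rows: Iterable[dict],
                     keep: Callable[[dict], bool]) -> Iterator[dict]:
    """Apply the filter predicate to each row's doc on the client side."""
    for row in rows:
        doc = row.get("doc")  # present only when include_docs=true
        if doc is not None and keep(doc):
            yield row


def open_feed(base_url: str, db: str) -> Iterator[str]:
    """Open a continuous _changes feed and yield its lines as text."""
    url = f"{base_url}/{db}/_changes?feed=continuous&include_docs=true&heartbeat=10000"
    with urllib.request.urlopen(url) as resp:
        for raw in resp:
            yield raw.decode("utf-8")
```

Chained together, `filter_on_client(parse_continuous_feed(open_feed(base, db)), predicate)` yields only the matching changes. The trade-off is the one stated above: every change crosses the wire, so this only pays off while a single client can keep up with the database's update rate.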

-Mikeal

On Sat, Aug 7, 2010 at 4:37 AM, Sivan Greenberg <si...@omniqueue.com> wrote:

> As I am also using a JS filter and fear the performance and load
> consequences, how does one go about writing an Erlang filter?
>
> -SIvan
>

Re: Scalability of _changes api?

Posted by J Chris Anderson <jc...@apache.org>.
On Aug 7, 2010, at 10:39 AM, J Chris Anderson wrote:

> 
> On Aug 7, 2010, at 4:37 AM, Sivan Greenberg wrote:
> 
>> As I am also using a JS filter and fear the performance and load
>> consequences, how does one go about writing an Erlang filter?
>> 
> 
> The Erlang filter is gonna be the most efficient option. First create a design doc with language == "erlang".
> 
> Then write your filter like the "show" function here, but have it return true or false.
> 
> http://github.com/apache/couchdb/blob/trunk/share/www/script/test/erlang_views.js#L57
> 
> Another example is in this (fixed) bug report:
> 
> https://issues.apache.org/jira/browse/COUCHDB-740
> 

Found a better example of an Erlang changes filter here:

http://github.com/apache/couchdb/blob/trunk/share/www/script/test/changes.js#L374

> Chris
> 
> 
>> -SIvan
> 


Re: Scalability of _changes api?

Posted by J Chris Anderson <jc...@apache.org>.
On Aug 7, 2010, at 4:37 AM, Sivan Greenberg wrote:

> As I am also using a JS filter and fear the performance and load
> consequences, how does one go about writing an Erlang filter?
> 

The Erlang filter is gonna be the most efficient option. First create a design doc with language == "erlang".

Then write your filter like the "show" function here, but have it return true or false.

http://github.com/apache/couchdb/blob/trunk/share/www/script/test/erlang_views.js#L57

Another example is in this (fixed) bug report:

https://issues.apache.org/jira/browse/COUCHDB-740
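A minimal sketch of those two steps from a client's point of view, in Python: build the design document (with "language": "erlang" and the function source under "filters") and the _changes path that selects it via filter=ddoc/name. The design doc name `app`, the filter name `messages`, and the `type` field are hypothetical. It also assumes the native Erlang query server is enabled; in CouchDB releases of this era that was typically done in local.ini with `[native_query_servers] erlang = {couch_native_process, start_link, []}`, but check the documentation for your version.

```python
import json
from urllib.parse import quote, urlencode

# Erlang filter source: receives the doc and the request, returns true/false.
# Hypothetical example: only pass documents whose "type" is "message".
MESSAGES_FILTER = """fun({Doc}, _Req) ->
    couch_util:get_value(<<"type">>, Doc) =:= <<"message">>
end."""


def erlang_filter_ddoc(ddoc_name: str, filter_name: str, source: str) -> dict:
    """Design doc whose functions are run by the native Erlang query server."""
    return {
        "_id": "_design/" + ddoc_name,
        "language": "erlang",
        "filters": {filter_name: source},
    }


def filtered_changes_path(db: str, ddoc_name: str, filter_name: str,
                          **params: str) -> str:
    """Path for GET /db/_changes using the named filter plus extra query params."""
    query = {"filter": f"{ddoc_name}/{filter_name}", **params}
    return f"/{quote(db, safe='')}/_changes?{urlencode(query)}"


ddoc = erlang_filter_ddoc("app", "messages", MESSAGES_FILTER)
body = json.dumps(ddoc)  # PUT this body to /<db>/_design/app
```

Keeping the Erlang source as a plain string mirrors how it is stored: CouchDB holds filter functions as strings in the design doc and hands them to the query server named by "language".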

Chris


> -SIvan


Re: Scalability of _changes api?

Posted by Matthew Sinclair-Day <ms...@gmail.com>.
On 8/7/10 at 7:37 AM, sivan@omniqueue.com (Sivan Greenberg) wrote:

>As I am also using a JS filter and fear the performance and load
>consequences, how does one go about writing an Erlang filter?
>
>-SIvan

I can't claim it is idiomatic Erlang, but this is the filter I
am working on; it accepts a query argument from the client
opening the changes feed.

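fun({Doc}, {Req}) ->
    %% Query-string parameters sent by the client opening the feed
    {Query} = couch_util:get_value(<<"query">>, Req),
    ExcludedServer = couch_util:get_value(<<"excludeServer">>, Query),
    case {couch_util:get_value(<<"docType">>, Doc),
          couch_util:get_value(<<"updatedOnServer">>, Doc)} of
        %% updatedOnServer is JSON null: compare the creating server instead
        {<<"CS_MANIFEST">>, null} ->
            CreatedOnServer = couch_util:get_value(<<"createdOnServer">>, Doc),
            CreatedOnServer =/= ExcludedServer;
        %% updatedOnServer is absent: drop the change
        {<<"CS_MANIFEST">>, undefined} ->
            false;
        %% Pass unless the last update came from the excluded server
        {<<"CS_MANIFEST">>, UpdatedOnServer} ->
            UpdatedOnServer =/= ExcludedServer;
        %% Any other document type is filtered out
        _ ->
            false
    end
end.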


BTW, since strings are represented as binaries (<<>> syntax),
string manipulation in Erlang map/reduce and filter functions is
easy with Erlang's BIFs for binaries, like split_binary/2.
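The filter above reads `excludeServer` from the request's query string, so a client passes it as an extra parameter next to `filter` when opening the feed, e.g. `/db/_changes?feed=continuous&filter=ddoc/name&excludeServer=...` (the design doc and filter names are placeholders). Because the function runs inside CouchDB, one way to pin down its intended behavior is a client-side mirror of the same case table. The Python below is such a mirror, written for this illustration and not part of CouchDB: a module-level sentinel stands in for Erlang's `undefined` (field or parameter absent), and `None` stands for JSON `null`.

```python
from typing import Any

# Stands in for Erlang's `undefined`: the field (or query param) is absent.
MISSING: Any = object()


def manifest_filter(doc: dict, exclude_server: Any = MISSING) -> bool:
    """Mirror of the Erlang filter's case expression, for testing expectations."""
    if doc.get("docType", MISSING) != "CS_MANIFEST":
        return False  # the `_ -> false` clause
    updated = doc.get("updatedOnServer", MISSING)
    if updated is None:
        # {<<"CS_MANIFEST">>, null}: compare the creating server instead
        return doc.get("createdOnServer", MISSING) != exclude_server
    if updated is MISSING:
        return False  # {<<"CS_MANIFEST">>, undefined}
    return updated != exclude_server
```

As in Erlang, where `undefined =/= undefined` is false, a document with no `createdOnServer` is dropped when the client also omits `excludeServer`.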

Matt


Re: Scalability of _changes api?

Posted by Sivan Greenberg <si...@omniqueue.com>.
As I am also using a JS filter and fear the performance and load
consequences, how does one go about writing an Erlang filter?

-SIvan

On Fri, Aug 6, 2010 at 11:55 PM, J Chris Anderson <jc...@apache.org> wrote:
>
> On Aug 6, 2010, at 1:38 PM, Talib Sharif wrote:
>
>> Hey All,
>>
>> Do people have experience with the scalability and performance of the _changes API in general, and especially when used with filters?
>>
>> How many connections can be kept open?
>>
>
> If you use a JavaScript filter, you will have more limited concurrency than with an Erlang filter, as the JS filters run in their own OS process. CouchDB tries to be reasonably efficient with these, but they are still much more heavyweight than the Erlang ones.
>
>> And is _changes performance a function of size, number of updates, or total number of documents?
>>
>
> I think the _changes API should have no scalability issues: the wall you will hit, long before the (Erlang) changes filter becomes a bottleneck, is the insert/update rate of the database itself.
>
> If you were to, say, make thousands of concurrent _changes requests with varying since=Seq params, that would be the worst case, so you can test that workload if you want to find the boundary conditions. (The bottleneck there would be disk IO reads, I think.)
>
> Chris
>
>> Thanks,
>> Talib
>
>

Re: Scalability of _changes api?

Posted by J Chris Anderson <jc...@apache.org>.
On Aug 6, 2010, at 1:38 PM, Talib Sharif wrote:

> Hey All,
> 
> Do people have experience with the scalability and performance of the _changes API in general, and especially when used with filters?
> 
> How many connections can be kept open?
> 

If you use a JavaScript filter, you will have more limited concurrency than with an Erlang filter, as the JS filters run in their own OS process. CouchDB tries to be reasonably efficient with these, but they are still much more heavyweight than the Erlang ones.

> And is _changes performance a function of size, number of updates, or total number of documents?
> 

I think the _changes API should have no scalability issues: the wall you will hit, long before the (Erlang) changes filter becomes a bottleneck, is the insert/update rate of the database itself.

If you were to, say, make thousands of concurrent _changes requests with varying since=Seq params, that would be the worst case, so you can test that workload if you want to find the boundary conditions. (The bottleneck there would be disk IO reads, I think.)
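A minimal Python sketch of that worst-case test: many concurrent, non-continuous _changes requests, each starting from a different since value. It assumes integer sequence numbers as in CouchDB 1.x (later clustered releases use opaque sequence strings, which would have to be collected from the server instead); the base URL, database name, request count, and worker count are placeholders to tune.

```python
import random
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import List


def since_urls(base_url: str, db: str, n: int, max_seq: int,
               seed: int = 0) -> List[str]:
    """n _changes URLs, each starting from a random sequence in [0, max_seq]."""
    rng = random.Random(seed)  # seeded so a run can be repeated
    return [f"{base_url}/{db}/_changes?since={rng.randint(0, max_seq)}"
            for _ in range(n)]


def fetch_size(url: str, timeout: float = 60.0) -> int:
    """Read one full (feed=normal) _changes response; return its size in bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return len(resp.read())


def run_load(urls: List[str], workers: int = 200) -> List[int]:
    """Issue the requests concurrently; any HTTP or socket error is re-raised."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_size, urls))
```

A run might pass `max_seq` as the database's current update_seq (from GET /db) and watch disk read activity on the server while `run_load` is in progress. Lower `since` values make each response larger, which is what stresses the read path.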

Chris

> Thanks,
> Talib


Re: Scalability of _changes api?

Posted by Matthew Sinclair-Day <ms...@gmail.com>.
On 8/6/10 at 4:38 PM, tsharif@mymedify.com (Talib Sharif) wrote:

>Hey All,
>
>Do people have experience with the scalability and performance 
>of the _changes API in general, and especially when used with filters?
>
>How many connections can be kept open?
>
>And is _changes performance a function of size, number of updates, or total number of documents?
>
>Thanks,
>Talib

Talib,

I'm in the middle of characterizing a scaling problem with
_changes and a JS filter.  Basically, under a steady load of
approximately 50 new docs per second, the number of couchjs
processes increases until it tops out around 100, and the
document insert rate slows considerably.  CPU usage of the beam
process climbs to around 60%.  After 24 hours with the load
turned off, the system still has not recovered.  This is a
Solaris 10/Intel system.

There are five databases, and a single change listener per 
database is opened, though the load is being driven only into 
one database.
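To track the insert-rate slowdown described here, one option is to sample the database's update_seq (reported by GET /db) twice and divide by the interval. A minimal Python sketch, assuming an integer update_seq as in CouchDB 1.x; note that update_seq counts every update, including edits and deletions, not only new documents, and the URL and interval are placeholders.

```python
import json
import time
import urllib.request


def update_seq(base_url: str, db: str) -> int:
    """Read update_seq from the database info document (GET /db)."""
    with urllib.request.urlopen(f"{base_url}/{db}") as resp:
        return json.load(resp)["update_seq"]


def updates_per_second(seq_before: int, seq_after: int, seconds: float) -> float:
    """Average update rate between two update_seq samples."""
    if seconds <= 0:
        raise ValueError("sampling interval must be positive")
    return (seq_after - seq_before) / seconds


def sample_rate(base_url: str, db: str, interval: float = 10.0) -> float:
    """Take two samples `interval` seconds apart and return updates/second."""
    before = update_seq(base_url, db)
    time.sleep(interval)
    after = update_seq(base_url, db)
    return updates_per_second(before, after, interval)
```

Logging `sample_rate` periodically alongside the couchjs process count would show whether the drop in throughput coincides with the process count hitting its ceiling.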

I've rewritten the filter in Erlang, but owing to a bug in
0.11, the test will have to wait until Couch is upgraded to
0.11.1 or higher.

Matt