You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@couchdb.apache.org by "Filipe Manana (Created) (JIRA)" <ji...@apache.org> on 2011/11/06 15:52:51 UTC

[jira] [Created] (COUCHDB-1334) Indexer speedup (for non-native view servers)

Indexer speedup (for non-native view servers)
---------------------------------------------

                 Key: COUCHDB-1334
                 URL: https://issues.apache.org/jira/browse/COUCHDB-1334
             Project: CouchDB
          Issue Type: Improvement
          Components: Database Core, JavaScript View Server, View Server Support
            Reporter: Filipe Manana
            Assignee: Filipe Manana
             Fix For: 1.2
         Attachments: 0001-More-efficient-view-updater-writes.patch, 0002-More-efficient-communication-with-the-view-server.patch, master-0002-More-efficient-communication-with-the-view-server.patch

The following 2 patches significantly improve view index generation/update time and reduce CPU consumption.

The first patch makes the view updater's batching more efficient, by ensuring each btree bulk insertion adds/removes a minimum of N (=100) key/value pairts. This also makes the index file size grow not so fast with old data (old btree nodes basically). This behaviour is already done in master/trunk in the new indexer (by Paul Davis).

The second patch maximizes the throughput with an external view server (such as couchjs). Basically it makes the pipe (erlang port) communication between the Erlang VM (couch_os_process basically) and the view server more efficient since the 2 sides spend less time block on reading from the pipe.

Here follow some benchmarks.


test database at  http://fdmanana.iriscouch.com/test_db  (1 million documents)


branch 1.2.x

$ echo 3 > /proc/sys/vm/drop_caches
$ time curl http://localhost:5984/test_db/_design/test/_view/test1
{"rows":[
{"key":null,"value":1000000}
]}

real	2m45.097s
user	0m0.006s
sys	0m0.007s

view file size: 333Mb

CPU usage:

$ sar 1 60
22:27:20  %usr  %nice   %sys   %idle
22:27:21   38      0     12     50
(....)
22:28:21   39      0     13     49
Average:     39      0     13     47   


branch 1.2.x + batch patch (first patch)

$ echo 3 > /proc/sys/vm/drop_caches
$ time curl http://localhost:5984/test_db/_design/test/_view/test1
{"rows":[
{"key":null,"value":1000000}
]}

real	2m12.736s
user	0m0.006s
sys	0m0.005s

view file size 72Mb


branch 1.2.x + batch patch + os_process patch

$ echo 3 > /proc/sys/vm/drop_caches
$ time curl http://localhost:5984/test_db/_design/test/_view/test1
{"rows":[
{"key":null,"value":1000000}
]}

real	1m9.330s
user	0m0.006s
sys	0m0.004s

view file size:  72Mb

CPU usage:

$ sar 1 60
22:22:55  %usr  %nice   %sys   %idle
22:23:53   22      0      6     72
(....)
22:23:55   22      0      6     72
Average:     22      0      7     70   



master/trunk

$ echo 3 > /proc/sys/vm/drop_caches
$ time curl http://localhost:5984/test_db/_design/test/_view/test1
{"rows":[
{"key":null,"value":1000000}
]}

real	1m57.296s
user	0m0.006s
sys	0m0.005s


master/trunk + os_process patch

$ echo 3 > /proc/sys/vm/drop_caches
$ time curl http://localhost:5984/test_db/_design/test/_view/test1
{"rows":[
{"key":null,"value":1000000}
]}

real	0m53.768s
user	0m0.006s
sys	0m0.006s




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Re: [jira] [Resolved] (COUCHDB-1334) Indexer speedup (for non-native view servers)

Posted by Dave Cottlehuber <da...@muse.net.nz>.
On 16 November 2011 13:02, Filipe Manana (Resolved) (JIRA)
<ji...@apache.org> wrote:
>
>     [ https://issues.apache.org/jira/browse/COUCHDB-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Filipe Manana resolved COUCHDB-1334.
> ------------------------------------
>
>       Resolution: Fixed
>    Fix Version/s:     (was: 1.2)
>                   1.3
>
> Latest patch applied against master.
>
>> Indexer speedup (for non-native view servers)
>> ---------------------------------------------
>>
>>                 Key: COUCHDB-1334
>>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1334
[snip]
>> The following 2 patches significantly improve view index generation/update time and reduce CPU consumption.
>> The first patch makes the view updater's batching more efficient, by ensuring each btree bulk insertion adds/removes a minimum of N (=100) key/value pairts. This also makes the index file size grow not so fast with old data (old btree nodes basically). This behaviour is already done in master/trunk in the new indexer (by Paul Davis).
>> The second patch maximizes the throughput with an external view server (such as couchjs). Basically it makes the pipe (erlang port) communication between the Erlang VM (couch_os_process basically) and the view server more efficient since the 2 sides spend less time block on reading from the pipe.

Hi Filipe,

Just a heads up, but I am consistently having the Erlang VM hang when
doing a master build today. Clearly, my issue may not be related to
this patch but I can't see anything else after a quick look in git
history that stands out.

As this uses the same Spidermonkey, Erlang R14B03 + patches etc as for
all my other builds, I am assuming this is a bug triggered by CouchDB
in the VM somewhere.

Hopefully tomorrow I'll both isolate the commit where things go awry,
and also find a way of getting some logging or a test case -- it would
be great to have a fix for this go into OTP/R15.

If you have time, or some advice to share, the binary build is at
https://www.dropbox.com/s/jeifcxpbtpo78ak/Snapshots/20111122

Thanks
Dave

[jira] [Commented] (COUCHDB-1334) Indexer speedup (for non-native view servers)

Posted by "Filipe Manana (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147291#comment-13147291 ] 

Filipe Manana commented on COUCHDB-1334:
----------------------------------------

@Adam I haven't heard yet about plans to release 1.2.0 soon (where soon can mean in a few months). I think this gives a very good benefit that many users would be happy for. But I understand your point, it's not wrong.

@Paul, yes makes sense. This was more of an experiment. I looked into port_connect/2 before and was getting exit_status errors before receiving any responses back, so I did something wrong before. I just updated the patch, against master, so that all this logic is into couch_os_process, doesn't spawn an helper process and uses port_connect. Let me know what you think. Thanks.
                
> Indexer speedup (for non-native view servers)
> ---------------------------------------------
>
>                 Key: COUCHDB-1334
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1334
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core, JavaScript View Server, View Server Support
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>             Fix For: 1.2
>
>         Attachments: 0001-More-efficient-view-updater-writes.patch, 0002-More-efficient-communication-with-the-view-server.patch, master-0002-More-efficient-communication-with-the-view-server.patch
>
>
> The following 2 patches significantly improve view index generation/update time and reduce CPU consumption.
> The first patch makes the view updater's batching more efficient, by ensuring each btree bulk insertion adds/removes a minimum of N (=100) key/value pairts. This also makes the index file size grow not so fast with old data (old btree nodes basically). This behaviour is already done in master/trunk in the new indexer (by Paul Davis).
> The second patch maximizes the throughput with an external view server (such as couchjs). Basically it makes the pipe (erlang port) communication between the Erlang VM (couch_os_process basically) and the view server more efficient since the 2 sides spend less time block on reading from the pipe.
> Here follow some benchmarks.
> test database at  http://fdmanana.iriscouch.com/test_db  (1 million documents)
> branch 1.2.x
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m45.097s
> user	0m0.006s
> sys	0m0.007s
> view file size: 333Mb
> CPU usage:
> $ sar 1 60
> 22:27:20  %usr  %nice   %sys   %idle
> 22:27:21   38      0     12     50
> (....)
> 22:28:21   39      0     13     49
> Average:     39      0     13     47   
> branch 1.2.x + batch patch (first patch)
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m12.736s
> user	0m0.006s
> sys	0m0.005s
> view file size 72Mb
> branch 1.2.x + batch patch + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m9.330s
> user	0m0.006s
> sys	0m0.004s
> view file size:  72Mb
> CPU usage:
> $ sar 1 60
> 22:22:55  %usr  %nice   %sys   %idle
> 22:23:53   22      0      6     72
> (....)
> 22:23:55   22      0      6     72
> Average:     22      0      7     70   
> master/trunk
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m57.296s
> user	0m0.006s
> sys	0m0.005s
> master/trunk + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	0m53.768s
> user	0m0.006s
> sys	0m0.006s

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (COUCHDB-1334) Indexer speedup (for non-native view servers)

Posted by "Paul Joseph Davis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147375#comment-13147375 ] 

Paul Joseph Davis commented on COUCHDB-1334:
--------------------------------------------

@Filipe, Awesome, this is considerably better than the old version.

As to this part:

+        % Can throw badarg error, when OsProc Pid is dead.
+        (catch port_connect(OsProc#os_proc.port, Pid))

That looks like the key to what I hadn't managed to track down when I tried something similar. I'm pretty sure we should be fine with couchspawnkillable, but do we need to close the port here and/or ignore some port exit status messages in this process? The other thing that is a bit confusing is why OsProc's Pid would be dead while its sitting idle. I haven't thought through all the implications here I guess.

Does this version maintain the same speedups as before? When I tried this approach I was actually doing it slightly differently by having the doc reader process send docs to the port which would then forward them directly to the writer process. There was some stuff that got a bit funky when I tried this though. IIRC it was something like, I had to pass deleted doc update_seq's directly to the writer process or it'd break  if the last update was a deletion (cause the writer would never see the update seq). Anyway, just a thought.

Good work on this.
                
> Indexer speedup (for non-native view servers)
> ---------------------------------------------
>
>                 Key: COUCHDB-1334
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1334
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core, JavaScript View Server, View Server Support
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>             Fix For: 1.2
>
>         Attachments: 0001-More-efficient-view-updater-writes.patch, 0002-More-efficient-communication-with-the-view-server.patch, master-0002-More-efficient-communication-with-the-view-server.patch, master-2-0002-More-efficient-communication-with-the-view-server.patch
>
>
> The following 2 patches significantly improve view index generation/update time and reduce CPU consumption.
> The first patch makes the view updater's batching more efficient, by ensuring each btree bulk insertion adds/removes a minimum of N (=100) key/value pairts. This also makes the index file size grow not so fast with old data (old btree nodes basically). This behaviour is already done in master/trunk in the new indexer (by Paul Davis).
> The second patch maximizes the throughput with an external view server (such as couchjs). Basically it makes the pipe (erlang port) communication between the Erlang VM (couch_os_process basically) and the view server more efficient since the 2 sides spend less time block on reading from the pipe.
> Here follow some benchmarks.
> test database at  http://fdmanana.iriscouch.com/test_db  (1 million documents)
> branch 1.2.x
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m45.097s
> user	0m0.006s
> sys	0m0.007s
> view file size: 333Mb
> CPU usage:
> $ sar 1 60
> 22:27:20  %usr  %nice   %sys   %idle
> 22:27:21   38      0     12     50
> (....)
> 22:28:21   39      0     13     49
> Average:     39      0     13     47   
> branch 1.2.x + batch patch (first patch)
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m12.736s
> user	0m0.006s
> sys	0m0.005s
> view file size 72Mb
> branch 1.2.x + batch patch + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m9.330s
> user	0m0.006s
> sys	0m0.004s
> view file size:  72Mb
> CPU usage:
> $ sar 1 60
> 22:22:55  %usr  %nice   %sys   %idle
> 22:23:53   22      0      6     72
> (....)
> 22:23:55   22      0      6     72
> Average:     22      0      7     70   
> master/trunk
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m57.296s
> user	0m0.006s
> sys	0m0.005s
> master/trunk + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	0m53.768s
> user	0m0.006s
> sys	0m0.006s

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (COUCHDB-1334) Indexer speedup (for non-native view servers)

Posted by "Filipe Manana (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-1334:
-----------------------------------

    Attachment: master-0002-More-efficient-communication-with-the-view-server.patch
                0002-More-efficient-communication-with-the-view-server.patch
                0001-More-efficient-view-updater-writes.patch
    
> Indexer speedup (for non-native view servers)
> ---------------------------------------------
>
>                 Key: COUCHDB-1334
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1334
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core, JavaScript View Server, View Server Support
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>             Fix For: 1.2
>
>         Attachments: 0001-More-efficient-view-updater-writes.patch, 0002-More-efficient-communication-with-the-view-server.patch, master-0002-More-efficient-communication-with-the-view-server.patch
>
>
> The following 2 patches significantly improve view index generation/update time and reduce CPU consumption.
> The first patch makes the view updater's batching more efficient, by ensuring each btree bulk insertion adds/removes a minimum of N (=100) key/value pairts. This also makes the index file size grow not so fast with old data (old btree nodes basically). This behaviour is already done in master/trunk in the new indexer (by Paul Davis).
> The second patch maximizes the throughput with an external view server (such as couchjs). Basically it makes the pipe (erlang port) communication between the Erlang VM (couch_os_process basically) and the view server more efficient since the 2 sides spend less time block on reading from the pipe.
> Here follow some benchmarks.
> test database at  http://fdmanana.iriscouch.com/test_db  (1 million documents)
> branch 1.2.x
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m45.097s
> user	0m0.006s
> sys	0m0.007s
> view file size: 333Mb
> CPU usage:
> $ sar 1 60
> 22:27:20  %usr  %nice   %sys   %idle
> 22:27:21   38      0     12     50
> (....)
> 22:28:21   39      0     13     49
> Average:     39      0     13     47   
> branch 1.2.x + batch patch (first patch)
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m12.736s
> user	0m0.006s
> sys	0m0.005s
> view file size 72Mb
> branch 1.2.x + batch patch + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m9.330s
> user	0m0.006s
> sys	0m0.004s
> view file size:  72Mb
> CPU usage:
> $ sar 1 60
> 22:22:55  %usr  %nice   %sys   %idle
> 22:23:53   22      0      6     72
> (....)
> 22:23:55   22      0      6     72
> Average:     22      0      7     70   
> master/trunk
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m57.296s
> user	0m0.006s
> sys	0m0.005s
> master/trunk + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	0m53.768s
> user	0m0.006s
> sys	0m0.006s

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (COUCHDB-1334) Indexer speedup (for non-native view servers)

Posted by "Filipe Manana (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-1334:
-----------------------------------

    Attachment: master-2-0002-More-efficient-communication-with-the-view-server.patch

Second pipeline patch against master
                
> Indexer speedup (for non-native view servers)
> ---------------------------------------------
>
>                 Key: COUCHDB-1334
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1334
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core, JavaScript View Server, View Server Support
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>             Fix For: 1.2
>
>         Attachments: 0001-More-efficient-view-updater-writes.patch, 0002-More-efficient-communication-with-the-view-server.patch, master-0002-More-efficient-communication-with-the-view-server.patch, master-2-0002-More-efficient-communication-with-the-view-server.patch
>
>
> The following 2 patches significantly improve view index generation/update time and reduce CPU consumption.
> The first patch makes the view updater's batching more efficient, by ensuring each btree bulk insertion adds/removes a minimum of N (=100) key/value pairts. This also makes the index file size grow not so fast with old data (old btree nodes basically). This behaviour is already done in master/trunk in the new indexer (by Paul Davis).
> The second patch maximizes the throughput with an external view server (such as couchjs). Basically it makes the pipe (erlang port) communication between the Erlang VM (couch_os_process basically) and the view server more efficient since the 2 sides spend less time block on reading from the pipe.
> Here follow some benchmarks.
> test database at  http://fdmanana.iriscouch.com/test_db  (1 million documents)
> branch 1.2.x
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m45.097s
> user	0m0.006s
> sys	0m0.007s
> view file size: 333Mb
> CPU usage:
> $ sar 1 60
> 22:27:20  %usr  %nice   %sys   %idle
> 22:27:21   38      0     12     50
> (....)
> 22:28:21   39      0     13     49
> Average:     39      0     13     47   
> branch 1.2.x + batch patch (first patch)
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m12.736s
> user	0m0.006s
> sys	0m0.005s
> view file size 72Mb
> branch 1.2.x + batch patch + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m9.330s
> user	0m0.006s
> sys	0m0.004s
> view file size:  72Mb
> CPU usage:
> $ sar 1 60
> 22:22:55  %usr  %nice   %sys   %idle
> 22:23:53   22      0      6     72
> (....)
> 22:23:55   22      0      6     72
> Average:     22      0      7     70   
> master/trunk
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m57.296s
> user	0m0.006s
> sys	0m0.005s
> master/trunk + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	0m53.768s
> user	0m0.006s
> sys	0m0.006s

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (COUCHDB-1334) Indexer speedup (for non-native view servers)

Posted by "Adam Kocoloski (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146357#comment-13146357 ] 

Adam Kocoloski commented on COUCHDB-1334:
-----------------------------------------

Hi Filipe, good to see that pipelining makes such a difference, but I'm not sure introducing these big changes on a release branch is a good idea.  I thought the motivation for branching 1.2.x was to give it some time to stabilize and fix any bugs that we might find.
                
> Indexer speedup (for non-native view servers)
> ---------------------------------------------
>
>                 Key: COUCHDB-1334
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1334
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core, JavaScript View Server, View Server Support
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>             Fix For: 1.2
>
>         Attachments: 0001-More-efficient-view-updater-writes.patch, 0002-More-efficient-communication-with-the-view-server.patch, master-0002-More-efficient-communication-with-the-view-server.patch
>
>
> The following 2 patches significantly improve view index generation/update time and reduce CPU consumption.
> The first patch makes the view updater's batching more efficient, by ensuring each btree bulk insertion adds/removes a minimum of N (=100) key/value pairts. This also makes the index file size grow not so fast with old data (old btree nodes basically). This behaviour is already done in master/trunk in the new indexer (by Paul Davis).
> The second patch maximizes the throughput with an external view server (such as couchjs). Basically it makes the pipe (erlang port) communication between the Erlang VM (couch_os_process basically) and the view server more efficient since the 2 sides spend less time block on reading from the pipe.
> Here follow some benchmarks.
> test database at  http://fdmanana.iriscouch.com/test_db  (1 million documents)
> branch 1.2.x
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m45.097s
> user	0m0.006s
> sys	0m0.007s
> view file size: 333Mb
> CPU usage:
> $ sar 1 60
> 22:27:20  %usr  %nice   %sys   %idle
> 22:27:21   38      0     12     50
> (....)
> 22:28:21   39      0     13     49
> Average:     39      0     13     47   
> branch 1.2.x + batch patch (first patch)
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m12.736s
> user	0m0.006s
> sys	0m0.005s
> view file size 72Mb
> branch 1.2.x + batch patch + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m9.330s
> user	0m0.006s
> sys	0m0.004s
> view file size:  72Mb
> CPU usage:
> $ sar 1 60
> 22:22:55  %usr  %nice   %sys   %idle
> 22:23:53   22      0      6     72
> (....)
> 22:23:55   22      0      6     72
> Average:     22      0      7     70   
> master/trunk
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m57.296s
> user	0m0.006s
> sys	0m0.005s
> master/trunk + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	0m53.768s
> user	0m0.006s
> sys	0m0.006s

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (COUCHDB-1334) Indexer speedup (for non-native view servers)

Posted by "Randall Leeds (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146413#comment-13146413 ] 

Randall Leeds commented on COUCHDB-1334:
----------------------------------------

Then again, Adam, we've got the SpiderMonkey issues to take care of and we still need to figure out whether or not that changes 1.2 into 2.0 (unless maybe we have enough new stuff for 1.2 that we would just release a not-forward-compatible-with-SM-trunk major version anyway). We should collectively make a decision about what to do with these branches soon.
                
> Indexer speedup (for non-native view servers)
> ---------------------------------------------
>
>                 Key: COUCHDB-1334
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1334
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core, JavaScript View Server, View Server Support
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>             Fix For: 1.2
>
>         Attachments: 0001-More-efficient-view-updater-writes.patch, 0002-More-efficient-communication-with-the-view-server.patch, master-0002-More-efficient-communication-with-the-view-server.patch
>
>
> The following 2 patches significantly improve view index generation/update time and reduce CPU consumption.
> The first patch makes the view updater's batching more efficient, by ensuring each btree bulk insertion adds/removes a minimum of N (=100) key/value pairts. This also makes the index file size grow not so fast with old data (old btree nodes basically). This behaviour is already done in master/trunk in the new indexer (by Paul Davis).
> The second patch maximizes the throughput with an external view server (such as couchjs). Basically it makes the pipe (erlang port) communication between the Erlang VM (couch_os_process basically) and the view server more efficient since the 2 sides spend less time block on reading from the pipe.
> Here follow some benchmarks.
> test database at  http://fdmanana.iriscouch.com/test_db  (1 million documents)
> branch 1.2.x
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m45.097s
> user	0m0.006s
> sys	0m0.007s
> view file size: 333Mb
> CPU usage:
> $ sar 1 60
> 22:27:20  %usr  %nice   %sys   %idle
> 22:27:21   38      0     12     50
> (....)
> 22:28:21   39      0     13     49
> Average:     39      0     13     47   
> branch 1.2.x + batch patch (first patch)
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m12.736s
> user	0m0.006s
> sys	0m0.005s
> view file size 72Mb
> branch 1.2.x + batch patch + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m9.330s
> user	0m0.006s
> sys	0m0.004s
> view file size:  72Mb
> CPU usage:
> $ sar 1 60
> 22:22:55  %usr  %nice   %sys   %idle
> 22:23:53   22      0      6     72
> (....)
> 22:23:55   22      0      6     72
> Average:     22      0      7     70   
> master/trunk
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m57.296s
> user	0m0.006s
> sys	0m0.005s
> master/trunk + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	0m53.768s
> user	0m0.006s
> sys	0m0.006s

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (COUCHDB-1334) Indexer speedup (for non-native view servers)

Posted by "Filipe Manana (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-1334:
-----------------------------------

    Attachment: master-4-0002-More-efficient-communication-with-the-view-server.patch

On no objection, I'll push master-4-0002-More-efficient-communication-with-the-view-server.patch to master.
                
> Indexer speedup (for non-native view servers)
> ---------------------------------------------
>
>                 Key: COUCHDB-1334
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1334
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core, JavaScript View Server, View Server Support
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>             Fix For: 1.2
>
>         Attachments: 0001-More-efficient-view-updater-writes.patch, 0002-More-efficient-communication-with-the-view-server.patch, master-0002-More-efficient-communication-with-the-view-server.patch, master-2-0002-More-efficient-communication-with-the-view-server.patch, master-3-0002-More-efficient-communication-with-the-view-server.patch, master-4-0002-More-efficient-communication-with-the-view-server.patch
>
>
> The following 2 patches significantly improve view index generation/update time and reduce CPU consumption.
> The first patch makes the view updater's batching more efficient, by ensuring each btree bulk insertion adds/removes a minimum of N (=100) key/value pairts. This also makes the index file size grow not so fast with old data (old btree nodes basically). This behaviour is already done in master/trunk in the new indexer (by Paul Davis).
> The second patch maximizes the throughput with an external view server (such as couchjs). Basically it makes the pipe (erlang port) communication between the Erlang VM (couch_os_process basically) and the view server more efficient since the 2 sides spend less time block on reading from the pipe.
> Here follow some benchmarks.
> test database at  http://fdmanana.iriscouch.com/test_db  (1 million documents)
> branch 1.2.x
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m45.097s
> user	0m0.006s
> sys	0m0.007s
> view file size: 333Mb
> CPU usage:
> $ sar 1 60
> 22:27:20  %usr  %nice   %sys   %idle
> 22:27:21   38      0     12     50
> (....)
> 22:28:21   39      0     13     49
> Average:     39      0     13     47   
> branch 1.2.x + batch patch (first patch)
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m12.736s
> user	0m0.006s
> sys	0m0.005s
> view file size 72Mb
> branch 1.2.x + batch patch + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m9.330s
> user	0m0.006s
> sys	0m0.004s
> view file size:  72Mb
> CPU usage:
> $ sar 1 60
> 22:22:55  %usr  %nice   %sys   %idle
> 22:23:53   22      0      6     72
> (....)
> 22:23:55   22      0      6     72
> Average:     22      0      7     70   
> master/trunk
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m57.296s
> user	0m0.006s
> sys	0m0.005s
> master/trunk + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	0m53.768s
> user	0m0.006s
> sys	0m0.006s

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (COUCHDB-1334) Indexer speedup (for non-native view servers)

Posted by "Filipe Manana (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana resolved COUCHDB-1334.
------------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 1.2)
                   1.3

Latest patch applied against master.
                
> Indexer speedup (for non-native view servers)
> ---------------------------------------------
>
>                 Key: COUCHDB-1334
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1334
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core, JavaScript View Server, View Server Support
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>             Fix For: 1.3
>
>         Attachments: 0001-More-efficient-view-updater-writes.patch, 0002-More-efficient-communication-with-the-view-server.patch, master-0002-More-efficient-communication-with-the-view-server.patch, master-2-0002-More-efficient-communication-with-the-view-server.patch, master-3-0002-More-efficient-communication-with-the-view-server.patch, master-4-0002-More-efficient-communication-with-the-view-server.patch
>
>
> The following 2 patches significantly improve view index generation/update time and reduce CPU consumption.
> The first patch makes the view updater's batching more efficient, by ensuring each btree bulk insertion adds/removes a minimum of N (=100) key/value pairts. This also makes the index file size grow not so fast with old data (old btree nodes basically). This behaviour is already done in master/trunk in the new indexer (by Paul Davis).
> The second patch maximizes the throughput with an external view server (such as couchjs). Basically it makes the pipe (erlang port) communication between the Erlang VM (couch_os_process basically) and the view server more efficient since the 2 sides spend less time block on reading from the pipe.
> Here follow some benchmarks.
> test database at  http://fdmanana.iriscouch.com/test_db  (1 million documents)
> branch 1.2.x
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m45.097s
> user	0m0.006s
> sys	0m0.007s
> view file size: 333Mb
> CPU usage:
> $ sar 1 60
> 22:27:20  %usr  %nice   %sys   %idle
> 22:27:21   38      0     12     50
> (....)
> 22:28:21   39      0     13     49
> Average:     39      0     13     47   
> branch 1.2.x + batch patch (first patch)
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m12.736s
> user	0m0.006s
> sys	0m0.005s
> view file size 72Mb
> branch 1.2.x + batch patch + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m9.330s
> user	0m0.006s
> sys	0m0.004s
> view file size:  72Mb
> CPU usage:
> $ sar 1 60
> 22:22:55  %usr  %nice   %sys   %idle
> 22:23:53   22      0      6     72
> (....)
> 22:23:55   22      0      6     72
> Average:     22      0      7     70   
> master/trunk
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m57.296s
> user	0m0.006s
> sys	0m0.005s
> master/trunk + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	0m53.768s
> user	0m0.006s
> sys	0m0.006s

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (COUCHDB-1334) Indexer speedup (for non-native view servers)

Posted by "Filipe Manana (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147393#comment-13147393 ] 

Filipe Manana commented on COUCHDB-1334:
----------------------------------------

Paul, yep performance was the same as before, forgot to mention that.

There's a thing, the port_connect in after might fail because of 2 things:
1) os process died (it was linked to the port)
2) readline got an error or timeout, closed the port and threw an exception, so the port_connect in after will fail with badarg because the port was closed

This diff over the previous patch makes it more clear, besides adding a necessary unlink:

-        % Can throw badarg error, when OsProc Pid is dead.
-        (catch port_connect(OsProc#os_proc.port, Pid))
+        % Can throw badarg error, when OsProc Pid is dead or port was closed
+        % by the readline function on error/timeout.
+        (catch port_connect(OsProc#os_proc.port, Pid)),
+        unlink(OsProc#os_proc.port)

(uploading new patch)
                
> Indexer speedup (for non-native view servers)
> ---------------------------------------------
>
>                 Key: COUCHDB-1334
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1334
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core, JavaScript View Server, View Server Support
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>             Fix For: 1.2
>
>         Attachments: 0001-More-efficient-view-updater-writes.patch, 0002-More-efficient-communication-with-the-view-server.patch, master-0002-More-efficient-communication-with-the-view-server.patch, master-2-0002-More-efficient-communication-with-the-view-server.patch, master-3-0002-More-efficient-communication-with-the-view-server.patch
>
>
> The following 2 patches significantly improve view index generation/update time and reduce CPU consumption.
> The first patch makes the view updater's batching more efficient, by ensuring each btree bulk insertion adds/removes a minimum of N (=100) key/value pairts. This also makes the index file size grow not so fast with old data (old btree nodes basically). This behaviour is already done in master/trunk in the new indexer (by Paul Davis).
> The second patch maximizes the throughput with an external view server (such as couchjs). Basically it makes the pipe (erlang port) communication between the Erlang VM (couch_os_process basically) and the view server more efficient since the 2 sides spend less time block on reading from the pipe.
> Here follow some benchmarks.
> test database at  http://fdmanana.iriscouch.com/test_db  (1 million documents)
> branch 1.2.x
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m45.097s
> user	0m0.006s
> sys	0m0.007s
> view file size: 333Mb
> CPU usage:
> $ sar 1 60
> 22:27:20  %usr  %nice   %sys   %idle
> 22:27:21   38      0     12     50
> (....)
> 22:28:21   39      0     13     49
> Average:     39      0     13     47   
> branch 1.2.x + batch patch (first patch)
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m12.736s
> user	0m0.006s
> sys	0m0.005s
> view file size 72Mb
> branch 1.2.x + batch patch + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m9.330s
> user	0m0.006s
> sys	0m0.004s
> view file size:  72Mb
> CPU usage:
> $ sar 1 60
> 22:22:55  %usr  %nice   %sys   %idle
> 22:23:53   22      0      6     72
> (....)
> 22:23:55   22      0      6     72
> Average:     22      0      7     70   
> master/trunk
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m57.296s
> user	0m0.006s
> sys	0m0.005s
> master/trunk + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	0m53.768s
> user	0m0.006s
> sys	0m0.006s

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (COUCHDB-1334) Indexer speedup (for non-native view servers)

Posted by "Filipe Manana (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-1334:
-----------------------------------

    Attachment: master-3-0002-More-efficient-communication-with-the-view-server.patch
    
> Indexer speedup (for non-native view servers)
> ---------------------------------------------
>
>                 Key: COUCHDB-1334
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1334
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core, JavaScript View Server, View Server Support
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>             Fix For: 1.2
>
>         Attachments: 0001-More-efficient-view-updater-writes.patch, 0002-More-efficient-communication-with-the-view-server.patch, master-0002-More-efficient-communication-with-the-view-server.patch, master-2-0002-More-efficient-communication-with-the-view-server.patch, master-3-0002-More-efficient-communication-with-the-view-server.patch
>
>
> The following 2 patches significantly improve view index generation/update time and reduce CPU consumption.
> The first patch makes the view updater's batching more efficient, by ensuring each btree bulk insertion adds/removes a minimum of N (=100) key/value pairts. This also makes the index file size grow not so fast with old data (old btree nodes basically). This behaviour is already done in master/trunk in the new indexer (by Paul Davis).
> The second patch maximizes the throughput with an external view server (such as couchjs). Basically it makes the pipe (erlang port) communication between the Erlang VM (couch_os_process basically) and the view server more efficient since the 2 sides spend less time block on reading from the pipe.
> Here follow some benchmarks.
> test database at  http://fdmanana.iriscouch.com/test_db  (1 million documents)
> branch 1.2.x
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m45.097s
> user	0m0.006s
> sys	0m0.007s
> view file size: 333Mb
> CPU usage:
> $ sar 1 60
> 22:27:20  %usr  %nice   %sys   %idle
> 22:27:21   38      0     12     50
> (....)
> 22:28:21   39      0     13     49
> Average:     39      0     13     47   
> branch 1.2.x + batch patch (first patch)
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m12.736s
> user	0m0.006s
> sys	0m0.005s
> view file size 72Mb
> branch 1.2.x + batch patch + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m9.330s
> user	0m0.006s
> sys	0m0.004s
> view file size:  72Mb
> CPU usage:
> $ sar 1 60
> 22:22:55  %usr  %nice   %sys   %idle
> 22:23:53   22      0      6     72
> (....)
> 22:23:55   22      0      6     72
> Average:     22      0      7     70   
> master/trunk
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m57.296s
> user	0m0.006s
> sys	0m0.005s
> master/trunk + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	0m53.768s
> user	0m0.006s
> sys	0m0.006s

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (COUCHDB-1334) Indexer speedup (for non-native view servers)

Posted by "Paul Joseph Davis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146441#comment-13146441 ] 

Paul Joseph Davis commented on COUCHDB-1334:
--------------------------------------------

@Randall, now that 1.8.5 is out, we should be able to lean on that for quite a long time. If people want to package versions of SpiderMonkey trunk in their distro, I'm disinclined to put too much immediate effort into supporting those random versions.

@Filipe, reading the patch I think the idea is pretty good in general but I'd implement it a bit differently. Firstly, the logic for whether or not pipelining is used shouldn't be exposed to the client. That's just going to entangle a whole bunch of API knowledge in the wrong place. I've been meaning to go back and finish the refactoring of couch_(os|native)_process and couch_query_servers which would make this behavior possible.

The other part of this that might be interesting is the erlang:port_connect/2 call that can set the destination Pid for that port. I played with it a bit during my refactoring work but couldn't get it to work quite right. I didn't spend too much time figuring it out, but it might be a way to skip the intermediary process and extra message passing.

http://erlang.org/doc/man/erlang.html#port_connect-2
                
> Indexer speedup (for non-native view servers)
> ---------------------------------------------
>
>                 Key: COUCHDB-1334
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1334
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core, JavaScript View Server, View Server Support
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>             Fix For: 1.2
>
>         Attachments: 0001-More-efficient-view-updater-writes.patch, 0002-More-efficient-communication-with-the-view-server.patch, master-0002-More-efficient-communication-with-the-view-server.patch
>
>
> The following 2 patches significantly improve view index generation/update time and reduce CPU consumption.
> The first patch makes the view updater's batching more efficient, by ensuring each btree bulk insertion adds/removes a minimum of N (=100) key/value pairts. This also makes the index file size grow not so fast with old data (old btree nodes basically). This behaviour is already done in master/trunk in the new indexer (by Paul Davis).
> The second patch maximizes the throughput with an external view server (such as couchjs). Basically it makes the pipe (erlang port) communication between the Erlang VM (couch_os_process basically) and the view server more efficient since the 2 sides spend less time block on reading from the pipe.
> Here follow some benchmarks.
> test database at  http://fdmanana.iriscouch.com/test_db  (1 million documents)
> branch 1.2.x
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m45.097s
> user	0m0.006s
> sys	0m0.007s
> view file size: 333Mb
> CPU usage:
> $ sar 1 60
> 22:27:20  %usr  %nice   %sys   %idle
> 22:27:21   38      0     12     50
> (....)
> 22:28:21   39      0     13     49
> Average:     39      0     13     47   
> branch 1.2.x + batch patch (first patch)
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m12.736s
> user	0m0.006s
> sys	0m0.005s
> view file size 72Mb
> branch 1.2.x + batch patch + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m9.330s
> user	0m0.006s
> sys	0m0.004s
> view file size:  72Mb
> CPU usage:
> $ sar 1 60
> 22:22:55  %usr  %nice   %sys   %idle
> 22:23:53   22      0      6     72
> (....)
> 22:23:55   22      0      6     72
> Average:     22      0      7     70   
> master/trunk
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m57.296s
> user	0m0.006s
> sys	0m0.005s
> master/trunk + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	0m53.768s
> user	0m0.006s
> sys	0m0.006s

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (COUCHDB-1334) Indexer speedup (for non-native view servers)

Posted by "Paul Joseph Davis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149699#comment-13149699 ] 

Paul Joseph Davis commented on COUCHDB-1334:
--------------------------------------------

+1
                
> Indexer speedup (for non-native view servers)
> ---------------------------------------------
>
>                 Key: COUCHDB-1334
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1334
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core, JavaScript View Server, View Server Support
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>             Fix For: 1.2
>
>         Attachments: 0001-More-efficient-view-updater-writes.patch, 0002-More-efficient-communication-with-the-view-server.patch, master-0002-More-efficient-communication-with-the-view-server.patch, master-2-0002-More-efficient-communication-with-the-view-server.patch, master-3-0002-More-efficient-communication-with-the-view-server.patch, master-4-0002-More-efficient-communication-with-the-view-server.patch
>
>
> The following 2 patches significantly improve view index generation/update time and reduce CPU consumption.
> The first patch makes the view updater's batching more efficient, by ensuring each btree bulk insertion adds/removes a minimum of N (=100) key/value pairts. This also makes the index file size grow not so fast with old data (old btree nodes basically). This behaviour is already done in master/trunk in the new indexer (by Paul Davis).
> The second patch maximizes the throughput with an external view server (such as couchjs). Basically it makes the pipe (erlang port) communication between the Erlang VM (couch_os_process basically) and the view server more efficient since the 2 sides spend less time block on reading from the pipe.
> Here follow some benchmarks.
> test database at  http://fdmanana.iriscouch.com/test_db  (1 million documents)
> branch 1.2.x
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m45.097s
> user	0m0.006s
> sys	0m0.007s
> view file size: 333Mb
> CPU usage:
> $ sar 1 60
> 22:27:20  %usr  %nice   %sys   %idle
> 22:27:21   38      0     12     50
> (....)
> 22:28:21   39      0     13     49
> Average:     39      0     13     47   
> branch 1.2.x + batch patch (first patch)
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m12.736s
> user	0m0.006s
> sys	0m0.005s
> view file size 72Mb
> branch 1.2.x + batch patch + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m9.330s
> user	0m0.006s
> sys	0m0.004s
> view file size:  72Mb
> CPU usage:
> $ sar 1 60
> 22:22:55  %usr  %nice   %sys   %idle
> 22:23:53   22      0      6     72
> (....)
> 22:23:55   22      0      6     72
> Average:     22      0      7     70   
> master/trunk
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m57.296s
> user	0m0.006s
> sys	0m0.005s
> master/trunk + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	0m53.768s
> user	0m0.006s
> sys	0m0.006s

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (COUCHDB-1334) Indexer speedup (for non-native view servers)

Posted by "Adam Kocoloski (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146417#comment-13146417 ] 

Adam Kocoloski commented on COUCHDB-1334:
-----------------------------------------

I vote we release 1.2 without tackling that issue.  1.2 has the new replicator, JSON in C, snappy compression ... plenty of great stuff that we want to get into users' hands.
                
> Indexer speedup (for non-native view servers)
> ---------------------------------------------
>
>                 Key: COUCHDB-1334
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1334
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core, JavaScript View Server, View Server Support
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>             Fix For: 1.2
>
>         Attachments: 0001-More-efficient-view-updater-writes.patch, 0002-More-efficient-communication-with-the-view-server.patch, master-0002-More-efficient-communication-with-the-view-server.patch
>
>
> The following 2 patches significantly improve view index generation/update time and reduce CPU consumption.
> The first patch makes the view updater's batching more efficient, by ensuring each btree bulk insertion adds/removes a minimum of N (=100) key/value pairts. This also makes the index file size grow not so fast with old data (old btree nodes basically). This behaviour is already done in master/trunk in the new indexer (by Paul Davis).
> The second patch maximizes the throughput with an external view server (such as couchjs). Basically it makes the pipe (erlang port) communication between the Erlang VM (couch_os_process basically) and the view server more efficient since the 2 sides spend less time block on reading from the pipe.
> Here follow some benchmarks.
> test database at  http://fdmanana.iriscouch.com/test_db  (1 million documents)
> branch 1.2.x
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m45.097s
> user	0m0.006s
> sys	0m0.007s
> view file size: 333Mb
> CPU usage:
> $ sar 1 60
> 22:27:20  %usr  %nice   %sys   %idle
> 22:27:21   38      0     12     50
> (....)
> 22:28:21   39      0     13     49
> Average:     39      0     13     47   
> branch 1.2.x + batch patch (first patch)
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m12.736s
> user	0m0.006s
> sys	0m0.005s
> view file size 72Mb
> branch 1.2.x + batch patch + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m9.330s
> user	0m0.006s
> sys	0m0.004s
> view file size:  72Mb
> CPU usage:
> $ sar 1 60
> 22:22:55  %usr  %nice   %sys   %idle
> 22:23:53   22      0      6     72
> (....)
> 22:23:55   22      0      6     72
> Average:     22      0      7     70   
> master/trunk
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m57.296s
> user	0m0.006s
> sys	0m0.005s
> master/trunk + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	0m53.768s
> user	0m0.006s
> sys	0m0.006s

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira