You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@couchdb.apache.org by "Filipe Manana (JIRA)" <ji...@apache.org> on 2011/02/02 05:42:29 UTC

[jira] Created: (COUCHDB-1054) Better couch_log performance

Better couch_log performance
----------------------------

                 Key: COUCHDB-1054
                 URL: https://issues.apache.org/jira/browse/COUCHDB-1054
             Project: CouchDB
          Issue Type: Improvement
            Reporter: Filipe Manana
            Assignee: Filipe Manana


Building the messages to write to the console and the log file can be done outside the couch_log gen_event. This significantly increases the parallelism when many processes log messages. The following relaximation test graph shows a throughput increase using the attached patch against current trunk:

http://graphs.mikeal.couchone.com/#/graph/0379dbdaef29b1c0fbf03421540243f7

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (COUCHDB-1054) Better couch_log performance

Posted by "Sebastian Cohnen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990007#comment-12990007 ] 

Sebastian Cohnen commented on COUCHDB-1054:
-------------------------------------------

Although I don't exactly know what Filipe improved here, this is not about the memory consumption problems with logging on debug e.g., right?

> Better couch_log performance
> ----------------------------
>
>                 Key: COUCHDB-1054
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1054
>             Project: CouchDB
>          Issue Type: Improvement
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>         Attachments: COUCHDB-1054-2.patch, COUCHDB-1054-3.patch, COUCHDB-1054.patch
>
>
> Building the messages to write to the console and the log file can be done outside the couch_log gen_event. This significantly increases the parallelism when many processes log messages. The following relaximation test graph shows a throughput increase using the attached patch against current trunk:
> http://graphs.mikeal.couchone.com/#/graph/0379dbdaef29b1c0fbf03421540243f7

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (COUCHDB-1054) Better couch_log performance

Posted by "Randall Leeds (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989551#comment-12989551 ] 

Randall Leeds commented on COUCHDB-1054:
----------------------------------------

Reviewed the patch and it looks good. Nice performance win!
Would if we added the timestamp in get_log_message?
It might make some log messages appear out of order, but it would keep the timestamp closer to the real time of the event.
Could the timestamps ever provide more information about inter-related gen_server crashes? Information which is not clear from the backtrace, the last message received, or the supervision tree?
Also, fewer io:format maybe creates less garbage?
I don't feel strongly either way, but I thought I'd pick it out and ask your opinion since the rest of the patch is unquestionably solid.

> Better couch_log performance
> ----------------------------
>
>                 Key: COUCHDB-1054
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1054
>             Project: CouchDB
>          Issue Type: Improvement
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>         Attachments: COUCHDB-1054-2.patch, COUCHDB-1054.patch
>
>
> Building the messages to write to the console and the log file can be done outside the couch_log gen_event. This significantly increases the parallelism when many processes log messages. The following relaximation test graph shows a throughput increase using the attached patch against current trunk:
> http://graphs.mikeal.couchone.com/#/graph/0379dbdaef29b1c0fbf03421540243f7

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (COUCHDB-1054) Better couch_log performance

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989758#comment-12989758 ] 

Filipe Manana commented on COUCHDB-1054:
----------------------------------------

Volker, I don't fully understand your comment. What do you mean with "I'm definitely for timestamps"? Timestamps were always present. Or do you mean that having apparently out of order timestamps is not an issue for you?

> Better couch_log performance
> ----------------------------
>
>                 Key: COUCHDB-1054
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1054
>             Project: CouchDB
>          Issue Type: Improvement
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>         Attachments: COUCHDB-1054-2.patch, COUCHDB-1054.patch
>
>
> Building the messages to write to the console and the log file can be done outside the couch_log gen_event. This significantly increases the parallelism when many processes log messages. The following relaximation test graph shows a throughput increase using the attached patch against current trunk:
> http://graphs.mikeal.couchone.com/#/graph/0379dbdaef29b1c0fbf03421540243f7

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (COUCHDB-1054) Better couch_log performance

Posted by "Jan Lehnardt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990394#comment-12990394 ] 

Jan Lehnardt commented on COUCHDB-1054:
---------------------------------------

Filipe, looks good, no objections for 1.1.x

> Better couch_log performance
> ----------------------------
>
>                 Key: COUCHDB-1054
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1054
>             Project: CouchDB
>          Issue Type: Improvement
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>         Attachments: COUCHDB-1054-2.patch, COUCHDB-1054-3.patch, COUCHDB-1054.patch
>
>
> Building the messages to write to the console and the log file can be done outside the couch_log gen_event. This significantly increases the parallelism when many processes log messages. The following relaximation test graph shows a throughput increase using the attached patch against current trunk:
> http://graphs.mikeal.couchone.com/#/graph/0379dbdaef29b1c0fbf03421540243f7

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (COUCHDB-1054) Better couch_log performance

Posted by "Adam Kocoloski (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989647#comment-12989647 ] 

Adam Kocoloski commented on COUCHDB-1054:
-----------------------------------------

Nice find.  A few suggestions:

1) Have you tried using io:put_chars/2 instead of io:format in the log() function?  Now that the arguments are simple strings we don't need all the printf goop there.

2) If the output of get_log_message() is longer than 64 characters (and it usually is) it may make sense to do a list_to_binary there and send a refc binary to the logger so we don't copy the data.

> Better couch_log performance
> ----------------------------
>
>                 Key: COUCHDB-1054
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1054
>             Project: CouchDB
>          Issue Type: Improvement
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>         Attachments: COUCHDB-1054-2.patch, COUCHDB-1054.patch
>
>
> Building the messages to write to the console and the log file can be done outside the couch_log gen_event. This significantly increases the parallelism when many processes log messages. The following relaximation test graph shows a throughput increase using the attached patch against current trunk:
> http://graphs.mikeal.couchone.com/#/graph/0379dbdaef29b1c0fbf03421540243f7

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (COUCHDB-1054) Better couch_log performance

Posted by "Adam Kocoloski (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990146#comment-12990146 ] 

Adam Kocoloski commented on COUCHDB-1054:
-----------------------------------------

Hi Sebastian, I expect it will help with that somewhat.

> Better couch_log performance
> ----------------------------
>
>                 Key: COUCHDB-1054
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1054
>             Project: CouchDB
>          Issue Type: Improvement
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>         Attachments: COUCHDB-1054-2.patch, COUCHDB-1054-3.patch, COUCHDB-1054.patch
>
>
> Building the messages to write to the console and the log file can be done outside the couch_log gen_event. This significantly increases the parallelism when many processes log messages. The following relaximation test graph shows a throughput increase using the attached patch against current trunk:
> http://graphs.mikeal.couchone.com/#/graph/0379dbdaef29b1c0fbf03421540243f7

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (COUCHDB-1054) Better couch_log performance

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-1054:
-----------------------------------

    Attachment: COUCHDB-1054-2.patch

Simpler version of the patch.
Also gets rid of the useless all possible EOLs substitution with CRLF.

http://graphs.mikeal.couchone.com/#/graph/0379dbdaef29b1c0fbf0342154029698

> Better couch_log performance
> ----------------------------
>
>                 Key: COUCHDB-1054
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1054
>             Project: CouchDB
>          Issue Type: Improvement
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>         Attachments: COUCHDB-1054-2.patch, COUCHDB-1054.patch
>
>
> Building the messages to write to the console and the log file can be done outside the couch_log gen_event. This significantly increases the parallelism when many processes log messages. The following relaximation test graph shows a throughput increase using the attached patch against current trunk:
> http://graphs.mikeal.couchone.com/#/graph/0379dbdaef29b1c0fbf03421540243f7

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (COUCHDB-1054) Better couch_log performance

Posted by "Adam Kocoloski (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990147#comment-12990147 ] 

Adam Kocoloski commented on COUCHDB-1054:
-----------------------------------------

@Filipe no objections from me.

> Better couch_log performance
> ----------------------------
>
>                 Key: COUCHDB-1054
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1054
>             Project: CouchDB
>          Issue Type: Improvement
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>         Attachments: COUCHDB-1054-2.patch, COUCHDB-1054-3.patch, COUCHDB-1054.patch
>
>
> Building the messages to write to the console and the log file can be done outside the couch_log gen_event. This significantly increases the parallelism when many processes log messages. The following relaximation test graph shows a throughput increase using the attached patch against current trunk:
> http://graphs.mikeal.couchone.com/#/graph/0379dbdaef29b1c0fbf03421540243f7

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (COUCHDB-1054) Better couch_log performance

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-1054:
-----------------------------------

    Attachment: COUCHDB-1054-3.patch

Updated patch.
Adding the timestamp outside the event handler and using io:put_chars/[1,2].

Relaximation graph:

http://graphs.mikeal.couchone.com/#/graph/0379dbdaef29b1c0fbf034215402a52e

While running this test (100 readers, 100 writers), all log entries have monotonically increasing timestamps. Their resolution is at the second level, so it shouldn't be very common to get 2 consecutive log messages without increasing  timestamps.

Any objection for 1.1.x inclusion (besides trunk)?

> Better couch_log performance
> ----------------------------
>
>                 Key: COUCHDB-1054
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1054
>             Project: CouchDB
>          Issue Type: Improvement
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>         Attachments: COUCHDB-1054-2.patch, COUCHDB-1054-3.patch, COUCHDB-1054.patch
>
>
> Building the messages to write to the console and the log file can be done outside the couch_log gen_event. This significantly increases the parallelism when many processes log messages. The following relaximation test graph shows a throughput increase using the attached patch against current trunk:
> http://graphs.mikeal.couchone.com/#/graph/0379dbdaef29b1c0fbf03421540243f7

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (COUCHDB-1054) Better couch_log performance

Posted by "Randall Leeds (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991306#comment-12991306 ] 

Randall Leeds commented on COUCHDB-1054:
----------------------------------------

Just saw this got committed but I had one more thought: do we save any extra conversion and binary creation by using the binary of ConsoleMsg as the second element of FileMsg?

> Better couch_log performance
> ----------------------------
>
>                 Key: COUCHDB-1054
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1054
>             Project: CouchDB
>          Issue Type: Improvement
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>             Fix For: 1.1, 1.2
>
>         Attachments: COUCHDB-1054-2.patch, COUCHDB-1054-3.patch, COUCHDB-1054.patch
>
>
> Building the messages to write to the console and the log file can be done outside the couch_log gen_event. This significantly increases the parallelism when many processes log messages. The following relaximation test graph shows a throughput increase using the attached patch against current trunk:
> http://graphs.mikeal.couchone.com/#/graph/0379dbdaef29b1c0fbf03421540243f7

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (COUCHDB-1054) Better couch_log performance

Posted by "Volker Mische (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989751#comment-12989751 ] 

Volker Mische commented on COUCHDB-1054:
----------------------------------------

I'm definitely for timestamps. When you have a long running session with some event you are interested in in between, it's easier to find the corresponding message (even if they are out of order).

> Better couch_log performance
> ----------------------------
>
>                 Key: COUCHDB-1054
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1054
>             Project: CouchDB
>          Issue Type: Improvement
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>         Attachments: COUCHDB-1054-2.patch, COUCHDB-1054.patch
>
>
> Building the messages to write to the console and the log file can be done outside the couch_log gen_event. This significantly increases the parallelism when many processes log messages. The following relaximation test graph shows a throughput increase using the attached patch against current trunk:
> http://graphs.mikeal.couchone.com/#/graph/0379dbdaef29b1c0fbf03421540243f7

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Resolved: (COUCHDB-1054) Better couch_log performance

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana resolved COUCHDB-1054.
------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.2
                   1.1

Applied to trunk and 1.1.x

> Better couch_log performance
> ----------------------------
>
>                 Key: COUCHDB-1054
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1054
>             Project: CouchDB
>          Issue Type: Improvement
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>             Fix For: 1.1, 1.2
>
>         Attachments: COUCHDB-1054-2.patch, COUCHDB-1054-3.patch, COUCHDB-1054.patch
>
>
> Building the messages to write to the console and the log file can be done outside the couch_log gen_event. This significantly increases the parallelism when many processes log messages. The following relaximation test graph shows a throughput increase using the attached patch against current trunk:
> http://graphs.mikeal.couchone.com/#/graph/0379dbdaef29b1c0fbf03421540243f7

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (COUCHDB-1054) Better couch_log performance

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-1054:
-----------------------------------

    Attachment: COUCHDB-1054.patch

> Better couch_log performance
> ----------------------------
>
>                 Key: COUCHDB-1054
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1054
>             Project: CouchDB
>          Issue Type: Improvement
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>         Attachments: COUCHDB-1054.patch
>
>
> Building the messages to write to the console and the log file can be done outside the couch_log gen_event. This significantly increases the parallelism when many processes log messages. The following relaximation test graph shows a throughput increase using the attached patch against current trunk:
> http://graphs.mikeal.couchone.com/#/graph/0379dbdaef29b1c0fbf03421540243f7

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (COUCHDB-1054) Better couch_log performance

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989743#comment-12989743 ] 

Filipe Manana commented on COUCHDB-1054:
----------------------------------------

Yes, the addition of the timestamp is not externalized to avoid having strange logs, with later timestamps coming before earlier ones. If externalized, it would likely confuse many users. However I don't mind at all about it.

Is everyone ok with this?


At some point I had the message converted to a binary with an iolist_to_binary call, it's certainly difficult to measure the impact, but I'm not sure what is more expensive: converting to a binary or copying an iolist.
I didn't know about io:put_chars, but it seems to accept binaries and iolists besides charlists (the docs only mention charlists). I'll definitely use it.



> Better couch_log performance
> ----------------------------
>
>                 Key: COUCHDB-1054
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1054
>             Project: CouchDB
>          Issue Type: Improvement
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>         Attachments: COUCHDB-1054-2.patch, COUCHDB-1054.patch
>
>
> Building the messages to write to the console and the log file can be done outside the couch_log gen_event. This significantly increases the parallelism when many processes log messages. The following relaximation test graph shows a throughput increase using the attached patch against current trunk:
> http://graphs.mikeal.couchone.com/#/graph/0379dbdaef29b1c0fbf03421540243f7

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (COUCHDB-1054) Better couch_log performance

Posted by "Adam Kocoloski (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989755#comment-12989755 ] 

Adam Kocoloski commented on COUCHDB-1054:
-----------------------------------------

Hi Volker, Filipe was only addressing whether the timestamp should be inserted by the client process or by the logger itself.  If done by the client process it's possible that timestamps may not monotonically increase.  We'll certainly continue to insert timestamps in the log files.


> Better couch_log performance
> ----------------------------
>
>                 Key: COUCHDB-1054
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1054
>             Project: CouchDB
>          Issue Type: Improvement
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>         Attachments: COUCHDB-1054-2.patch, COUCHDB-1054.patch
>
>
> Building the messages to write to the console and the log file can be done outside the couch_log gen_event. This significantly increases the parallelism when many processes log messages. The following relaximation test graph shows a throughput increase using the attached patch against current trunk:
> http://graphs.mikeal.couchone.com/#/graph/0379dbdaef29b1c0fbf03421540243f7

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira