
Posted to dev@couchdb.apache.org by "Filipe Manana (JIRA)" <ji...@apache.org> on 2010/02/05 00:19:28 UTC

[jira] Created: (COUCHDB-639) Pull replication should profit of compressed attachments for higher performance

Pull replication should profit of compressed attachments for higher performance
-------------------------------------------------------------------------------

                 Key: COUCHDB-639
                 URL: https://issues.apache.org/jira/browse/COUCHDB-639
             Project: CouchDB
          Issue Type: Improvement
          Components: Replication
    Affects Versions: 0.11
         Environment: trunk
            Reporter: Filipe Manana
         Attachments: pull-rep-att-comp.patch

Currently, when doing a pull replication where there are docs at the source DB with compressed attachments:

1) The source decompresses the attachment before sending it to the target DB
2) The target compresses the attachment before storing it

This is clearly a waste of CPU, bandwidth and disk I/O.

The attached patch fixes the issue. A JavaScript test is included.
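
To illustrate the cost described above, here is a minimal Python sketch. It is not CouchDB code and is not taken from the patch. It models a stored attachment as gzip data purely for illustration, and both function names are hypothetical. The first path decompresses on the source and recompresses on the target; the second forwards the stored bytes untouched.

```python
import gzip
from typing import Tuple


def pull_with_recompression(stored: bytes) -> Tuple[bytes, int]:
    """Model of the current behaviour: returns (bytes stored on target, bytes sent)."""
    plain = gzip.decompress(stored)       # source decompresses before sending
    sent = len(plain)                     # the uncompressed form crosses the network
    return gzip.compress(plain), sent     # target compresses again before storing


def pull_pass_through(stored: bytes) -> Tuple[bytes, int]:
    """Model of the proposed behaviour: compressed bytes are forwarded as-is."""
    return stored, len(stored)


attachment = b"compressible text " * 10_000
stored = gzip.compress(attachment)

old_copy, old_sent = pull_with_recompression(stored)
new_copy, new_sent = pull_pass_through(stored)
```

Both paths leave the target with the same attachment content; the pass-through path simply skips one decompression, one compression, and the larger transfer.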

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (COUCHDB-639) Make replication profit of attachment compression and improve push replication for large attachments

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-639:
----------------------------------

    Attachment: rep-att-comp-and-multipart-trunk-3.patch

Just a minor update in case errors are found when replicating docs.

> Make replication profit of attachment compression and improve push replication for large attachments
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-639
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-639
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Replication
>    Affects Versions: 0.11
>         Environment: trunk
>            Reporter: Filipe Manana
>         Attachments: rep-att-comp-and-multipart-trunk-3.patch, rep-att-comp-and-multipart-trunk.patch
>
>
> At the moment, replication decompresses compressed attachments and then compresses them again, which wastes CPU time.
> Push replication is also unreliable for very large attachments (500 MB or more, for example). Currently it sends attachments inlined in their JSON doc. Not only does this require too much RAM, it also wastes CPU time on base64-encoding the attachment (and on decompressing it first if it is compressed).
> The following patch (rep-att-comp-and-multipart-trunk*.patch) addresses both issues. Docs containing attachments are now streamed to the remote target DB using the multipart doc streaming feature provided by couch_doc.erl, and compressed attachments are no longer decompressed and re-compressed during replication.
> JavaScript tests included.
> Previously, replicating a DB containing 2 docs with attachments of 100 MB and 500 MB caused the Erlang VM to consume nearly 1.2 GB of RAM on my system. With the patch applied, it uses about 130 MB of RAM.
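
As a rough illustration of why inlining is costly, here is a Python sketch (not the replicator's code). Base64 turns every 3 bytes into 4 characters, and the whole JSON body has to exist in memory before it can be sent, whereas a streamed upload only ever holds one chunk at a time. The helper names and the 64 KiB default chunk size are assumptions made for the example.

```python
import base64
import json
from typing import Iterator


def inline_doc_body(doc_id: str, name: str, data: bytes) -> bytes:
    """Build a JSON doc with the attachment inlined as base64 (all in memory)."""
    doc = {
        "_id": doc_id,
        "_attachments": {
            name: {
                "content_type": "application/octet-stream",
                "data": base64.b64encode(data).decode("ascii"),
            }
        },
    }
    return json.dumps(doc).encode("utf-8")


def iter_chunks(data: bytes, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the attachment piece by piece, as a streaming sender would."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]
```

With the inlined form, the sender holds the raw attachment plus a body that is at least a third larger; with chunked streaming, the working set is bounded by the chunk size.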

[jira] Updated: (COUCHDB-639) Make replication profit of attachment compression and improve push replication for large attachments

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-639:
----------------------------------

    Attachment:     (was: pull-rep-att-comp-2.patch)

[jira] Updated: (COUCHDB-639) Make replication profit of attachment compression and improve push replication for large attachments

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-639:
----------------------------------

    Attachment: rep-att-comp-and-multipart-trunk.patch

Just keeping the patch up to date with r917608

[jira] Commented: (COUCHDB-639) Make replication profit of attachment compression and improve push replication for large attachments

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835179#action_12835179 ] 

Filipe Manana commented on COUCHDB-639:
---------------------------------------

Anyone looking into this?

As far as I understand, this also fixes COUCHDB-163.

cheers

[jira] Updated: (COUCHDB-639) Make replication profit of attachment compression and improve push replication for large attachments

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-639:
----------------------------------

    Attachment:     (was: rep-att-comp-and-multipart-trunk.patch)

[jira] Closed: (COUCHDB-639) Make replication profit of attachment compression and improve push replication for large attachments

Posted by "Adam Kocoloski (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kocoloski closed COUCHDB-639.
----------------------------------

       Resolution: Fixed
    Fix Version/s: 0.12

Great patch, Filipe.  Thanks!

[jira] Updated: (COUCHDB-639) Make replication profit of attachment compression and improve push replication for large attachments

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-639:
----------------------------------

    Attachment: rep-att-comp-and-multipart-trunk-2.patch

OK, I'm no longer using a replication-specific URI API. It was definitely a bad idea.

For multipart docs, I now simply use PUT /somedb/docId?new_edits=false. This API already exists. I don't know why, but previously I associated new_edits only with _bulk_docs.

So, docs without attachments are uploaded to the remote target DB using _bulk_docs, exactly as before. Docs with attachments are uploaded using PUT /somedb/docId?new_edits=false, with the doc sent as a multipart stream.

Simple enough imho.
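
The split described above might look roughly like this in Python. This is a sketch only: the real logic lives in the Erlang replicator, and `split_for_push` and `bulk_docs_body` are hypothetical names. Putting `new_edits: false` in the `_bulk_docs` body is an assumption about how the batch is sent, not something taken from the patch.

```python
import json
from typing import Dict, List, Tuple

Doc = Dict[str, object]


def split_for_push(docs: List[Doc]) -> Tuple[List[Doc], List[Doc]]:
    """Return (docs for _bulk_docs, docs to PUT one by one as multipart)."""
    bulk: List[Doc] = []
    multipart: List[Doc] = []
    for doc in docs:
        if doc.get("_attachments"):
            multipart.append(doc)   # PUT /somedb/docId?new_edits=false
        else:
            bulk.append(doc)        # POST /somedb/_bulk_docs
    return bulk, multipart


def bulk_docs_body(docs: List[Doc]) -> str:
    """JSON body for one _bulk_docs request carrying replicated revisions."""
    return json.dumps({"new_edits": False, "docs": docs})
```

Docs without attachments still travel together in one request, so only docs that actually carry attachments pay the cost of a separate PUT.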



[jira] Updated: (COUCHDB-639) Make replication profit of attachment compression and improve push replication for large attachments

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-639:
----------------------------------

    Attachment:     (was: rep-att-comp-and-multipart-trunk-2.patch)

[jira] Updated: (COUCHDB-639) Make replication profit of attachment compression and improve push replication for large attachments

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-639:
----------------------------------

    Attachment:     (was: rep-att-comp-and-multipart-trunk-4.patch)

[jira] Updated: (COUCHDB-639) Make replication profit of attachment compression and improve push replication for large attachments

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-639:
----------------------------------

    Attachment: rep-att-comp-and-multipart-trunk.patch

[jira] Updated: (COUCHDB-639) Make replication profit of attachment compression and improve push replication for large attachments

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-639:
----------------------------------

    Attachment: rep-att-comp-and-multipart-trunk.patch

A one-line change: added a missing call to couch_util:url_encode/1 with the doc ID as the parameter.
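
To show why the encoding matters, here is a small Python sketch. The actual fix uses couch_util:url_encode/1 in Erlang; `doc_url` is a hypothetical helper, and Python's `urllib.parse.quote` stands in for the Erlang function. A doc ID containing `/`, `?` or spaces would otherwise change the meaning of the request path.

```python
from urllib.parse import quote


def doc_url(db: str, doc_id: str) -> str:
    """Build the PUT path for a replicated doc, percent-encoding the doc ID."""
    # safe="" also encodes "/", which quote() would otherwise leave alone.
    return "/%s/%s?new_edits=false" % (db, quote(doc_id, safe=""))
```

Without the encoding, an ID such as `a/b c?d=1` would be split into extra path segments and a bogus query string.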

[jira] Updated: (COUCHDB-639) Make replication profit of attachment compression and improve push replication for large attachments

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-639:
----------------------------------

    Attachment: rep-att-comp-and-multipart-trunk-4.patch

Just added an Etap test and removed the JS test section related to this ticket.

With the Etap test it's more reliable to detect whether the replicated attachments were in fact transferred in compressed form, since with Firefox we can't control the value of the "Accept-Encoding" header.

@Chris Is it all ok now with this patch?

@Adam Any feedback?

Or feedback from anyone else.

cheers

[jira] Updated: (COUCHDB-639) Make replication profit of attachment compression and improve push replication for large attachments

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-639:
----------------------------------

    Attachment:     (was: rep-att-comp-and-multipart-trunk.patch)

[jira] Commented: (COUCHDB-639) Make replication profit of attachment compression and improve push replication for large attachments

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835820#action_12835820 ] 

Filipe Manana commented on COUCHDB-639:
---------------------------------------

Hi Chris,

That is in fact the part I don't like, exposing the _bulk_doc_rep. I did it because when using the doc multipart streamer, we can't use the same http body to include other docs (at least not as far as I know). So _bulk_docs would be no longer _bulk_docs but _bulk_doc (singular).

The alternative I see is to add a case clause in _bulk_docs, like:

case HttpHeaderContentType of
"multipart/related" ++ _ ->
   % prefix match, since the header value also carries a boundary parameter
   % do the stuff of _bulk_doc_rep (new_edits is false, call update_docs with "replicated_changes")
_Else ->
   % ....
end

This probably looks better?

It should copy both new and old attachments (it doesn't matter whether they're compressed or not). Hmm, what is suspicious about that?

cheers

[jira] Commented: (COUCHDB-639) Make replication profit of attachment compression and improve push replication for large attachments

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838836#action_12838836 ] 

Filipe Manana commented on COUCHDB-639:
---------------------------------------

Can't find which file?

You should do:

$ cd your_git_repo_path
$ git apply rep-att-comp-and-multipart-trunk-4.patch
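
A slightly more defensive variant of the two commands above, as a sketch only. It asks git to dry-run the patch before touching anything. The function name apply_patch_checked is made up for illustration; `git apply --check` and `git apply --index` are standard git options. It assumes the first argument is the top of a git checkout of CouchDB trunk, because the paths inside the patch (src/couchdb/...) are relative to the repository root. Running it elsewhere, or against a tree that lacks those files, is one plausible cause of a "can't find the file" error.

```shell
#!/bin/sh
# Hypothetical helper (not part of CouchDB or git): dry-run a patch, then apply it.
# Usage: apply_patch_checked /path/to/repo /path/to/file.patch
apply_patch_checked() {
    repo=$1
    patch=$2

    # The patch file itself must exist and be readable.
    if [ ! -r "$patch" ]; then
        echo "patch file not found: $patch" >&2
        return 1
    fi

    # Resolve the patch to an absolute path before changing directory.
    patch_dir=$(cd "$(dirname "$patch")" && pwd) || return 1
    patch_abs=$patch_dir/$(basename "$patch")

    (
        cd "$repo" || exit 1
        # --check only verifies that the patch would apply; nothing is modified.
        git apply --check "$patch_abs" || exit 1
        # --index applies the patch to both the working tree and the index.
        git apply --index "$patch_abs"
    )
}
```

For example: apply_patch_checked ~/git/couchdb ../rep-att-comp-and-multipart-trunk-4.patch. If the --check step fails, the tree is left untouched and git prints which file it could not patch.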



[jira] Updated: (COUCHDB-639) Make replication profit of attachment compression and improve push replication for large attachments

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-639:
----------------------------------

    Attachment:     (was: pull-rep-att-comp.patch)

[jira] Updated: (COUCHDB-639) Make replication profit of attachment compression and improve push replication for large attachments

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-639:
----------------------------------

    Attachment:     (was: rep-att-comp-and-multipart-trunk-3.patch)

[jira] Updated: (COUCHDB-639) Make replication profit of attachment compression and improve push replication for large attachments

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-639:
----------------------------------

    Description: 
At the moment, replication decompresses compressed attachments and then compresses them again, which wastes CPU time.

Push replication is also unreliable for very large attachments (500 MB or more, for example). Currently it sends the attachments in-lined in the respective JSON doc. Not only does this require too much RAM, it also wastes CPU time on base64-encoding the attachment (and on decompressing it if it is compressed).

The following patch (rep-att-comp-and-multipart-trunk*.patch) addresses both issues. Docs containing attachments are now streamed to the remote target DB using the multipart doc streaming feature provided by couch_doc.erl, and compressed attachments are no longer decompressed and re-compressed during replication.

JavaScript tests included.

Previously, replicating a DB containing 2 docs with attachments of 100 MB and 500 MB caused the Erlang VM to consume nearly 1.2 GB of RAM on my system. With the patch applied, it uses about 130 MB of RAM.
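
For a rough sense of the in-lining cost described above, here is a small sketch of the size arithmetic. Base64 turns every 3 input bytes into 4 output characters, so an in-lined attachment grows by about a third before it is even placed in the JSON body. The function name b64_len is made up for illustration, and the sizes assume 1 MB = 1024 * 1024 bytes. The measured memory figures above also depend on how many copies of the data the VM holds at once, which this sketch does not model.

```shell
#!/bin/sh
# Hypothetical helper: length in bytes of the padded base64 encoding
# of an input of N bytes. Every group of up to 3 bytes becomes 4 characters.
b64_len() {
    echo $(( ($1 + 2) / 3 * 4 ))
}

mb=$((1024 * 1024))
for size_mb in 100 500; do
    raw=$((size_mb * mb))
    enc=$(b64_len "$raw")
    echo "${size_mb} MB attachment: $raw bytes raw, $enc bytes as base64"
done
```

Under these assumptions the 500 MB attachment becomes 699050668 base64 characters (roughly 667 MB), and the 100 MB one becomes 139810136. Streaming the raw bytes as a multipart body avoids that expansion.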



  was:
Currently, when doing a pull replication where there are docs at the source DB with compressed attachments:

1) The source decompresses the attachment before sending it to the target DB
2) The target compresses the attachment before storing it

Clearly, a waste of CPU, bandwidth and disk IO.

The following patch fixes the issue. JavaScript test included.

        Summary: Make replication profit of attachment compression and improve push replication for large attachments  (was: Pull replication should profit of compressed attachments for higher performance)

[jira] Updated: (COUCHDB-639) Pull replication should profit of compressed attachments for higher performance

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-639:
----------------------------------

    Attachment: pull-rep-att-comp.patch

[jira] Commented: (COUCHDB-639) Make replication profit of attachment compression and improve push replication for large attachments

Posted by "Chris Anderson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835779#action_12835779 ] 

Chris Anderson commented on COUCHDB-639:
----------------------------------------

This patch applies cleanly and the tests are passing. I'm also +1 on the feature (and I sure wouldn't mind committing this before 0.11 is tarballed, as the code changes are large enough that they might make backporting fixes to 0.11 a pain later on).

However, I'm not 100% sure about _bulk_docs_rep.

I'm concerned about having a separate endpoint designed for replication. It gives people the wrong idea, that replication is special. Replication is just another HTTP client.

I'm also concerned about the implementation (does this copy only new attachments, or does it copy all attachments?). I'd like it if Adam or someone else familiar with the replicator could review this patch (and apply it if you think it is right).



[jira] Commented: (COUCHDB-639) Make replication profit of attachment compression and improve push replication for large attachments

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838855#action_12838855 ] 

Filipe Manana commented on COUCHDB-639:
---------------------------------------

Just tested it with the latest trunk rev and found no problems:

fdmanana@core2duo:~/git/couchdb$ git log -1
commit 4cefde131f1992c70f66c527435a715290174423
Author: Mark Hammond <mh...@apache.org>
Date:   Fri Feb 26 01:32:21 2010 +0000

    generate .sha file for windows binary; ensure md5/sha use rel paths
    
    git-svn-id: https://svn.apache.org/repos/asf/couchdb/trunk@916528 13f79535-47bb-0310-9956-ffa450edef68
fdmanana@core2duo:~/git/couchdb$ 

fdmanana@core2duo:~/git/couchdb$ git apply --index --reject ../rep-att-comp-and-multipart-trunk-4.patch
Checking patch src/couchdb/couch_db.erl...
Checking patch src/couchdb/couch_doc.erl...
Checking patch src/couchdb/couch_httpd_db.erl...
Checking patch src/couchdb/couch_rep_att.erl...
Checking patch src/couchdb/couch_rep_reader.erl...
Checking patch src/couchdb/couch_rep_writer.erl...
Checking patch test/etap/170-replication-attachment-comp.t...
Checking patch test/etap/Makefile.am...
Applied patch src/couchdb/couch_db.erl cleanly.
Applied patch src/couchdb/couch_doc.erl cleanly.
Applied patch src/couchdb/couch_httpd_db.erl cleanly.
Applied patch src/couchdb/couch_rep_att.erl cleanly.
Applied patch src/couchdb/couch_rep_reader.erl cleanly.
Applied patch src/couchdb/couch_rep_writer.erl cleanly.
Applied patch test/etap/170-replication-attachment-comp.t cleanly.
Applied patch test/etap/Makefile.am cleanly.
fdmanana@core2duo:~/git/couchdb$ 

All tests are passing also.

[jira] Updated: (COUCHDB-639) Make replication profit of attachment compression and improve push replication for large attachments

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-639:
----------------------------------

    Attachment: rep-att-comp-and-multipart-trunk-2.patch

Updated the patch according to Adam's review (through IRC):

1) use lists:partition instead of lists:foldl in couch_rep_writer
2) rename test case file from 170-* to 113-*

cheers

[jira] Commented: (COUCHDB-639) Make replication profit of attachment compression and improve push replication for large attachments

Posted by "sulantha sanjeewa (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838833#action_12838833 ] 

sulantha sanjeewa commented on COUCHDB-639:
-------------------------------------------

Can anyone please tell me how to install this patch? ASAP, please. I have version 0.11, and it says it can't find the file.

[jira] Updated: (COUCHDB-639) Pull replication should profit of compressed attachments for higher performance

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-639:
----------------------------------

    Attachment: pull-rep-att-comp-2.patch

Just removed a useless line that was adding "Accept-Encoding: gzip" to the attachment streaming request. This header is already set by default in the definition of #http_db in couch_db.hrl.
