Posted to dev@couchdb.apache.org by "Adam Kocoloski (JIRA)" <ji...@apache.org> on 2010/09/16 22:32:32 UTC

[jira] Created: (COUCHDB-888) out of memory crash when compacting document with lots of edit branches

out of memory crash when compacting document with lots of edit branches
-----------------------------------------------------------------------

                 Key: COUCHDB-888
                 URL: https://issues.apache.org/jira/browse/COUCHDB-888
             Project: CouchDB
          Issue Type: Bug
          Components: Database Core
            Reporter: Adam Kocoloski
         Attachments: key_tree_backtrace.txt.gz

I have a database which will crash CouchDB if I try to compact it.  It causes beam.smp to use all the memory on the server.  I caught it in the act one time and sorted the Erlang processes by memory usage.  The process spawned to do the compaction turned out to be the culprit.  I took a backtrace of the process and found that it was mapping a very large revision tree.  I have reason to believe that the document has a large number (~1000s) of edit conflicts.

I think part of the problem may be that the recursion in couch_key_tree:map_simple requires stack space for every iteration.  I'm not sure if it's possible to rewrite the algorithm in a more memory-friendly way given the current tree structure.
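
To illustrate the stack-space concern, here is a minimal Python sketch.  It is not the CouchDB code: the (key, value, children) tuple layout and both function names are assumptions made up for the example.  The first version recurses once per tree level, so its call depth grows with the tree; the second keeps pending work in an ordinary list instead.

```python
def map_tree_recursive(fun, pos, nodes):
    """Map fun over every node; call depth grows with the depth of the tree."""
    return [
        (key, fun((pos, key), value), map_tree_recursive(fun, pos + 1, children))
        for key, value, children in nodes
    ]


def map_tree_iterative(fun, pos, nodes):
    """Build the same result, keeping pending work in an explicit list."""
    result = []
    # Each work item: (input sibling list, depth, output list to fill in).
    work = [(nodes, pos, result)]
    while work:
        siblings, depth, out = work.pop()
        for key, value, children in siblings:
            mapped_children = []
            out.append((key, fun((depth, key), value), mapped_children))
            if children:
                work.append((children, depth + 1, mapped_children))
    return result
```

The iterative version visits nodes in a different order, so the two only agree when fun has no side effects.  It still holds the whole mapped tree in memory; it only avoids the per-level stack frames.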

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (COUCHDB-888) out of memory crash when compacting document with lots of edit branches

Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Joseph Davis updated COUCHDB-888:
--------------------------------------

    Skill Level: Committers Level (Medium to Hard)


[jira] Updated: (COUCHDB-888) out of memory crash when compacting document with lots of edit branches

Posted by "Adam Kocoloski (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kocoloski updated COUCHDB-888:
-----------------------------------

    Fix Version/s: 1.1


[jira] Assigned: (COUCHDB-888) out of memory crash when compacting document with lots of edit branches

Posted by "Adam Kocoloski (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kocoloski reassigned COUCHDB-888:
--------------------------------------

    Assignee: Damien Katz

Damien, can you review this patch when you have a chance?


[jira] Updated: (COUCHDB-888) out of memory crash when compacting document with lots of edit branches

Posted by "Adam Kocoloski (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kocoloski updated COUCHDB-888:
-----------------------------------

    Attachment: 0001-fix-OOME-when-compacting-a-DB-with-many-edit-conflic.patch

This patch against trunk should fix the problem.

I realized the map traversal itself was not a major issue since the revision tree is mapped every time a document is read or written.  I figured the problem must be the specific map function used in a traversal in the compactor code.  I looked at copy_rev_tree_attachments and realized that the compactor loaded document bodies for every leaf of 1000 documents into memory simultaneously.  When there are no edit conflicts this is fine, but if each document has ~100 conflicts we are effectively loading 100k document bodies into memory.
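
The arithmetic can be checked with a small Python model.  This is only a sketch under stated assumptions: the function names are hypothetical, a batch is taken to be 1000 documents as described above, and the "streamed" case assumes each leaf body is copied and released before the next one is read.

```python
def peak_bodies_batched(leaf_counts, batch_size=1000):
    """Peak bodies in memory if every leaf body of a batch is loaded at once."""
    return max(
        (sum(leaf_counts[i:i + batch_size])
         for i in range(0, len(leaf_counts), batch_size)),
        default=0,
    )


def peak_bodies_streamed(leaf_counts):
    """Peak bodies in memory if leaf bodies are copied one at a time."""
    return 1 if any(leaf_counts) else 0


no_conflicts = [1] * 1000        # one leaf per document
many_conflicts = [100] * 1000    # ~100 conflicting leaves per document

print(peak_bodies_batched(no_conflicts))     # 1000
print(peak_bodies_batched(many_conflicts))   # 100000
print(peak_bodies_streamed(many_conflicts))  # 1
```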

A BigCouch version of this patch was able to compact our problem database with no appreciable memory usage.  `make check` and the compact portion of the Futon suite pass.

There should be no difference in indexing performance for databases without attachments.  I haven't tested the effect of eliminating the "contiguous document bodies" optimization on the indexing time of a database with lots of attachments.  If it turns out to be a big regression we could consider tracking the total number of document bodies (including conflicts) in the accumulator that determines when to flush to disk.  However, I think this version is quite a bit simpler, so I'd want to see a benchmark that proves we really have something to gain there.
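
For reference, the accumulator alternative mentioned above could look roughly like the following Python sketch.  It is not part of the patch; the function name, the (doc_id, leaf_count) input shape and the max_bodies threshold are all assumptions for illustration.  A batch is flushed when the running count of leaf bodies would exceed the limit, rather than after a fixed number of documents.

```python
def batch_by_body_count(docs, max_bodies=1000):
    """Yield batches of (doc_id, leaf_count) capped by total leaf bodies."""
    batch, bodies = [], 0
    for doc_id, leaf_count in docs:
        # Flush before adding a document that would push us over the cap.
        if batch and bodies + leaf_count > max_bodies:
            yield batch
            batch, bodies = [], 0
        batch.append((doc_id, leaf_count))
        bodies += leaf_count
    if batch:
        yield batch
```

A single document with more leaves than max_bodies still ends up in a batch of its own, so this bounds memory per batch only up to the largest single revision tree.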




[jira] Commented: (COUCHDB-888) out of memory crash when compacting document with lots of edit branches

Posted by "Adam Kocoloski (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910328#action_12910328 ] 

Adam Kocoloski commented on COUCHDB-888:
----------------------------------------

Hi Damien, I guess the implications of limiting the number of edit branches would be pretty serious, right?  When we stem a branch we run the risk of introducing spurious conflicts, but if we limited the number of branches it seems like we'd be forced to throw away data.

I agree that if the tree is memory-resident there will always be a point at which it will crash the VM, but if we can architect things so that point only occurs when the tree itself requires gigabytes of memory we'd be in pretty good shape.  I have the sense that the offending tree is really only ~10 MB, but that the traversal does some N^2 explosion.  I could be wrong.

Disk-resident would be really cool if the most common interactions still only require one lookup.


[jira] Updated: (COUCHDB-888) out of memory crash when compacting document with lots of edit branches

Posted by "Adam Kocoloski (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kocoloski updated COUCHDB-888:
-----------------------------------

    Attachment: 0001-fix-OOME-when-compacting-a-DB-with-many-edit-conflic-v2.patch

The first patch I uploaded was not correct.  The compactor needs to replace branch revisions with ?REV_MISSING in the target revision tree.  This is a corrected patch.


[jira] Updated: (COUCHDB-888) out of memory crash when compacting document with lots of edit branches

Posted by "Adam Kocoloski (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kocoloski updated COUCHDB-888:
-----------------------------------

    Attachment: key_tree_backtrace.txt.gz

Backtrace of the offending process.


[jira] Commented: (COUCHDB-888) out of memory crash when compacting document with lots of edit branches

Posted by "Damien Katz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910308#action_12910308 ] 

Damien Katz commented on COUCHDB-888:
-------------------------------------

Adam, perhaps we should limit the # of conflicts allowed, similar to how we limit total revs? Not sure of implications, but it's always possible to generate an unlimited # of conflicts that will consume all the memory.

Another option is to keep the rev trees as disk-based structures, to avoid loading them completely into memory at any time.

Also, a temporary solution here is to purge the conflicts down to a manageable #.
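
As a rough sketch of that workaround, the Python helper below builds a request body for CouchDB's _purge endpoint, which takes a JSON object mapping document IDs to lists of revisions.  The helper name and the keep parameter are hypothetical, and it assumes doc is the parsed result of fetching the document with ?conflicts=true, so the conflicting revisions are listed under _conflicts.  Which conflicts are safe to drop is an application decision; this simply keeps the first few listed.

```python
import json


def build_purge_request(doc, keep=10):
    """Return a JSON body for POST /{db}/_purge, or None if nothing to purge.

    Keeps the winning revision plus the first `keep` conflicting revisions.
    """
    conflicts = doc.get("_conflicts", [])
    to_purge = conflicts[keep:]
    if not to_purge:
        return None
    return json.dumps({doc["_id"]: to_purge})
```

Purged revisions are removed outright, so any data that exists only in those revisions is lost.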
