You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2011/02/26 08:13:21 UTC

[jira] Created: (CASSANDRA-2252) off-heap memtables

off-heap memtables
------------------

                 Key: CASSANDRA-2252
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Jonathan Ellis
            Assignee: Jonathan Ellis
             Fix For: 0.8


The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (CASSANDRA-2252) off-heap memtables

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2252:
--------------------------------

    Attachment:     (was: 2252-v1.tgz)

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-alternate-v1.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (CASSANDRA-2252) off-heap memtables

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2252:
--------------------------------

    Attachment:     (was: 2252-alternate-v1.tgz)

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999785#comment-12999785 ] 

Jonathan Ellis commented on CASSANDRA-2252:
-------------------------------------------

improved handling of counterupdatecolumns and added special case for empty buffers

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2252:
--------------------------------------

    Attachment:     (was: 0002-add-off-heap-MemtableAllocator-support.txt)

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2252) off-heap memtables

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016179#comment-13016179 ] 

Stu Hood commented on CASSANDRA-2252:
-------------------------------------

If it is possible to merge this, but to leave it disabled for the 0.8 release, we would be in a much happier place: it is a very painful patch to maintain.

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-alternate-v2.tgz, merged-2252.tgz
>
>   Original Estimate: 0.4h
>  Remaining Estimate: 0.4h
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2252) arena allocation for memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2252:
--------------------------------------

    Remaining Estimate:     (was: 0.4h)
     Original Estimate:     (was: 0.4h)
               Summary: arena allocation for memtables  (was: off-heap memtables)

> arena allocation for memtables
> ------------------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2252) off-heap memtables

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018160#comment-13018160 ] 

Stu Hood commented on CASSANDRA-2252:
-------------------------------------

This will need a rebase post-2006. In particular, we'll need to include the bytes allocated by the allocator in the live size, and make sure the shared buffers/slabs are excluded by JAMM.

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-alternate-v2.tgz, merged-2252.tgz
>
>   Original Estimate: 0.4h
>  Remaining Estimate: 0.4h
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2252) arena allocation for memtables

Posted by "Yang Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091196#comment-13091196 ] 

Yang Yang commented on CASSANDRA-2252:
--------------------------------------

SlabAllocator:

    private void tryRetireRegion(Region region)
    {
        if (currentRegion.compareAndSet(region, null))
        {
            filledRegions.add(region);
        }
    }


could you please explain why we need to add them to "filledRegions"? when all the buffers that share the same region die/become unreachable, shouldn't we just let the region go  and free memory? , then we should not tie this region in memory through the references starting from filledRegions . no ??

just to confirm my thoughts, I looked at the HBase implementation:
./src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreLAB.java


  /**
   * Try to retire the current chunk if it is still
   * <code>c</code>. Postcondition is that curChunk.get()
   * != c
   */
  private void tryRetireChunk(Chunk c) {
    @SuppressWarnings("unused")
    boolean weRetiredIt = curChunk.compareAndSet(c, null);
    // If the CAS succeeds, that means that we won the race
    // to retire the chunk. We could use this opportunity to
    // update metrics on external fragmentation.
    //
    // If the CAS fails, that means that someone else already
    // retired the chunk for us.
  }

it does not tie it to a region list .



the current result of tying regions together through the filledRegions is that all regions (even if those dead ones) still occupy memory.  --- well  if the purpose is to count the size() held in allocator, should we use weak references?


> arena allocation for memtables
> ------------------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2252:
--------------------------------------

    Attachment: 0002-add-off-heap-MemtableAllocator-support.txt
                0001-add-MemtableAllocator.txt

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2252) arena allocation for memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091216#comment-13091216 ] 

Jonathan Ellis commented on CASSANDRA-2252:
-------------------------------------------

I see what you mean: you want to allow a Region to "die" if all the buffers it was responsible for have been overwritten and thus the Region is no longer needed for flush.  In that case, weak references sound reasonable.  However, I'm not sure it would make any difference in practice since we allocate row keys into the regions, as well.

> arena allocation for memtables
> ------------------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999878#comment-12999878 ] 

Jonathan Ellis commented on CASSANDRA-2252:
-------------------------------------------

I'll look deeper later, but one quick comment:

bq. Original uses JNA for direct memory access, alternate uses ByteBuffer.allocateDirect

allocateDirect pointers are not freed until CMS runs finalizers, so this could result in much larger space used than intended.

The JNA-based code isn't perfect though either, I've seen heisenbug segfaults.  I'm sure it's fixable though.

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-alternate-v1.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2252:
--------------------------------------

    Reviewer: stuhood

Also, I would like to know who the clown is that estimated this as 0.4 hours. :)

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz
>
>   Original Estimate: 0.4h
>  Remaining Estimate: 0.4h
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2252:
--------------------------------------

    Attachment: 0002-add-off-heap-MemtableAllocator-support.txt
                0001-wip.txt

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-wip.txt, 0002-add-off-heap-MemtableAllocator-support.txt
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Issue Comment Edited: (CASSANDRA-2252) off-heap memtables

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002414#comment-13002414 ] 

Stu Hood edited comment on CASSANDRA-2252 at 3/4/11 3:18 AM:
-------------------------------------------------------------

I'd like to go ahead and merge these patches together onto jbellis's version. Things that I will cherry pick from the alternate patch:
* A generic Allocator interface, rather than referring to Memtables. There are various other places that we need to apply slabbing
* The additional counter tests we added
* Using the slab allocator's allocation count to determine throughput for memtables
* Slabbing of keys

----

The other realization we've had about slab allocation is that unless _all_ sources of fragmentation are eliminated, slabbing actually causes promotion failures to happen earlier, since it is harder to promote a slab into a fragmented oldgen. The other sources of fragmentation we suspect are:
* IndexSummaries (easily slabbed)
* the key and row caches (row cache tackled in CASSANDRA-1969)

These can probably be defragged in separate tickets, as long as we commit to fixing them.

      was (Author: stuhood):
    I'd like to go ahead and merge these patches together onto jbellis's version. Things that I will cherry pick from the alternate patch:
* A generic Allocator interface, rather than referring to Memtables. There are various other places that we need to apply slabbing
* The additional counter tests we added
* Using the slab allocator's allocation count to determine throughput for memtables

----

The other realization we've had about slab allocation is that unless _all_ sources of fragmentation are eliminated, slabbing actually causes promotion failures to happen earlier, since it is harder to promote a slab into a fragmented oldgen. The other sources of fragmentation we suspect are:
* IndexSummaries (easily slabbed)
* the key and row caches (row cache tackled in CASSANDRA-1969)

These can probably be defragged in separate tickets, as long as we commit to fixing them.
  
> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-alternate-v2.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999734#comment-12999734 ] 

Jonathan Ellis commented on CASSANDRA-2252:
-------------------------------------------

Patches to add MemtableAllocator (based on Todd's MemStoreLAB) and off-heap allocation via FreeableMemory.

I gave up on trying to do the allocation for this and interning during deserialize; the code got very tangled and felt fragile.  Instead, we copy during Memtable.put, which feels right since there is no way to screw ourselves by forgetting to copy in a new code path.  I think I'm willing to bite the extra copy for that.

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-wip.txt, 0002-add-off-heap-MemtableAllocator-support.txt
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2252) arena allocation for memtables

Posted by "Yang Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091225#comment-13091225 ] 

Yang Yang commented on CASSANDRA-2252:
--------------------------------------

my line of thought comes from update-heavy workload (counters etc). conceivably (putting the row key issue aside for a while)  one region would contain bytebuffer values of similar age. as more updates come in, all the columns in older regions are likely to have all died out, thus allowing us to free the entire region before flushing happens.  


coming back to the row key issue, in the original slab allocator paper ( Jeff Bonwick ) , a slab contains strictly the same objects, which imply that they die at roughly the same time.   if they don't, then yes, in our case, slab has the disadvantage that an entire slab (2MB worth of mem) is held simply because a row key in it is not dead yet.  so to overcome this disadvantage, we probably need to further distinguish between object types to be allocated in the slab: this JIRA (same as HBase code) distinguishes between all the allocations between different memtables, to work better with update-heavy traffic, we need to distinguish between row keys and column values (they have different life times)




> arena allocation for memtables
> ------------------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2252:
--------------------------------------

    Attachment:     (was: 0001-add-MemtableAllocator.txt)

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058733#comment-13058733 ] 

Jonathan Ellis commented on CASSANDRA-2252:
-------------------------------------------

JNA 3.3.0 has been released including the http://java.net/jira/browse/JNA-179 fixes.

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, merged-2252.tgz
>
>   Original Estimate: 0.4h
>  Remaining Estimate: 0.4h
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2252) off-heap memtables

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085330#comment-13085330 ] 

Brandon Williams commented on CASSANDRA-2252:
---------------------------------------------

bq. Also, I would like to know who the clown is that estimated this as 0.4 hours. :)

Probably some suit. ;)

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz
>
>   Original Estimate: 0.4h
>  Remaining Estimate: 0.4h
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (CASSANDRA-2252) off-heap memtables

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002414#comment-13002414 ] 

Stu Hood commented on CASSANDRA-2252:
-------------------------------------

I'd like to go ahead and merge these patches together onto jbellis's version. Things that I will cherry pick from the alternate patch:
* A generic Allocator interface, rather than referring to Memtables. There are various other places that we need to apply slabbing
* The additional counter tests we added
* Using the slab allocator's allocation count to determine throughput for memtables

----

The other realization we've had about slab allocation is that unless _all_ sources of fragmentation are eliminated, slabbing actually causes promotion failures to happen earlier, since it is harder to promote a slab into a fragmented oldgen. The other sources of fragmentation we suspect are:
* IndexSummaries (easily slabbed)
* the key and row caches (row cache tackled in CASSANDRA-1969)

These can probably be defragged in separate tickets, as long as we commit to fixing them.

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-alternate-v2.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2252) arena allocation for memtables

Posted by "Yang Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091229#comment-13091229 ] 

Yang Yang commented on CASSANDRA-2252:
--------------------------------------

actually even if we don't do the further optimization suggested in the last comment (separate rowkey and column value into different slab allocators), it would still very much likely work better and kill off some dead regions. 

let's say a row/column is continually updated 1000 times ,  and 100 column value fit into 2MB, then to do these 1000 updates, we allocate 10 regions, only the first region would contain the row key, and finally all the 8 regions in the middle would die, the first one remains due to the row key, and the last remains due to the latest (live) column value

> arena allocation for memtables
> ------------------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2252) arena allocation for memtables

Posted by "Yang Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092600#comment-13092600 ] 

Yang Yang commented on CASSANDRA-2252:
--------------------------------------

"if it's a memtable-related operation" ---- aren't the CounterContext's finally inserted into the CounterColumns, hence the Memtable too ?

for example:

CounterMutation.computeShardMerger() ==> CounterColumn.computeOldShardMerger()
===> ByteBuffer contextManager.computeOldShardMerger {
      ......
       ContextState merger = ContextState.allocate(2, nbDelta, HeapAllocator.instance);
         ....
     return merger.context;
}

the "merger.context" is a ByteBuffer that is inserted into CounterColumn by CounterColumn.computeOldShardMerger()



Thanks
Yang


> arena allocation for memtables
> ------------------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2252) off-heap memtables

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2252:
--------------------------------

    Attachment: merged-2252.tgz

Rebased for trunk. I took the easy way out and just included the Allocator's allocation count in the memtable live size: this will overcount, but it accounts for fragmentation due to updates. I've started a patch for JAMM to add a mode that will allow us to ignore shared buffers: see [jamm/buffer-behavior|https://github.com/stuhood/jamm/commits/buffer-behavior].

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, merged-2252.tgz
>
>   Original Estimate: 0.4h
>  Remaining Estimate: 0.4h
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2252) arena allocation for memtables

Posted by "Yang Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092608#comment-13092608 ] 

Yang Yang commented on CASSANDRA-2252:
--------------------------------------

cool, actually after some thought, I think we need to put more care to utilizing SlabAllocator for counters:

I realized this when u said "only if it's a memtable-related operation", this would be very true for some temp variable ByteBuffers, which are thrown away immediately, and hence get relaimed in the new gen GC, and never go into old gen.

for counters, the column values (which contain the CounterContext) change a lot, if we assume that the value of each counter is updated 1000 times during the life time of  a memtable before being flushed, then if you look at a typical 2MB slab allocated out, 99.9% of the buffers it contains are going to be non-reachable and GC'ed before flushing. so when the 0.1% buffer is promoted, it occupies 2MB space instead of its actual size, which would be more waste than the possible fragmentation problem it causes.

so in this case (or, more generally, all cases where update is more often), using HeapAllocator may be better.


Thanks
Yang

> arena allocation for memtables
> ------------------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2252:
--------------------------------------

    Attachment:     (was: 0001-wip.txt)

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (CASSANDRA-2252) off-heap memtables

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2252:
--------------------------------

    Attachment: 2252-v1.tgz

We've been developing a very similar patch in parallel: attached as 2252-alternate-v1.tgz.

The key differences seem to be:
* Original removes interning from deserialization, alternate encapsulates it
* Alternate has support for allocating reconciled counter columns (the result of merging counters) in slabs
* Original uses JNA for direct memory access, alternate uses ByteBuffer.allocateDirect
* Alternate uses the allocation count of the Memtable's allocator to measure throughput for thresholds (less accurate, but necessary to track the true memory usage of counters)

We've tested the alternate patch in a production setting with counter mutations to multiple column families: it reduced our fragmentation enough to double the time between promotion failures, but it does not fully eliminate fragmentation.

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-alternate-v1.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2252:
--------------------------------------

    Attachment:     (was: 0002-add-off-heap-MemtableAllocator-support.txt)

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2252:
--------------------------------------

    Attachment: 0002-add-off-heap-MemtableAllocator-support.txt
                0001-add-MemtableAllocator.txt

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002440#comment-13002440 ] 

Jonathan Ellis commented on CASSANDRA-2252:
-------------------------------------------

sounds reasonable

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-alternate-v2.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082909#comment-13082909 ] 

Jonathan Ellis commented on CASSANDRA-2252:
-------------------------------------------

bq. Adding region recycling post-flush

Of course this has the same reference counting problem as JNA free does: we can't re-use the buffer until we're sure nobody's reading from it anymore.

I'm inclined to just do in-heap buffers for the memtables to start with.  Unlike cached rows, we're virtually guaranteed contention on the memtable objects if we try to do AtomicInteger-based reference counting, and sharded threadlocal counting is more of a pita than I want to tackle at the moment.

I suspect that, like HBase, we'll realize substantial wins from area allocation even on-heap.

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, merged-2252.tgz
>
>   Original Estimate: 0.4h
>  Remaining Estimate: 0.4h
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (CASSANDRA-2252) arena allocation for memtables

Posted by "Yang Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092600#comment-13092600 ] 

Yang Yang edited comment on CASSANDRA-2252 at 8/29/11 2:33 AM:
---------------------------------------------------------------

"if it's a memtable-related operation" ---- but CounterContext.allocate produces a ByteBuffer , some of which goes into CounterColumn, hence Memtable, it seems.

for example:

CounterMutation.computeShardMerger() ==> CounterColumn.computeOldShardMerger()
===> ByteBuffer contextManager.computeOldShardMerger {
      ......
       ContextState merger = ContextState.allocate(2, nbDelta, HeapAllocator.instance);
         ....
     return merger.context;
}

the "merger.context" is a ByteBuffer that is inserted into CounterColumn by CounterColumn.computeOldShardMerger()



Thanks
Yang


      was (Author: yangyangyyy):
    "if it's a memtable-related operation" ---- CounterContext.allocate produces a ByteBuffer , some of which goes into CounterColumn, hence Memtable, it seems.

for example:

CounterMutation.computeShardMerger() ==> CounterColumn.computeOldShardMerger()
===> ByteBuffer contextManager.computeOldShardMerger {
      ......
       ContextState merger = ContextState.allocate(2, nbDelta, HeapAllocator.instance);
         ....
     return merger.context;
}

the "merger.context" is a ByteBuffer that is inserted into CounterColumn by CounterColumn.computeOldShardMerger()



Thanks
Yang

  
> arena allocation for memtables
> ------------------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (CASSANDRA-2252) arena allocation for memtables

Posted by "Yang Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091196#comment-13091196 ] 

Yang Yang edited comment on CASSANDRA-2252 at 8/25/11 6:32 PM:
---------------------------------------------------------------

SlabAllocator:

    private void tryRetireRegion(Region region)
    {
        if (currentRegion.compareAndSet(region, null))
        {
            filledRegions.add(region);
        }
    }


could you please explain why we need to add them to "filledRegions"? when all the buffers that share the same region die/become unreachable, shouldn't we just let the region go  and free memory? , then we should not tie this region in memory through the references starting from filledRegions . no ??

just to confirm my thoughts, I looked at the HBase implementation:
./src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreLAB.java


  /**
   * Try to retire the current chunk if it is still
   * <code>c</code>. Postcondition is that curChunk.get()
   * != c
   */
  private void tryRetireChunk(Chunk c) {
    @SuppressWarnings("unused")
    boolean weRetiredIt = curChunk.compareAndSet(c, null);
    // If the CAS succeeds, that means that we won the race
    // to retire the chunk. We could use this opportunity to
    // update metrics on external fragmentation.
    //
    // If the CAS fails, that means that someone else already
    // retired the chunk for us.
  }

it does not tie it to a region list .



the current result of tying regions together through the filledRegions is that all regions (even if those dead ones) always occupy memory.  --- well  if the purpose is to count the size() held in allocator, should we just keep a int var of total size  , or use weak references?


      was (Author: yangyangyyy):
    SlabAllocator:

    private void tryRetireRegion(Region region)
    {
        if (currentRegion.compareAndSet(region, null))
        {
            filledRegions.add(region);
        }
    }


could you please explain why we need to add them to "filledRegions"? when all the buffers that share the same region die/become unreachable, shouldn't we just let the region go  and free memory? , then we should not tie this region in memory through the references starting from filledRegions . no ??

just to confirm my thoughts, I looked at the HBase implementation:
./src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreLAB.java


  /**
   * Try to retire the current chunk if it is still
   * <code>c</code>. Postcondition is that curChunk.get()
   * != c
   */
  private void tryRetireChunk(Chunk c) {
    @SuppressWarnings("unused")
    boolean weRetiredIt = curChunk.compareAndSet(c, null);
    // If the CAS succeeds, that means that we won the race
    // to retire the chunk. We could use this opportunity to
    // update metrics on external fragmentation.
    //
    // If the CAS fails, that means that someone else already
    // retired the chunk for us.
  }

it does not tie it to a region list .



the current result of tying regions together through the filledRegions is that all regions (even if those dead ones) always occupy memory.  --- well  if the purpose is to count the size() held in allocator, should we use weak references?

  
> arena allocation for memtables
> ------------------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (CASSANDRA-2252) arena allocation for memtables

Posted by "Yang Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091196#comment-13091196 ] 

Yang Yang edited comment on CASSANDRA-2252 at 8/25/11 6:26 PM:
---------------------------------------------------------------

SlabAllocator:

    private void tryRetireRegion(Region region)
    {
        if (currentRegion.compareAndSet(region, null))
        {
            filledRegions.add(region);
        }
    }


could you please explain why we need to add them to "filledRegions"? when all the buffers that share the same region die/become unreachable, shouldn't we just let the region go  and free memory? , then we should not tie this region in memory through the references starting from filledRegions . no ??

just to confirm my thoughts, I looked at the HBase implementation:
./src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreLAB.java


  /**
   * Try to retire the current chunk if it is still
   * <code>c</code>. Postcondition is that curChunk.get()
   * != c
   */
  private void tryRetireChunk(Chunk c) {
    @SuppressWarnings("unused")
    boolean weRetiredIt = curChunk.compareAndSet(c, null);
    // If the CAS succeeds, that means that we won the race
    // to retire the chunk. We could use this opportunity to
    // update metrics on external fragmentation.
    //
    // If the CAS fails, that means that someone else already
    // retired the chunk for us.
  }

it does not tie it to a region list .



the current result of tying regions together through the filledRegions is that all regions (even if those dead ones) always occupy memory.  --- well  if the purpose is to count the size() held in allocator, should we use weak references?


      was (Author: yangyangyyy):
    SlabAllocator:

    private void tryRetireRegion(Region region)
    {
        if (currentRegion.compareAndSet(region, null))
        {
            filledRegions.add(region);
        }
    }


could you please explain why we need to add them to "filledRegions"? when all the buffers that share the same region die/become unreachable, shouldn't we just let the region go  and free memory? , then we should not tie this region in memory through the references starting from filledRegions . no ??

just to confirm my thoughts, I looked at the HBase implementation:
./src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreLAB.java


  /**
   * Try to retire the current chunk if it is still
   * <code>c</code>. Postcondition is that curChunk.get()
   * != c
   */
  private void tryRetireChunk(Chunk c) {
    @SuppressWarnings("unused")
    boolean weRetiredIt = curChunk.compareAndSet(c, null);
    // If the CAS succeeds, that means that we won the race
    // to retire the chunk. We could use this opportunity to
    // update metrics on external fragmentation.
    //
    // If the CAS fails, that means that someone else already
    // retired the chunk for us.
  }

it does not tie it to a region list .



the current result of tying regions together through the filledRegions is that all regions (even if those dead ones) still occupy memory.  --- well  if the purpose is to count the size() held in allocator, should we use weak references?

  
> arena allocation for memtables
> ------------------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2252:
--------------------------------------

    Attachment:     (was: 0001-add-MemtableAllocator.txt)

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000012#comment-13000012 ] 

Jonathan Ellis commented on CASSANDRA-2252:
-------------------------------------------

bq. The JNA-based code isn't perfect though either, I've seen heisenbug segfaults

Described the problem at http://java.net/jira/browse/JNA-179 -- we'll need a patch to JNA to make it possible to do what we want here.  So for now we should stick to allocating on-heap which will at least give us the reduced fragmentation benefits.

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-alternate-v1.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (CASSANDRA-2252) arena allocation for memtables

Posted by "Yang Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092574#comment-13092574 ] 

Yang Yang edited comment on CASSANDRA-2252 at 8/29/11 12:17 AM:
----------------------------------------------------------------

hi Jonathan:

I checked the counters code, they currently use HeapAllocator .  what is the reason we don't yet use SlabAllocator for Counters?

Thanks


also I put the idea to use 2 SlabAllocators (one for those buffers with long life, one for those short-lived) in https://github.com/yangyangyyy/cassandra/commit/bc017835c64240e58c0c51b2d5f8793f3c7f3a76

https://github.com/yangyangyyy/cassandra/commit/8431ca1b9586086073e6b81d346a06e8172a97e7
maybe it is useful



      was (Author: yangyangyyy):
    hi Jonathan:

I checked the counters code, they currently use HeapAllocator .  what is the reason we don't yet use SlabAllocator for Counters?

Thanks


also I put the idea to use 2 SlabAllocators (one for those buffers with long life, one for those short-lived) in https://github.com/yangyangyyy/cassandra/commit/bc017835c64240e58c0c51b2d5f8793f3c7f3a76
maybe it is useful


  
> arena allocation for memtables
> ------------------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2252) off-heap memtables

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086173#comment-13086173 ] 

Stu Hood commented on CASSANDRA-2252:
-------------------------------------

+1

* The SlabAllocator.waste method creates a local 'region' variable, but doesn't use it

Other than that, this looks good. I won't have time to do any load testing this week, but I'll see if Chris is up for it.

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz
>
>   Original Estimate: 0.4h
>  Remaining Estimate: 0.4h
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (CASSANDRA-2252) arena allocation for memtables

Posted by "Yang Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092600#comment-13092600 ] 

Yang Yang edited comment on CASSANDRA-2252 at 8/29/11 2:33 AM:
---------------------------------------------------------------

"if it's a memtable-related operation" ---- CounterContext.allocate produces a ByteBuffer , some of which goes into CounterColumn, hence Memtable, it seems.

for example:

CounterMutation.computeShardMerger() ==> CounterColumn.computeOldShardMerger()
===> ByteBuffer contextManager.computeOldShardMerger {
      ......
       ContextState merger = ContextState.allocate(2, nbDelta, HeapAllocator.instance);
         ....
     return merger.context;
}

the "merger.context" is a ByteBuffer that is inserted into CounterColumn by CounterColumn.computeOldShardMerger()



Thanks
Yang


      was (Author: yangyangyyy):
    "if it's a memtable-related operation" ---- aren't the CounterContext's finally inserted into the CounterColumns, hence the Memtable too ?

for example:

CounterMutation.computeShardMerger() ==> CounterColumn.computeOldShardMerger()
===> ByteBuffer contextManager.computeOldShardMerger {
      ......
       ContextState merger = ContextState.allocate(2, nbDelta, HeapAllocator.instance);
         ....
     return merger.context;
}

the "merger.context" is a ByteBuffer that is inserted into CounterColumn by CounterColumn.computeOldShardMerger()



Thanks
Yang

  
> arena allocation for memtables
> ------------------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2252) off-heap memtables

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2252:
--------------------------------

    Attachment:     (was: merged-2252.tgz)

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-alternate-v2.tgz, merged-2252.tgz
>
>   Original Estimate: 0.4h
>  Remaining Estimate: 0.4h
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2252) off-heap memtables

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2252:
--------------------------------

    Attachment:     (was: 2252-alternate-v2.tgz)

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt
>
>   Original Estimate: 0.4h
>  Remaining Estimate: 0.4h
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2252) arena allocation for memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092595#comment-13092595 ] 

Jonathan Ellis commented on CASSANDRA-2252:
-------------------------------------------

We only have a reference to the SA if it's a memtable-related operation, otherwise we have to use HA. Is that what you're referring to?

> arena allocation for memtables
> ------------------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (CASSANDRA-2252) off-heap memtables

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005945#comment-13005945 ] 

Stu Hood commented on CASSANDRA-2252:
-------------------------------------

It looks like MemtableAllocator was accidentally merged into trunk?

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-alternate-v2.tgz
>
>   Original Estimate: 0.4h
>  Remaining Estimate: 0.4h
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2252) off-heap memtables

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2252:
--------------------------------

    Attachment:     (was: merged-2252.tgz)

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt
>
>   Original Estimate: 0.4h
>  Remaining Estimate: 0.4h
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (CASSANDRA-2252) off-heap memtables

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000163#comment-13000163 ] 

Stu Hood commented on CASSANDRA-2252:
-------------------------------------

Found a very obvious (and embarrassing) problem with the alternate patch... will post the fix tomorrow.

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-alternate-v1.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2252) arena allocation for memtables

Posted by "Yang Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091209#comment-13091209 ] 

Yang Yang commented on CASSANDRA-2252:
--------------------------------------

Thanks Jonathan,
but why do we need them alive? 
for example

I create a 2MB region, which is carved out to 100 ByteBuffers, each of these ByteBuffers would point to the "data" of the Region, so as long as one of them is live, the bytes pointed to by Region.data is still in heap; and if these 100 ByteBuffers all die, isn't it our goal to free the 2MB region, since no one is using them??




> arena allocation for memtables
> ------------------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2252:
--------------------------------------

    Remaining Estimate: 0.4h
     Original Estimate: 0.4h

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-alternate-v2.tgz
>
>   Original Estimate: 0.4h
>  Remaining Estimate: 0.4h
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2252:
--------------------------------------

    Attachment:     (was: 0001-add-MemtableAllocator.txt)

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2252:
--------------------------------------

    Attachment: 0002-add-off-heap-MemtableAllocator-support.txt
                0001-add-MemtableAllocator.txt

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2252) arena allocation for memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091202#comment-13091202 ] 

Jonathan Ellis commented on CASSANDRA-2252:
-------------------------------------------

the purpose is we need to keep them alive until flush.  so weak would not work.

> arena allocation for memtables
> ------------------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2252:
--------------------------------------

    Attachment: 0002-add-off-heap-MemtableAllocator-support.txt
                0001-add-MemtableAllocator.txt

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (CASSANDRA-2252) off-heap memtables

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999755#comment-12999755 ] 

Todd Lipcon commented on CASSANDRA-2252:
----------------------------------------

Cool :)

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2252) off-heap memtables

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065135#comment-13065135 ] 

Stu Hood commented on CASSANDRA-2252:
-------------------------------------

bq. And if they are so large they do not, then your rate of key allocation is glacial and again it shouldn't matter.
Compaction builds up an IndexSummary slowly enough that I theorized it might be causing fragmentation... didn't get a chance to prove it though.

bq. There is no logical unit of slabbing for key cache, we shouldn't be doing that at all.
Agreed. We actually ended up disabling the key cache and saw a nice boost in time-to-promotion-failure, but I would love to find an actual solution.

bq. Once you promoted a slab in old gen, it stays there, instead of being GC'd and replaced with a slab in new gen again.
The bookkeeping might be worth it, yes.


> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, merged-2252.tgz
>
>   Original Estimate: 0.4h
>  Remaining Estimate: 0.4h
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2252:
--------------------------------------

    Attachment: 2252-v3.txt

Rebased most of Stu's latest.  Changed getLiveSize to only add in waste from the allocator instead of double-counting the rest.  Enabled MemoryMeter.omitSharedBufferOverhead, which is super untested.

CFS.getColumnFamily was getting passed an allocator but this doesn't actually do anything.  (I removed the parameter.)  Was this supposed to be used during counter reconcile somehow?

Passing allocator throughout the CF+SC+[Super|Counter|Deleted|Expiring]Column heirarchy is ugly and error-prone.  (I found and fixed one error while rebasing, where a method taking an allocator parameter called the default addColumn, instead of the addColumn-with-allocator.)  Perhaps moving allocator to AbstractColumnContainer could fix this?

Not thrilled with the current alternatives for moving slabs off-heap.  Our options are to 
- use allocateDirect with all the problems that relying on finalization brings (see: CASSANDRA-2521), as well as requiring users to manually tune the JVM direct buffer ceiling (or face a flood of System.GC calls courtesy of allocateDirect when the ceiling is reached).
- use JNA + manual free, which will require doing reference counting for memtables the way we do for sstables post-CASSANDRA-2521.  Otherwise if a thread that had the memtable in its list of historical memtables to merge from tries to read, you segfault.  (This is NOT the same as the JNA 179 segfaults, which are fixed in 3.3.0.)
- stick with on-heap slabs

I'd say off-heap slabs don't matter that much but it would make the promotion failure problems you saw go away completely.

I'm also not a big fan of slabbing everything in sight.  Keys associated with memtables make sense (and is done in my rebase).  Row key and column names during sstable build, I'm skeptical of -- if your rows are small enough that they finish in before new -> old promotion, then it doesn't matter.  And if they are so large they do not, then your rate of key allocation is glacial and again it shouldn't matter.  But, if we WERE to slab these the right way to do it would be per-sstable not per IndexSummary.

There is no logical unit of slabbing for key cache, we shouldn't be doing that at all.

I have an alternative idea to reduce non-memtable fragmentation: Adding region recycling post-flush.  Once you promoted a slab in old gen, it stays there, instead of being GC'd and replaced with a slab in new gen again.

(This would also mitigate the main downside of allocateDirect.)

We'd still probably want some kind of delayed release of slabs so write load spikes don't permanently chew up your entire heap.


> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, merged-2252.tgz
>
>   Original Estimate: 0.4h
>  Remaining Estimate: 0.4h
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2252:
--------------------------------------

    Attachment:     (was: 0002-add-off-heap-MemtableAllocator-support.txt)

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2252) arena allocation for memtables

Posted by "Yang Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092574#comment-13092574 ] 

Yang Yang commented on CASSANDRA-2252:
--------------------------------------

hi Jonathan:

I checked the counters code, they currently use HeapAllocator .  what is the reason we don't yet use SlabAllocator for Counters?

Thanks


also I put the idea to use 2 SlabAllocators (one for those buffers with long life, one for those short-lived) in https://github.com/yangyangyyy/cassandra/commit/bc017835c64240e58c0c51b2d5f8793f3c7f3a76
maybe it is useful



> arena allocation for memtables
> ------------------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2252) arena allocation for memtables

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086639#comment-13086639 ] 

Hudson commented on CASSANDRA-2252:
-----------------------------------

Integrated in Cassandra #1033 (See [https://builds.apache.org/job/Cassandra/1033/])
    Arena allocation for memtables
patch by jbellis and stuhood for CASSANDRA-2252

jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1158927
Files : 
* /cassandra/trunk/test/unit/org/apache/cassandra/db/ReadMessageTest.java
* /cassandra/trunk/test/unit/org/apache/cassandra/db/TimeSortTest.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/SuperColumn.java
* /cassandra/trunk/src/java/org/apache/cassandra/io/sstable/SSTableSimpleUnsortedWriter.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/context/IContext.java
* /cassandra/trunk/src/java/org/apache/cassandra/utils/SlabAllocator.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/IColumnContainer.java
* /cassandra/trunk/lib/jamm-0.2.2.jar
* /cassandra/trunk/test/unit/org/apache/cassandra/db/CounterMutationTest.java
* /cassandra/trunk/lib/jamm-0.2.4.jar
* /cassandra/trunk/src/java/org/apache/cassandra/db/ThreadSafeSortedColumns.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/index/keys/KeysSearcher.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
* /cassandra/trunk/test/unit/org/apache/cassandra/db/ColumnFamilyTest.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamily.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/RowMutation.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ISortedColumns.java
* /cassandra/trunk/src/java/org/apache/cassandra/io/sstable/SSTableIdentityIterator.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/Column.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/IColumn.java
* /cassandra/trunk/test/unit/org/apache/cassandra/db/context/CounterContextTest.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/context/CounterContext.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ExpiringColumn.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/marshal/CounterColumnType.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/CounterUpdateColumn.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilySerializer.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnSerializer.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/CounterColumn.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/DeletedColumn.java
* /cassandra/trunk/build.xml
* /cassandra/trunk/src/java/org/apache/cassandra/db/ArrayBackedSortedColumns.java
* /cassandra/trunk/test/unit/org/apache/cassandra/db/ArrayBackedSortedColumnsTest.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java
* /cassandra/trunk/src/java/org/apache/cassandra/streaming/IncomingStreamReader.java
* /cassandra/trunk/src/java/org/apache/cassandra/utils/Allocator.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/CounterMutation.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/Memtable.java
* /cassandra/trunk/conf/cassandra-env.sh
* /cassandra/trunk/src/java/org/apache/cassandra/io/util/ColumnSortedMap.java
* /cassandra/trunk/src/java/org/apache/cassandra/service/StorageProxy.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/compaction/PrecompactedRow.java
* /cassandra/trunk/src/java/org/apache/cassandra/utils/HeapAllocator.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/Row.java
* /cassandra/trunk/src/java/org/apache/cassandra/io/IColumnSerializer.java
* /cassandra/trunk/src/java/org/apache/cassandra/utils/ByteBufferUtil.java
* /cassandra/trunk/test/unit/org/apache/cassandra/db/CounterColumnTest.java


> arena allocation for memtables
> ------------------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2252:
--------------------------------------

    Attachment:     (was: 0002-add-off-heap-MemtableAllocator-support.txt)

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (CASSANDRA-2252) off-heap memtables

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2252:
--------------------------------

    Attachment: 2252-alternate-v1.tgz

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-alternate-v1.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (CASSANDRA-2252) arena allocation for memtables

Posted by "Yang Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091225#comment-13091225 ] 

Yang Yang edited comment on CASSANDRA-2252 at 8/25/11 7:09 PM:
---------------------------------------------------------------

my line of thought comes from update-heavy workload (counters etc). conceivably (putting the row key issue aside for a while)  one region would contain bytebuffer values of similar age. as more updates come in, all the columns in older regions are likely to have all died out, thus allowing us to free the entire region before flushing happens.  


coming back to the row key issue, in the original slab allocator paper ( Jeff Bonwick ) , a slab contains strictly the same objects, which imply that they die at roughly the same time.   if they don't, then yes, in our case, slab has the disadvantage that an entire slab (2MB worth of mem) is held simply because a row key in it is not dead yet.  so to overcome this disadvantage, we probably need to further distinguish between object types to be allocated in the slab: this JIRA (same as HBase code) distinguishes between all the allocations between different memtables, to work better with update-heavy traffic, we need to *distinguish between row keys and column values (they have different life times)*




      was (Author: yangyangyyy):
    my line of thought comes from update-heavy workload (counters etc). conceivably (putting the row key issue aside for a while)  one region would contain bytebuffer values of similar age. as more updates come in, all the columns in older regions are likely to have all died out, thus allowing us to free the entire region before flushing happens.  


coming back to the row key issue, in the original slab allocator paper ( Jeff Bonwick ) , a slab contains strictly the same objects, which imply that they die at roughly the same time.   if they don't, then yes, in our case, slab has the disadvantage that an entire slab (2MB worth of mem) is held simply because a row key in it is not dead yet.  so to overcome this disadvantage, we probably need to further distinguish between object types to be allocated in the slab: this JIRA (same as HBase code) distinguishes between all the allocations between different memtables, to work better with update-heavy traffic, we need to distinguish between row keys and column values (they have different life times)



  
> arena allocation for memtables
> ------------------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2252:
--------------------------------------

    Attachment: 2252-v4.txt

rebased.  tests pass, but could use a glance at the counters to make sure it's not missing an opportunity to use slab.

otherwise, I think this is ready to go based on the above discussion.

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz
>
>   Original Estimate: 0.4h
>  Remaining Estimate: 0.4h
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (CASSANDRA-2252) off-heap memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999868#comment-12999868 ] 

Jonathan Ellis commented on CASSANDRA-2252:
-------------------------------------------

new patchset handles workloads w/ no overwrites as well.

Stu points out that counter's reconcile() operation does allocation too.  this does not handle that yet.

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2252) off-heap memtables

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2252:
--------------------------------

    Attachment: merged-2252.tgz

Rebased the merged version for the counter UUID changes.

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-alternate-v2.tgz, merged-2252.tgz
>
>   Original Estimate: 0.4h
>  Remaining Estimate: 0.4h
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (CASSANDRA-2252) off-heap memtables

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2252:
--------------------------------

    Attachment: merged-2252.tgz

Attaching a merged version of the two patches, as described earlier.

I also went ahead and slabbed the IndexSummaries, and added naive slabbing to the keycache: we'll need a longer term solution for slabbing caches, because of the overhead of internal fragmentation.

The alternate patch has had a lot of burn in time on our clusters: combined with slabbing the IndexSummaries and disabling the keycache, we saw our first oldgen compaction after 76 hours, with an average of over a week before compaction.

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-alternate-v2.tgz, merged-2252.tgz
>
>   Original Estimate: 0.4h
>  Remaining Estimate: 0.4h
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (CASSANDRA-2252) off-heap memtables

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2252:
--------------------------------

    Attachment: 2252-alternate-v2.tgz

Uploading a v2 of the alternate patch... at some point late in development I switched all clone() calls to trim() which was actually avoiding copying the majority of the time.

> off-heap memtables
> ------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-alternate-v2.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2252) arena allocation for memtables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092602#comment-13092602 ] 

Jonathan Ellis commented on CASSANDRA-2252:
-------------------------------------------

Happy to look at a patch to fix that.  Please open a new ticket.

> arena allocation for memtables
> ------------------------------
>
>                 Key: CASSANDRA-2252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz
>
>
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira