Posted to commits@cassandra.apache.org by Apache Wiki <wi...@apache.org> on 2011/03/09 18:35:46 UTC

[Cassandra Wiki] Update of "FAQ" by JonathanEllis

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "FAQ" page has been changed by JonathanEllis.
The comment on this change is: add mmap.
http://wiki.apache.org/cassandra/FAQ?action=diff&rev1=104&rev2=105

--------------------------------------------------

   * [[#bulkloading|How do I bulk load data into Cassandra?]]
   * [[#range_rp|Why aren't range slices/sequential scans giving me the expected results?]]
   * [[#unsubscribe|How do I unsubscribe from the email list?]]
-  * [[#cleaning_compacted_tables|I compacted, so why do I still have all my SSTables?]]
+  * [[#cleaning_compacted_tables|I compacted, so why did space used not decrease?]]
+  * [[#mmap|Why does top report that Cassandra is using a lot more memory than the Java heap max?]]
  
  <<Anchor(cant_listen_on_ip_any)>>
  
@@ -369, +370 @@

  See BulkLoading
  
  <<Anchor(range_rp)>>
+ 
  == Why aren't range slices/sequential scans giving me the expected results? ==
- 
 You're probably using the RandomPartitioner.  This is the default because it avoids hotspots, but it means your rows are ordered by the MD5 hash of the row key rather than lexicographically by the raw key bytes.
  
+ You '''can''' start out with a start key and end key of [empty] and use the row count argument instead, if your goal is paging the rows.  To get the next page, start from the last key you got in the previous page (the start key is inclusive, so drop that duplicate row from each page after the first).
- You '''can''' start out with a start key and end key of [empty] and use the row count argument instead, if
- your goal is paging the rows.  To get the next page, start from the last key you got in the
- previous page.
  
  You can also use intra-row ordering of column names to get ordered results '''within''' a row; with appropriate row 'bucketing,' you often don't need the rows themselves to be ordered.
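The paging pattern described above can be sketched in Java. This is an illustration only, not Cassandra API code: the `RangePaging` class and its `page` helper are hypothetical names, and an in-memory `TreeMap` stands in for the cluster's sorted row keys, mimicking a `get_range_slices` call with a `KeyRange` of `{start_key, "" , count}`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class RangePaging {
    // Return up to 'count' keys starting at 'startKey' (inclusive, as in a
    // Cassandra range query); an empty startKey means "from the beginning".
    static List<String> page(NavigableMap<String, String> rows, String startKey, int count) {
        List<String> result = new ArrayList<>();
        for (String key : rows.tailMap(startKey, true).keySet()) {
            if (result.size() == count) break;
            result.add(key);
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> rows = new TreeMap<>();
        for (char c = 'a'; c <= 'j'; c++) rows.put(String.valueOf(c), "value");

        String start = "";      // first page: empty start key
        int pageSize = 4;
        List<String> all = new ArrayList<>();
        while (true) {
            List<String> batch = page(rows, start, pageSize);
            if (!all.isEmpty() && !batch.isEmpty()) {
                batch.remove(0); // start key is inclusive: drop the duplicate
            }
            if (batch.isEmpty()) break;
            all.addAll(batch);
            start = batch.get(batch.size() - 1); // next page starts at last key seen
        }
        System.out.println(all); // each of the ten keys appears exactly once
    }
}
```

The loop terminates when a page comes back with nothing new, and each iteration reuses the last key seen as the next start key, exactly as the paragraph above describes.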
  
@@ -386, +385 @@

  
  <<Anchor(cleaning_compacted_tables)>>
  
- == I compacted, so why do I still have all my SSTables? ==
+ == I compacted, so why did space used not decrease? ==
 SSTables that are obsoleted by a compaction are deleted asynchronously when the JVM performs a GC. You can force a GC from jconsole if necessary, but Cassandra will force one itself if it detects that it is low on space. A compaction marker is also added to obsolete SSTables so they can be deleted on startup if the server does not perform a GC before being restarted. Read more on this subject [[http://wiki.apache.org/cassandra/MemtableSSTable|here]].
  
+ <<Anchor(mmap)>>
+ 
+ == Why does top report that Cassandra is using a lot more memory than the Java heap max? ==
+ Cassandra uses mmap to do zero-copy reads. That is, we use the operating system's virtual memory system to map the sstable data files into the Cassandra process's address space. This will "use" virtual memory, i.e. address space, and tools like top will report it accordingly, but on 64-bit systems virtual address space is effectively unlimited, so you should not worry about that.
+ 
+ What matters, in the sense "memory use" is normally meant, is the amount of memory allocated via brk() or by mmap'ing /dev/zero, which represents real memory used.  The key issue is that for an mmap'd file, there is never a need to keep the data resident in physical memory. Thus, whatever you do keep resident in physical memory is essentially just a cache, in the same way that normal I/O causes the kernel page cache to retain data that you read or write.
+ 
+ The difference between normal I/O and mmap() is that in the mmap() case the memory is actually mapped into the process, thus affecting the virtual size as reported by top. The main argument for using mmap() instead of standard I/O is that reading entails just touching memory: if the page is resident, you simply read it without even taking a page fault (so there is no overhead of entering the kernel and doing a semi-context switch). This is covered in more detail [[http://www.varnish-cache.org/trac/wiki/ArchitectNotes|here]].
+
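In Java, a memory-mapped read of this kind goes through FileChannel.map, which is the mechanism Cassandra's sstable readers build on. Below is a minimal, self-contained sketch (the MmapDemo class name and the temp-file setup are illustrative, not from Cassandra):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapDemo {
    // Map a file into the process's address space and read it without the
    // data being copied through a user-space read() buffer.
    static String readMapped(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] bytes = new byte[buf.remaining()];
            buf.get(bytes); // plain memory accesses backed by the kernel page cache
            return new String(bytes);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sstable-demo", ".db");
        Files.writeString(tmp, "hello, mmap");
        System.out.println(readMapped(tmp));
        Files.delete(tmp);
        // The mapping counts toward the JVM's virtual size (VIRT in top),
        // but the resident pages are just page cache and can be reclaimed
        // by the kernel at any time.
    }
}
```

Running this and watching the process in top shows the virtual size grow by the mapped length while resident memory only grows by whatever pages are actually touched.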