Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2011/07/18 14:26:57 UTC

[jira] [Updated] (CASSANDRA-2911) Simplified classes to write SSTables (for bulk loading usage)

     [ https://issues.apache.org/jira/browse/CASSANDRA-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2911:
----------------------------------------

    Attachment: 0001-Add-new-simple-sstable-writer.patch

The attached patch actually adds two new classes (three counting the abstract one): SSTableSimpleWriter and SSTableSimpleUnsortedWriter.

Both are facades over an SSTableWriter, so writing an sstable looks roughly like this:
{noformat}
  SSTableSimpleUnsortedWriter writer = ...;
  long time = System.currentTimeMillis();
  writer.newRow("row1");
  writer.addColumn("c1", "v1", time);
  writer.addColumn("c2", "v2", time);
  writer.newRow("row2");
  writer.addColumn("c3", "v3", time);
  ...
  writer.close();
{noformat}
(The "roughly" is because all methods expect ByteBuffer arguments, so in real code you'll need to convert all those strings.)
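
Converting those strings needs nothing beyond the JDK. The following is a minimal sketch, assuming names, keys and values are UTF-8 encoded text. The helper names `bytes` and `string` are hypothetical (Cassandra's own utility classes may already offer an equivalent), and the writer calls appear only in a comment, not as a verified API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class ByteBufferConversion {
    // Hypothetical helper: encodes a String as UTF-8 and wraps it in a ByteBuffer,
    // the type the writer methods expect for row keys, column names and values.
    static ByteBuffer bytes(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical inverse helper, useful for checking what was encoded.
    // Reads from a duplicate so the caller's buffer position is not consumed.
    static String string(ByteBuffer buffer) {
        ByteBuffer copy = buffer.duplicate();
        byte[] raw = new byte[copy.remaining()];
        copy.get(raw);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer key = bytes("row1");
        ByteBuffer name = bytes("c1");
        ByteBuffer value = bytes("v1");
        // With a real writer these would be passed along the lines of:
        //   writer.newRow(key); writer.addColumn(name, value, time);
        System.out.println(string(key) + " " + string(name) + " " + string(value));
    }
}
```

Using `duplicate()` in the decoder matters because ByteBuffer reads advance the position; a buffer that has been drained by a debug print would otherwise look empty when it is later handed to the writer.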

These classes also make it easy to add expiring and counter columns. There is no support for adding tombstones, but there should be little reason to insert tombstones anyway, since these classes are meant for writing sstables from external data.
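
For an expiring column, the only extra input compared to a regular column is a time-to-live. As a minimal sketch, assuming Cassandra's convention of a TTL expressed in seconds and a local expiration time in seconds since the epoch, the expiration is simply the write time plus the TTL. The class and method names below are hypothetical illustrations of that arithmetic, not the writer's actual signatures.

```java
import java.util.concurrent.TimeUnit;

class ExpiringColumnSketch {
    // Hypothetical helper: local expiration time in seconds since the epoch,
    // from the current wall-clock time (milliseconds) and a TTL in seconds.
    static int localExpirationTime(long nowMillis, int ttlSeconds) {
        if (ttlSeconds <= 0)
            throw new IllegalArgumentException("TTL must be positive: " + ttlSeconds);
        return (int) TimeUnit.MILLISECONDS.toSeconds(nowMillis) + ttlSeconds;
    }

    // Hypothetical helper: the same expiration instant expressed in milliseconds.
    static long expirationMillis(long nowMillis, int ttlSeconds) {
        return nowMillis + TimeUnit.SECONDS.toMillis(ttlSeconds);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        int ttl = 86400; // one day, in seconds
        System.out.println("expires at (s):  " + localExpirationTime(now, ttl));
        System.out.println("expires at (ms): " + expirationMillis(now, ttl));
    }
}
```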

SSTableSimpleWriter expects rows to be added in sorted order, which is probably only useful for data that was exported from Cassandra in the first place. SSTableSimpleUnsortedWriter, in contrast, does not require any prior sorting. It buffers a number of rows in memory and flushes them together, creating one sstable per flush (in a way, it is a micro storage engine with a memtable and a flush).
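
The difference between the two writers can be modelled in memory. The sketch below is a hypothetical illustration, not the patch's code. It makes several simplifying assumptions: plain String keys compared in natural order stand in for the partitioner's ordering, a row-count limit stands in for a memory threshold, and each flushed "sstable" is represented only by its ordered list of row keys. The sorted writer rejects out-of-order keys because it appends straight to a single file. The unsorted writer keeps rows in a TreeMap (its "memtable") and emits one sorted run per flush.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class UnsortedWriterSketch {
    private final int maxBufferedRows;
    // Buffered rows: row key -> (column name -> value), kept sorted like a memtable.
    private final TreeMap<String, TreeMap<String, String>> buffer = new TreeMap<>();
    // Each flushed "sstable" is modelled as the ordered list of its row keys.
    private final List<List<String>> flushed = new ArrayList<>();
    private TreeMap<String, String> currentRow;

    UnsortedWriterSketch(int maxBufferedRows) {
        this.maxBufferedRows = maxBufferedRows;
    }

    void newRow(String key) {
        // Flush before starting a genuinely new row if the buffer is full.
        if (!buffer.containsKey(key) && buffer.size() >= maxBufferedRows)
            flush();
        // Re-opening a key that is still buffered merges into the same row.
        currentRow = buffer.computeIfAbsent(key, k -> new TreeMap<>());
    }

    void addColumn(String name, String value) {
        if (currentRow == null)
            throw new IllegalStateException("newRow() must be called first");
        currentRow.put(name, value);
    }

    private void flush() {
        if (buffer.isEmpty())
            return;
        // Iterating a TreeMap yields rows in sorted order, which is what
        // writing an sstable requires: this is the "memtable flush".
        flushed.add(new ArrayList<>(buffer.keySet()));
        buffer.clear();
        currentRow = null;
    }

    List<List<String>> close() {
        flush();
        return flushed;
    }

    public static void main(String[] args) {
        UnsortedWriterSketch writer = new UnsortedWriterSketch(2);
        // Rows arrive in arbitrary order.
        for (String key : new String[] {"row3", "row1", "row4", "row2", "row5"}) {
            writer.newRow(key);
            writer.addColumn("c1", "v-" + key);
        }
        // One sorted run per flush: [[row1, row3], [row2, row4], [row5]]
        System.out.println(writer.close());
    }
}

class SortedWriterSketch {
    private String lastKey;
    final List<String> written = new ArrayList<>();

    // Hypothetical check: rows must arrive in strictly increasing key order,
    // because they are appended directly to a single sstable.
    void newRow(String key) {
        if (lastKey != null && key.compareTo(lastKey) <= 0)
            throw new IllegalArgumentException("Row " + key + " is not after " + lastKey);
        lastKey = key;
        written.add(key);
    }
}
```

Note that in this model the same key arriving in two different buffers ends up in two separate runs; the sketch does not merge across flushes.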


> Simplified classes to write SSTables (for bulk loading usage)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-2911
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2911
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>              Labels: bulkloader
>         Attachments: 0001-Add-new-simple-sstable-writer.patch
>
>
> sstableloader only streams existing sstables. If you need to load data that exists in another form (json, csv, whatnot), you first need to write the sstable(s) to load. The recommended way to do this is either to use json2sstable or to modify it if your input is not json. Modifying json2sstable, however, is more involved than it needs to be: you'll need at least some basic understanding of a bunch of internal classes (DecoratedKey, ColumnFamily, SuperColumn, ...). Even for json input, you can use json2sstable only if your json actually conforms to what is expected, and even then, good luck to anyone who wants to add counters.
> This ticket proposes to add a simple interface to write sstables. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira