Posted to commits@cassandra.apache.org by "T Jake Luciani (JIRA)" <ji...@apache.org> on 2011/01/24 19:06:46 UTC

[jira] Commented: (CASSANDRA-1278) Make bulk loading into Cassandra less crappy, more pluggable

    [ https://issues.apache.org/jira/browse/CASSANDRA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985854#action_12985854 ] 

T Jake Luciani commented on CASSANDRA-1278:
-------------------------------------------

I haven't had much time to dig into this yet, but here is what I have observed so far:

1. It took 12+ minutes to bulk load 1.8G of data locally. I have no baseline for whether that is fast or slow, but it felt slow. What should I expect?
2. Compaction ran throughout the bulk load.
3. listcptdata needs a usage message.
4. It needs a README file explaining how this works, since it's not obvious what's going on here.


> Make bulk loading into Cassandra less crappy, more pluggable
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-1278
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1278
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jeremy Hanna
>            Assignee: Matthew F. Dennis
>             Fix For: 0.7.1
>
>         Attachments: 1278-cassandra-0.7.txt
>
>   Original Estimate: 40h
>          Time Spent: 40.67h
>  Remaining Estimate: 0h
>
> Currently, bulk loading into Cassandra is a black art. People are either directed to just do it responsibly with Thrift or a higher-level client, or they have to explore the contrib/bmt example (http://wiki.apache.org/cassandra/BinaryMemtable). That contrib module requires delving into the code to find out how it works and then applying it to the given problem. With either method, the user also needs to keep in mind that overloading the cluster is possible, which will hopefully be addressed in CASSANDRA-685.
> This improvement would be to create a contrib module or set of documents dealing with bulk loading. Perhaps it could include code in the core to make it more pluggable for external clients of different types.
> Bulk loading their data is something that many people who are new to Cassandra need to do.
