Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2014/09/02 10:36:20 UTC

[jira] [Commented] (CASSANDRA-7860) csv2sstable - bulk load CSV data to SSTables similar to json2sstable

    [ https://issues.apache.org/jira/browse/CASSANDRA-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118032#comment-14118032 ] 

Sylvain Lebresne commented on CASSANDRA-7860:
---------------------------------------------

We agree that cqlsh COPY has been too slow; it was recently improved by CASSANDRA-7405. There may be further improvements to be made, and we welcome contributions in that regard.

If you really prefer writing sstables directly, there is CQLSSTableWriter, which allows you to easily write your own whatever2sstable tool that fits your requirements. In fact, json2sstable itself was never meant for bulk loading in the first place (CQLSSTableWriter is), and it is somewhat deprecated now (it's not part of the binary distribution in 2.1).
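To illustrate, a csv2sstable-style tool built on CQLSSTableWriter could look roughly like the sketch below. This is only a sketch, not a supported tool: it assumes cassandra-all (2.0+) is on the classpath, and the keyspace, table, schema, and file paths are made up for the example. The generated sstables would then be streamed into the cluster with sstableloader.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

import org.apache.cassandra.io.sstable.CQLSSTableWriter;

public class Csv2SSTable
{
    public static void main(String[] args) throws IOException
    {
        // Hypothetical schema and input file for this example.
        String schema = "CREATE TABLE ks.users (id int PRIMARY KEY, name text)";
        String insert = "INSERT INTO ks.users (id, name) VALUES (?, ?)";
        File csv = new File("users.csv");

        // The output directory must follow the keyspace/table layout
        // expected by sstableloader.
        File outputDir = new File("output/ks/users");
        outputDir.mkdirs();

        CQLSSTableWriter writer = CQLSSTableWriter.builder()
                                                  .inDirectory(outputDir)
                                                  .forTable(schema)
                                                  .using(insert)
                                                  .build();

        // Naive CSV parsing for illustration only: no quoting, no escaping.
        BufferedReader reader = new BufferedReader(new FileReader(csv));
        try
        {
            String line;
            while ((line = reader.readLine()) != null)
            {
                String[] cols = line.split(",");
                // addRow binds values in the order of the INSERT statement.
                writer.addRow(Integer.parseInt(cols[0]), cols[1]);
            }
        }
        finally
        {
            reader.close();
            writer.close();
        }
    }
}
```

Once the sstables are written, something like `sstableloader -d <host> output/ks/users` would stream them to the cluster.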

> csv2sstable - bulk load CSV data to SSTables similar to json2sstable
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-7860
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7860
>             Project: Cassandra
>          Issue Type: New Feature
>         Environment: DataStax Community Edition 2.0.9
>            Reporter: Hari Sekhon
>            Priority: Minor
>
> Need a csv2sstable utility to bulk load billions of rows of CSV data - it is impractical to have to pre-convert to JSON before bulk loading to sstables.
> CQL COPY really is too slow - a test of a mere 4-million-row, 6GB CSV took 28 minutes to load directly... while it takes only 60 secs to cat all that data off the HDFS source filesystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)