Posted to commits@cassandra.apache.org by "Stefania (JIRA)" <ji...@apache.org> on 2016/01/21 02:56:39 UTC
[jira] [Created] (CASSANDRA-11053) COPY FROM on large datasets: fix progress report and debug performance
Stefania created CASSANDRA-11053:
------------------------------------
Summary: COPY FROM on large datasets: fix progress report and debug performance
Key: CASSANDRA-11053
URL: https://issues.apache.org/jira/browse/CASSANDRA-11053
Project: Cassandra
Issue Type: Bug
Components: Tools
Reporter: Stefania
Assignee: Stefania
Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
Attachments: copy_from_large_benchmark.txt
Running COPY FROM on a large dataset (20G divided into 20M records) revealed two issues:
* The progress report is incorrect: it advances very slowly until almost the end of the test, at which point it catches up extremely quickly.
* The performance in rows per second is similar to that of smaller tests run locally against a smaller cluster (approx. 35,000 rows per second). For comparison, cassandra-stress manages 50,000 rows per second under the same set-up, which makes it roughly 1.4 times faster.
See attached file _copy_from_large_benchmark.txt_ for the benchmark details.
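As a rough sanity check on the figures above, the following Python sketch works out the throughput ratio and the implied load times for 20M rows. It assumes each quoted rate is sustained for the whole run and that both tools write the same number of rows. Neither assumption comes from the benchmark file, and the function names are illustrative only.

```python
# Figures quoted in this report; all are approximate.
TOTAL_ROWS = 20_000_000      # 20M records
COPY_FROM_RATE = 35_000      # rows per second observed with COPY FROM
STRESS_RATE = 50_000         # rows per second observed with cassandra-stress


def load_time_seconds(rows, rows_per_second):
    """Time to load `rows` at a constant rate (assumption: rate is sustained)."""
    return rows / rows_per_second


def speedup(faster_rate, slower_rate):
    """How many times faster one rate is than the other."""
    return faster_rate / slower_rate


if __name__ == "__main__":
    copy_secs = load_time_seconds(TOTAL_ROWS, COPY_FROM_RATE)
    stress_secs = load_time_seconds(TOTAL_ROWS, STRESS_RATE)
    print(f"COPY FROM:        {copy_secs / 60:.1f} min")
    print(f"cassandra-stress: {stress_secs / 60:.1f} min")
    print(f"ratio:            {speedup(STRESS_RATE, COPY_FROM_RATE):.2f}x")
```

Under these assumptions the ratio comes out at about 1.43, close to the rounded figure quoted above, and the gap amounts to a few minutes on a dataset of this size.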
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)