Posted to commits@cassandra.apache.org by "Dmitry (JIRA)" <ji...@apache.org> on 2015/08/14 17:25:46 UTC

[jira] [Commented] (CASSANDRA-9552) COPY FROM times out after 110000 inserts

    [ https://issues.apache.org/jira/browse/CASSANDRA-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697221#comment-14697221 ] 

Dmitry commented on CASSANDRA-9552:
-----------------------------------

I'm getting nearly the same error, but in my case the trigger is a large number of key:value entries in a map collection. When I try to insert 1,000,000 rows with more than 20 key:value pairs in each map, the import fails with:
{code}
COPY data("Col1", "Col2", "Col3", "Col4", "Col5") FROM '/tmp/dumpfile1kk20.tsv' WITH DELIMITER='\t';
Processed 170000 rows; Write: 4008.84 rows/s
Connection heartbeat failure
Aborting import at record #177876. Previously inserted records are still present, and some records after that may be present as well.
{code}
If I insert roughly 80 key:value pairs per map, it fails even earlier, at around row 90,000.
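
As a workaround, loading the same file through the DataStax Python driver with an explicit request timeout gets past the point where cqlsh's COPY gives up. Below is a minimal sketch, not a drop-in tool: it assumes the cassandra-driver package, a node on 127.0.0.1, a hypothetical keyspace name, text columns plus one map<text, text> (which column is the map is illustrative), and a CQL-style map literal in the fifth TSV field.
{code}
# Minimal workaround sketch: insert the TSV rows with the DataStax Python
# driver instead of cqlsh COPY, so the request timeout is under our control.
import ast
import csv

from cassandra.cluster import Cluster
from cassandra.concurrent import execute_concurrent_with_args

cluster = Cluster(['127.0.0.1'])          # contact point is an assumption
session = cluster.connect('mykeyspace')   # keyspace name is hypothetical
session.default_timeout = 120             # seconds per request

# Table and column names come from the COPY statement above; the column
# types (text plus one map<text, text> in "Col5") are assumptions.
insert = session.prepare(
    'INSERT INTO data ("Col1", "Col2", "Col3", "Col4", "Col5") '
    'VALUES (?, ?, ?, ?, ?)'
)

def rows(path):
    with open(path, newline='') as f:
        for c1, c2, c3, c4, c5 in csv.reader(f, delimiter='\t'):
            # Assumes the map cell looks like {'k1': 'v1', 'k2': 'v2'};
            # adjust the parsing to whatever the dump really contains.
            yield (c1, c2, c3, c4, ast.literal_eval(c5))

# Bounded concurrency, and collect failures instead of aborting the run.
results = execute_concurrent_with_args(
    session, insert, rows('/tmp/dumpfile1kk20.tsv'),
    concurrency=50, raise_on_first_error=False)
for success, result in results:
    if not success:
        print('insert failed:', result)
cluster.shutdown()
{code}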

> COPY FROM times out after 110000 inserts
> ----------------------------------------
>
>                 Key: CASSANDRA-9552
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9552
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Brian Hess
>              Labels: cqlsh
>             Fix For: 2.1.x
>
>
> I am trying to test out performance of COPY FROM on various schemas.  I have a 100-BIGINT-column table defined as:
> {code}
> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'}  AND durable_writes = true;
> CREATE TABLE test.test100 (
>     pkey bigint,    ccol bigint,    col0 bigint,    col1 bigint,    col10 bigint,
>     col11 bigint,    col12 bigint,    col13 bigint,    col14 bigint,    col15 bigint,
>     col16 bigint,    col17 bigint,    col18 bigint,    col19 bigint,    col2 bigint,
>     col20 bigint,    col21 bigint,    col22 bigint,    col23 bigint,    col24 bigint,
>     col25 bigint,    col26 bigint,    col27 bigint,    col28 bigint,    col29 bigint,
>     col3 bigint,    col30 bigint,    col31 bigint,    col32 bigint,    col33 bigint,
>     col34 bigint,    col35 bigint,    col36 bigint,    col37 bigint,    col38 bigint,
>     col39 bigint,    col4 bigint,    col40 bigint,    col41 bigint,    col42 bigint,
>     col43 bigint,    col44 bigint,    col45 bigint,    col46 bigint,    col47 bigint,
>     col48 bigint,    col49 bigint,    col5 bigint,    col50 bigint,    col51 bigint,
>     col52 bigint,    col53 bigint,    col54 bigint,    col55 bigint,    col56 bigint,
>     col57 bigint,    col58 bigint,    col59 bigint,    col6 bigint,    col60 bigint,
>     col61 bigint,    col62 bigint,    col63 bigint,    col64 bigint,    col65 bigint,
>     col66 bigint,    col67 bigint,    col68 bigint,    col69 bigint,    col7 bigint,
>     col70 bigint,    col71 bigint,    col72 bigint,    col73 bigint,    col74 bigint,
>     col75 bigint,    col76 bigint,    col77 bigint,    col78 bigint,    col79 bigint,
>     col8 bigint,    col80 bigint,    col81 bigint,    col82 bigint,    col83 bigint,
>     col84 bigint,    col85 bigint,    col86 bigint,    col87 bigint,    col88 bigint,
>     col89 bigint,    col9 bigint,    col90 bigint,    col91 bigint,    col92 bigint,
>     col93 bigint,    col94 bigint,    col95 bigint,    col96 bigint,    col97 bigint,
>     PRIMARY KEY (pkey, ccol)
> ) WITH CLUSTERING ORDER BY (ccol ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';
> {code}
> I then try to load the linked file of 120,000 rows of 100 BIGINT columns via:
> {code}
> cqlsh -e "COPY test.test100(pkey,ccol,col0,col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11,col12,col13,col14,col15,col16,col17,col18,col19,col20,col21,col22,col23,col24,col25,col26,col27,col28,col29,col30,col31,col32,col33,col34,col35,col36,col37,col38,col39,col40,col41,col42,col43,col44,col45,col46,col47,col48,col49,col50,col51,col52,col53,col54,col55,col56,col57,col58,col59,col60,col61,col62,col63,col64,col65,col66,col67,col68,col69,col70,col71,col72,col73,col74,col75,col76,col77,col78,col79,col80,col81,col82,col83,col84,col85,col86,col87,col88,col89,col90,col91,col92,col93,col94,col95,col96,col97) FROM 'data120K.csv'"
> {code}
> Data file here: https://drive.google.com/file/d/0B87-Pevy14fuUVcxemFRcFFtRjQ/view?usp=sharing
> After 110000 rows, it errors and hangs:
> {code}
> <stdin>:1:110000 rows; Write: 19848.21 rows/s
> Connection heartbeat failure
> <stdin>:1:Aborting import at record #1196. Previously inserted records are still present, and some records after that may be present as well.
> {code}
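
For anyone reproducing this without the linked file, an equivalent 120,000-row CSV can be generated locally. A minimal sketch: the column order matches the COPY statement above, while the values and the 1000-rows-per-partition split are arbitrary assumptions.
{code}
# Sketch: generate a CSV shaped like data120K.csv (120,000 rows of
# pkey, ccol, col0..col97, all bigint-compatible). The real file's values
# and partition layout are unknown; everything below is arbitrary.
import csv
import random

NUM_ROWS = 120_000

with open('data120K.csv', 'w', newline='') as f:
    w = csv.writer(f)
    for i in range(NUM_ROWS):
        row = [i // 1000, i % 1000]                             # pkey, ccol
        row += [random.randrange(2 ** 40) for _ in range(98)]   # col0..col97
        w.writerow(row)
{code}
Splitting such a file into slices (say 20,000 rows each) and running COPY once per slice can also help pin down where the heartbeat failure first shows up.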


