Posted to user@cassandra.apache.org by Suresh Babu Mallampati <sm...@gmail.com> on 2017/10/25 16:51:44 UTC

code snippet for cqlsh COPY from

Hi All,

Can someone provide me a code snippet for cqlsh COPY FROM with a CSV file?

I just want to know how the COPY mechanism works compared to normal
insert/commit, so that I can avoid exceeding the batch size limit.

Thanks,
Suresh.

Re: code snippet for cqlsh COPY from

Posted by Andy Tolbert <an...@datastax.com>.

Hi Suresh,

cqlsh COPY batches intelligently: it only groups inserts that target
the same partition into a batch.
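
To illustrate the idea, here is a rough Python sketch of grouping CSV rows by
partition key and capping each batch at a maximum size. This is not the actual
cqlsh code; the function name group_by_partition, the sample rows, and the
assumption that the partition key is a single column are all made up for
illustration.

```python
import csv
import io
from collections import defaultdict


def group_by_partition(rows, key_column, max_batch_size):
    """Yield (partition_key, batch) pairs.

    Every batch contains rows from a single partition only and holds
    at most max_batch_size rows.
    """
    by_key = defaultdict(list)
    for row in rows:
        by_key[row[key_column]].append(row)
    for key, group in by_key.items():
        # Split each partition's rows into chunks of max_batch_size.
        for start in range(0, len(group), max_batch_size):
            yield key, group[start:start + max_batch_size]


# Hypothetical sample data standing in for a CSV file on disk.
sample = io.StringIO("pkey,skey,text1\n1,a,x\n2,a,y\n1,b,z\n1,c,w\n")
rows = list(csv.DictReader(sample))

batches = list(group_by_partition(rows, "pkey", max_batch_size=2))
for key, batch in batches:
    print(key, [r["skey"] for r in batch])
```

With the sample above, partition 1 is split into two batches (two rows, then
one row) and partition 2 gets its own single-row batch, so no batch ever spans
more than one partition.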

As of version 3.6, C* will not emit the "batch size exceeded" errors if all
statements in a batch belong to the same partition (CASSANDRA-10876
<https://issues.apache.org/jira/browse/CASSANDRA-10876>).

The docs (https://cassandra.apache.org/doc/latest/tools/cqlsh.html#copy-from)
are a good reference for how to use COPY FROM.

https://www.datastax.com/dev/blog/new-features-in-cqlsh-copy is also a good
reference.

Here's an example from something I was working from locally:

cqlsh -e "COPY andy.table100b (pkey,skey,text1,text2,text3,text4,text5)
from 'csv/ordered/100b/*.csv' WITH header = true AND INGESTRATE=1000000 AND
NUMPROCESSES=32 AND MAXBATCHSIZE=100;" myhostname

Note you should probably still keep your batches relatively small even with
single-partition batches, depending on your dataset.  In my particular case
I was working with relatively small data (100-byte rows).  There are
diminishing returns in terms of throughput as you increase your batch
size, but that will vary based on your data and environment.

Thanks,
Andy
