Posted to user@cassandra.apache.org by Fasika Daksa <ca...@gmail.com> on 2014/04/07 14:52:32 UTC

Inserting with a large number of columns

We are running different workload tests on Cassandra and Redis for
benchmarking. We wrote a Java client to read, write, and measure the
elapsed time of different test cases. Cassandra was doing great until
we introduced 20,000 columns: the insertion had been running for a day
when I stopped it.

First I create the table, index all the columns, then insert the data.
I looked into the process, and the part that is taking too long is the
indexing. We need to index all the columns because we use all or part
of them, depending on the query generator.
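
For context, a minimal sketch of that setup with the DataStax Java
driver. The keyspace, table, and column names below are placeholders,
and only three of the 20,000 boolean columns are shown:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class CreateAndIndex {
        public static void main(String[] args) {
            // Placeholder contact point and names; the real table has
            // 20,000 boolean columns rather than the three shown here.
            try (Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                session.execute("CREATE KEYSPACE IF NOT EXISTS bench "
                        + "WITH replication = {'class': 'SimpleStrategy', "
                        + "'replication_factor': '1'}");
                session.execute("CREATE TABLE IF NOT EXISTS bench.wide "
                        + "(id int PRIMARY KEY, c0 boolean, c1 boolean, "
                        + "c2 boolean)");
                // One secondary index per column, created before the load.
                for (int i = 0; i < 3; i++) {
                    session.execute("CREATE INDEX ON bench.wide (c" + i + ")");
                }
            }
        }
    }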


Can you see a potential solution for my case? Is there any way to
optimize the indexing, or the insertion in general? I also tried
indexing after the insertion, but it made no difference.


We are running this experiment on a single machine with 196 GB of RAM,
1.6 TB of disk space, and an 8-core CPU.

cqlsh 4.1.0 | Cassandra 2.0.3

Re: Inserting with a large number of columns

Posted by Fasika Daksa <ca...@gmail.com>.
Thanks for your response. Currently we are inserting the data line by
line; soon we will implement bulk insertion. The metadata used to
generate the data is: number of boolean columns: 20,000; number of int
columns: 0; number of rows: 100,000 (we use only boolean or integer
variables). Attached you can find the client code I am using to run
this benchmark test.

I also tried creating the indexes after the import, but it made no
difference.
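
For the bulk path, roughly what we have in mind is a prepared statement
with throttled asynchronous writes. The sketch below reuses the
placeholder bench.wide schema from my first message, again with only
three of the 20,000 boolean columns:

    import java.util.ArrayList;
    import java.util.List;

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Session;

    public class BulkInsert {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("bench")) {
                // Prepare once, bind per row: the statement is parsed once.
                PreparedStatement insert = session.prepare(
                        "INSERT INTO wide (id, c0, c1, c2) VALUES (?, ?, ?, ?)");
                List<ResultSetFuture> inFlight =
                        new ArrayList<ResultSetFuture>();
                for (int id = 0; id < 100000; id++) {
                    inFlight.add(session.executeAsync(insert.bind(
                            id, id % 2 == 0, id % 3 == 0, id % 5 == 0)));
                    // Throttle: wait for each window of async writes before
                    // issuing more, so the node is not flooded.
                    if (inFlight.size() == 1000) {
                        for (ResultSetFuture f : inFlight) {
                            f.getUninterruptibly();
                        }
                        inFlight.clear();
                    }
                }
                for (ResultSetFuture f : inFlight) {
                    f.getUninterruptibly();
                }
            }
        }
    }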

On Mon, Apr 7, 2014 at 3:05 PM, Tupshin Harper <tu...@tupshin.com> wrote:

> More details would be helpful (exact schema, method of inserting
> data, etc.), but you can try dropping the indices and recreating them
> after the import is finished.
>
> -Tupshin

Re: Inserting with a large number of columns

Posted by Tupshin Harper <tu...@tupshin.com>.
More details would be helpful (exact schema, method of inserting data,
etc.), but you can try dropping the indices and recreating them after
the import is finished.
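
Roughly something like the sketch below; the index names here are an
assumption (Cassandra auto-names unnamed indexes <table>_<column>_idx,
so check the DESCRIBE output in cqlsh for the real ones), and the
bench.wide names are the placeholders from earlier in the thread:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class ReindexAfterImport {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("bench")) {
                // Drop the per-column indexes so the import writes plain
                // rows with no index maintenance on every insert.
                for (int i = 0; i < 3; i++) {
                    session.execute("DROP INDEX IF EXISTS wide_c" + i + "_idx");
                }

                // ... run the import here ...

                // Recreate the indexes afterwards; each one is then built
                // in a single pass over the data instead of row by row
                // during the load.
                for (int i = 0; i < 3; i++) {
                    session.execute("CREATE INDEX ON wide (c" + i + ")");
                }
            }
        }
    }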

-Tupshin