Posted to user@cassandra.apache.org by Rana Aich <ai...@gmail.com> on 2010/07/26 21:14:26 UTC

Help! Cassandra Data Loader threads are getting stuck

Hi All,

I have to load a huge quantity of data into Cassandra (~10 billion rows).

I'm trying to load the data from files using multithreading.

The idea is that each thread will read the tab-delimited file and process a
chunk of records.

For example:
Thread 1 reads lines 1-1000 and inserts them into Cassandra.
Thread 2 reads lines 1001-2000 and inserts them into Cassandra.
Thread 3 reads lines 2001-3000 and inserts them into Cassandra.
...
Thread 10 reads lines 9001-10000 and inserts them into Cassandra.
Thread 1 reads lines 10001-11000 and inserts them into Cassandra.
Thread 2 reads lines 11001-12000 and inserts them into Cassandra.

and so on...
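
The assignment described above amounts to a round-robin over fixed-size chunks. A minimal Java sketch of that arithmetic follows, assuming 10 threads and 1000-line chunks as in the example; the class and method names are hypothetical and are not taken from the attached file.

```java
// Sketch of the round-robin chunk assignment from the example above.
// ChunkAssignment and threadForLine are hypothetical names; CHUNK_SIZE and
// THREAD_COUNT mirror the numbers used in the post.
class ChunkAssignment {
    static final int CHUNK_SIZE = 1000;
    static final int THREAD_COUNT = 10;

    /** Returns the 1-based thread number that handles the given 1-based line. */
    static int threadForLine(long lineNumber) {
        if (lineNumber < 1) {
            throw new IllegalArgumentException("line numbers start at 1");
        }
        long chunkIndex = (lineNumber - 1) / CHUNK_SIZE;  // 0-based chunk index
        return (int) (chunkIndex % THREAD_COUNT) + 1;     // wrap around the threads
    }

    public static void main(String[] args) {
        for (long line : new long[] {1, 1000, 1001, 9001, 10001, 11001}) {
            System.out.println("line " + line + " -> thread " + threadForLine(line));
        }
    }
}
```

Note that with this scheme every thread still has to skip past the lines it does not own, so the file is effectively read once per thread unless the reading is centralized.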

I'm testing with a small file of 200,000 records.

But somehow the process gets stuck and doesn't proceed any further after
processing about 16,000 records.

I've attached my working file.

Any help will be very much appreciated.

Regards

raich

Re: Help! Cassandra Data Loader threads are getting stuck

Posted by Rana Aich <ai...@gmail.com>.
Thanks for your offer. There was a problem reading the *.gz files from
System.in.
I've fixed my code.
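
The message does not say what the problem was or show the fix. As an illustration only, one common way to read gzip-compressed, line-oriented input from a stream such as System.in in Java is to wrap it in java.util.zip.GZIPInputStream before handing it to a reader. The class and helper names below are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not the poster's code: reads lines from a gzip stream.
class GzipLines {
    /** Decompresses the stream and returns its lines (UTF-8 assumed). */
    static List<String> readLines(InputStream compressed) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Demo helper: gzips a string in memory so the example needs no file. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // A loader fed with `java GzipLines < data.gz` would pass System.in here.
        InputStream in = new ByteArrayInputStream(gzip("k1\tv1\nk2\tv2\n"));
        readLines(in).forEach(System.out::println);
    }
}
```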


On Tue, Jul 27, 2010 at 12:09 AM, Thorvaldsson Justus <
justus.thorvaldsson@svenskaspel.se> wrote:

> I wrote a Java program that does just this.
>
> Basically, one thread reads from the file into an array, stopping when the
> array holds 20k entries; it waits until the size drops below 20k and then
> continues reading the data file. (This is the raw data I want to move.)
>
> I have n worker threads, each with its own batch and its own connection to
> Cassandra.
>
> They fill their batches by taking data out of the array (this is
> synchronized); when a batch reaches 1k entries, it is sent to Cassandra.
>
> I had some problems, but none involved Cassandra; it was my own code that
> faltered.
>
> I could provide code if you want.
>
> Justus

Re: Help! Cassandra Data Loader threads are getting stuck

Posted by Thorvaldsson Justus <ju...@svenskaspel.se>.
I wrote a Java program that does just this.

Basically, one thread reads from the file into an array, stopping when the array holds 20k entries; it waits until the size drops below 20k and then continues reading the data file. (This is the raw data I want to move.)

I have n worker threads, each with its own batch and its own connection to Cassandra.
They fill their batches by taking data out of the array (this is synchronized); when a batch reaches 1k entries, it is sent to Cassandra.
I had some problems, but none involved Cassandra; it was my own code that faltered.
I could provide code if you want.
Justus
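
The design described above is a bounded producer/consumer pipeline. The sketch below is not Justus's code, which was never posted to the thread. It swaps the hand-synchronized array for java.util.concurrent.ArrayBlockingQueue, and the Cassandra write is replaced by a caller-supplied sink, since no client code appears in the thread. BatchLoader, load, and the sentinel are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch: one reader feeds a bounded queue, n workers drain it
// in batches. The sink stands in for "send this batch to Cassandra".
class BatchLoader {
    // Sentinel compared by identity, so it can never collide with a data line.
    private static final String POISON = new String("<end-of-input>");

    static void load(Iterable<String> lines, int queueCapacity, int workerCount,
                     int batchSize, Consumer<List<String>> sink)
            throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueCapacity);
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            Thread worker = new Thread(() -> {
                List<String> batch = new ArrayList<>(batchSize);
                try {
                    while (true) {
                        String line = queue.take();        // blocks while empty
                        if (line == POISON) break;
                        batch.add(line);
                        if (batch.size() == batchSize) {
                            sink.accept(new ArrayList<>(batch));
                            batch.clear();
                        }
                    }
                    if (!batch.isEmpty()) sink.accept(batch); // final partial batch
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            workers.add(worker);
        }
        for (String line : lines) queue.put(line);          // blocks while full
        for (int i = 0; i < workerCount; i++) queue.put(POISON);
        for (Thread worker : workers) worker.join();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) lines.add("key" + i + "\tvalue" + i);
        AtomicInteger sent = new AtomicInteger();
        // 20k queue capacity and 1k batches mirror the numbers in the message.
        load(lines, 20_000, 4, 1_000, batch -> sent.addAndGet(batch.size()));
        System.out.println("rows sent: " + sent.get());
    }
}
```

One caveat with this sketch: if the sink throws, that worker exits, and once every worker has exited the reader blocks forever on a full queue. A real loader would need error handling around the sink call to avoid exactly the kind of stall this thread is about.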

From: Aaron Morton [mailto:aaron@thelastpickle.com]
Sent: 26 July 2010 23:32
To: user@cassandra.apache.org
Subject: Re: Help! Cassandra Data Loader threads are getting stuck

Try running it without threading to see if it's a Cassandra problem or an issue with your threading.

Perhaps split the file and run many single-threaded processes to load the data.

Aaron

Re: Help! Cassandra Data Loader threads are getting stuck

Posted by Malcolm Smith <ma...@treehousesystems.com>.
Also make sure you have consistency level set to at least ONE

On Jul 26, 2010, at 5:31 PM, Aaron Morton <aa...@thelastpickle.com> wrote:

> Try running it without threading to see if it's a Cassandra problem or an issue with your threading.
>
> Perhaps split the file and run many single-threaded processes to load the data.
> 
> Aaron

Re: Help! Cassandra Data Loader threads are getting stuck

Posted by Aaron Morton <aa...@thelastpickle.com>.
Try running it without threading to see if it's a Cassandra problem or an issue with your threading.

Perhaps split the file and run many single-threaded processes to load the data.

Aaron
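
To make the splitting suggestion concrete, here is a minimal Java sketch that computes which contiguous line range each single-threaded process would load. It assumes the total line count is known up front; FileSplitPlan and plan are hypothetical names, and the actual file splitting and loading are left out.

```java
// Hypothetical helper: divides totalLines into `parts` contiguous ranges,
// one per single-threaded loader process.
class FileSplitPlan {
    /** Returns one {firstLine, lastLine} pair per part (1-based, inclusive). */
    static long[][] plan(long totalLines, int parts) {
        if (totalLines < 0 || parts < 1) {
            throw new IllegalArgumentException("need totalLines >= 0 and parts >= 1");
        }
        long[][] ranges = new long[parts][2];
        long base = totalLines / parts;
        long extra = totalLines % parts;   // the first `extra` parts get one more line
        long next = 1;
        for (int i = 0; i < parts; i++) {
            long size = base + (i < extra ? 1 : 0);
            ranges[i][0] = next;
            ranges[i][1] = next + size - 1;  // an empty part has lastLine < firstLine
            next += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 200,000 records is the test file size mentioned in the thread.
        long[][] ranges = plan(200_000, 10);
        for (int i = 0; i < ranges.length; i++) {
            System.out.println("process " + (i + 1) + ": lines "
                    + ranges[i][0] + "-" + ranges[i][1]);
        }
    }
}
```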

