Posted to user@cassandra.apache.org by Akshay Ballarpure <ak...@tcs.com> on 2014/07/23 11:00:09 UTC
CSV Import is taking huge time
Hello,
I am using the COPY command in Cassandra to import a CSV file into the
database. The import is taking a very long time. Any suggestions to improve it?
id,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z
100,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26
101,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26
----
--
--
There are ~50K lines in this file; its size is ~5 MB.
I created the table as follows:
create table csldata4 (
    id int PRIMARY KEY,
    a int, b int, c int, d int, e int, f int, g int, h int, i int,
    j int, k int, l int, m int, n int, o int, p int, q int, r int,
    s int, t int, u int, v int, w int, x int, y int, z int
);
Copy Command:
COPY csldata4 (id , a , b , c , d , e , f , g , h , i , j , k , l , m , n
, o , p , q , r , s , t , u , v , w , x , y , z ) FROM 'csldata1.csv' WITH
HEADER=TRUE;
The issue is that the import takes a very long time:
cqlsh:mykeyspace> COPY csldata (id , a , b , c , d , e , f , g , h , i , j
, k , l , m , n , o , p , q , r , s , t , u , v , w , x , y , z ) FROM
'csldata1.csv' WITH HEADER=TRUE;
66215 rows imported in 1 minute and 31.044 seconds.
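For scale, the timing reported above can be turned into a rate. This is only arithmetic on the two numbers in the cqlsh output (66215 rows, 1 minute and 31.044 seconds); the variable names are illustrative.

```python
# Back-of-the-envelope throughput for the COPY run reported above.
rows = 66215
elapsed_s = 1 * 60 + 31.044          # "1 minute and 31.044 seconds"

rows_per_s = rows / elapsed_s        # import rate
ms_per_row = 1000 * elapsed_s / rows # average time spent per row

print(f"{rows_per_s:.0f} rows/s, {ms_per_row:.2f} ms per row")
```

That works out to roughly 727 rows per second, or a little under 1.4 ms per row.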
Thanks & Regards
Akshay Ghanshyam Ballarpure
Tata Consultancy Services
Cell:- 9985084075
Mailto: akshay.ballarpure@tcs.com
Website: http://www.tcs.com
____________________________________________
Experience certainty. IT Services
Business Solutions
Consulting
____________________________________________
=====-----=====-----=====
Notice: The information contained in this e-mail
message and/or attachments to it may contain
confidential or privileged information. If you are
not the intended recipient, any dissemination, use,
review, distribution, printing or copying of the
information contained in this e-mail message
and/or attachments to it are strictly prohibited. If
you have received this communication in error,
please notify us by reply e-mail or telephone and
immediately and permanently delete the message
and any attachments. Thank you
Re: CSV Import is taking huge time
Posted by Akshay Ballarpure <ak...@tcs.com>.
Tyler, thanks for the reply.
I didn't fully understand you. Could you please elaborate?
From: Tyler Hobbs <ty...@datastax.com>
To: user@cassandra.apache.org
Date: 07/24/2014 02:07 AM
Subject: Re: CSV Import is taking huge time
See https://issues.apache.org/jira/browse/CASSANDRA-7405.
Currently cqlsh's COPY FROM just uses a single-threaded for-loop with
synchronous inserts.
On Wed, Jul 23, 2014 at 8:09 AM, Jack Krupansky <ja...@basetechnology.com>
wrote:
Is it compute bound or I/O bound?
What does your cluster look like?
-- Jack Krupansky
Re: CSV Import is taking huge time
Posted by Tyler Hobbs <ty...@datastax.com>.
See https://issues.apache.org/jira/browse/CASSANDRA-7405.
Currently cqlsh's COPY FROM just uses a single-threaded for-loop with
synchronous inserts.
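To illustrate why a single-threaded loop of synchronous inserts is slow, here is a minimal simulation. It is a sketch, not cqlsh's code and not a real driver call: insert_row is a hypothetical stand-in that just sleeps, and the 2 ms round-trip latency and the worker count of 16 are assumptions chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

ROUND_TRIP_S = 0.002  # assumed per-insert latency, for illustration only


def insert_row(row):
    """Hypothetical stand-in for one blocking INSERT round trip."""
    time.sleep(ROUND_TRIP_S)
    return row[0]


def load_sequential(rows):
    # One request at a time: each insert waits for the previous one.
    return [insert_row(row) for row in rows]


def load_concurrent(rows, workers=16):
    # Several requests in flight at once; map() preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(insert_row, rows))


def timed(loader, rows):
    # Return the loader's result together with its wall-clock duration.
    start = time.perf_counter()
    result = loader(rows)
    return result, time.perf_counter() - start


# 200 rows shaped like the CSV in this thread: an id plus 26 int columns.
sample_rows = [(100 + i, *range(1, 27)) for i in range(200)]
_, seq_s = timed(load_sequential, sample_rows)
_, con_s = timed(load_concurrent, sample_rows)
print(f"sequential: {seq_s:.3f} s, concurrent: {con_s:.3f} s")
```

In the sequential version the total time is at least the number of rows times the round-trip latency, whatever the server could handle. With several requests in flight, the waiting overlaps.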
On Wed, Jul 23, 2014 at 8:09 AM, Jack Krupansky <ja...@basetechnology.com>
wrote:
> Is it compute bound or I/O bound?
>
> What does your cluster look like?
>
> -- Jack Krupansky
--
Tyler Hobbs
DataStax <http://datastax.com/>
Re: CSV Import is taking huge time
Posted by Akshay Ballarpure <ak...@tcs.com>.
Thanks, Jack, for the quick reply. I didn't completely understand your question.
I am very new to Cassandra. I just installed a single-node cluster:
[root@CSL-simulation bin]# ./nodetool -host 10.59.18.206 -p 7199 status
Note: Ownership information does not include topology; for complete
information, specify a keyspace
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Tokens  Owns    Host ID                               Rack
UN  127.0.0.1  55.65 MB  256     100.0%  1159cda0-6a8c-423d-9a20-cdedd4db9907  rack1
From: "Jack Krupansky" <ja...@basetechnology.com>
To: <us...@cassandra.apache.org>
Date: 07/23/2014 06:39 PM
Subject: Re: CSV Import is taking huge time
Is it compute bound or I/O bound?
What does your cluster look like?
-- Jack Krupansky
Re: CSV Import is taking huge time
Posted by Jack Krupansky <ja...@basetechnology.com>.
Is it compute bound or I/O bound?
What does your cluster look like?
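One rough way to tell the two cases apart is to compare CPU time with wall-clock time for a piece of work. The sketch below does that inside a single Python process. The helper names and the 0.5 threshold are assumptions for illustration only. It does not measure the Cassandra server, which runs as a separate process, so for a real import you would look at system tools instead.

```python
import time


def cpu_fraction(work):
    """Run work() and return CPU time divided by wall-clock time."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall


def classify(fraction, threshold=0.5):
    # The 0.5 cut-off is an arbitrary illustration, not a standard value.
    return "compute bound" if fraction >= threshold else "I/O bound (or waiting)"


def waiting_work():
    time.sleep(0.2)  # stands in for waiting on disk or network


def busy_work():
    sum(i * i for i in range(2_000_000))  # stands in for pure computation


for name, work in (("waiting", waiting_work), ("busy", busy_work)):
    print(name, "->", classify(cpu_fraction(work)))
```

A fraction near 1 means the process spent its time computing; a fraction near 0 means it mostly waited on something else.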
-- Jack Krupansky