Posted to user@cassandra.apache.org by 严超 <ya...@gmail.com> on 2014/12/24 13:34:53 UTC

[Cassandra] [Generation of SStableLoader slow]

Hi, Everyone:

I'm importing a CSV file into Cassandra using sstableloader, following the
example here:
https://github.com/yukim/cassandra-bulkload-example/

Even though streaming the SSTables is very fast, I find that generating
the SSTables is quite slow for very large files (CSV, 4 GB+). I am using a
dual-core computer with 2 GB of RAM. Could this be because of the system
spec, or is some other factor involved?

Thank you for any advice.

Best Regards!

Chao Yan
My Twitter: Andy Yan @yanchao727 <https://twitter.com/yanchao727>
My Weibo: <http://weibo.com/herewearenow>
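The generation step described in this question boils down to a loop that reads the CSV one line at a time and hands each parsed row to an SSTable writer (the linked example uses Cassandra's CQLSSTableWriter for that). The sketch below covers only the streaming read/parse half, under two stated assumptions: the file is plain comma-separated with no quoted fields, and `RowSink` is a hypothetical interface standing in for the real writer, so the code runs without Cassandra on the classpath.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/**
 * Streaming CSV reader feeding rows to a writer, one line at a time.
 * Assumes plain comma-separated input with no quoted or escaped commas.
 */
class CsvToRows {

    /** Hypothetical stand-in for the SSTable writer (not a Cassandra API). */
    interface RowSink {
        void addRow(String[] fields) throws IOException;
    }

    /**
     * Reads {@code in} line by line so the whole file never has to fit in
     * memory, passes each non-blank line to {@code sink}, and returns the
     * number of rows passed on. A header line, if present, is passed on too.
     */
    static long load(Reader in, RowSink sink) throws IOException {
        long rows = 0;
        try (BufferedReader reader = new BufferedReader(in, 1 << 16)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // Naive split; the -1 limit keeps trailing empty fields.
                sink.addRow(line.split(",", -1));
                rows++;
            }
        }
        return rows;
    }
}
```

Reading through a BufferedReader keeps memory use flat regardless of file size, which matters on a 2 GB machine. The writer's own work (buffering, sorting and serializing rows into SSTable files) is not modeled here.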

Re: [Cassandra] [Generation of SStableLoader slow]

Posted by Ryan Svihla <rs...@datastax.com>.
I doubt there are huge gains to be had from tinkering: if adding more CPUs
speeds things up, that indicates you're resource-bound. Since it's running on
a VM, the underlying disk is probably slow too, and at some point there is
just physics. You can try using the Java client instead of sstableloader,
but I doubt that will actually be faster for your particular use case.
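Ryan's alternative, writing rows through the Java client instead of generating SSTables, usually means issuing asynchronous inserts while capping how many are in flight so a small node is not overwhelmed. A minimal sketch of that pattern follows, assuming a hypothetical `insertAsync` function in place of a real driver call (it is not a driver API).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

/**
 * Issues asynchronous inserts while capping the number in flight.
 * insertAsync is a hypothetical stand-in for an async driver call.
 */
class BoundedWriter {

    /** Writes every row and returns how many inserts failed. */
    static <R> long writeAll(Iterable<R> rows,
                             Function<R, CompletableFuture<Void>> insertAsync,
                             int maxInFlight) throws InterruptedException {
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicLong failures = new AtomicLong();
        for (R row : rows) {
            permits.acquire(); // blocks while maxInFlight inserts are pending
            CompletableFuture<Void> pending;
            try {
                pending = insertAsync.apply(row);
            } catch (RuntimeException e) {
                failures.incrementAndGet(); // the call failed before going async
                permits.release();
                continue;
            }
            pending.whenComplete((ignored, error) -> {
                if (error != null) {
                    failures.incrementAndGet();
                }
                permits.release();
            });
        }
        // Taking every permit means all outstanding inserts have completed.
        permits.acquire(maxInFlight);
        permits.release(maxInFlight);
        return failures.get();
    }
}
```

The semaphore is the design choice that matters: without it, an async loop over a multi-gigabyte file can queue requests faster than a two-core node can absorb them. As Ryan notes, this is not guaranteed to beat sstableloader for this workload.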

On Wed, Dec 24, 2014 at 7:05 AM, 严超 <ya...@gmail.com> wrote:



-- 

DataStax <http://www.datastax.com/>

Ryan Svihla

Solution Architect

Twitter: <https://twitter.com/foundev>
LinkedIn: <http://www.linkedin.com/pub/ryan-svihla/12/621/727/>


Re: [Cassandra] [Generation of SStableLoader slow]

Posted by 严超 <ya...@gmail.com>.
Yes, I think so too. I also tried VMs with 4 CPUs and with 2 CPUs, and the
4-CPU one really was faster. But it still took 1 hour to generate SSTables
for a 1 GB CSV. Is there any other way to make it faster besides adding CPUs
and RAM?
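To put the reported numbers in perspective: 1 GB in one hour is roughly 0.28 MB/s, and if throughput stayed constant (an assumption; it may not) the 4 GB+ files from the original question would take about four hours. The small helper below just does that arithmetic; the class and method names are illustrative.

```java
/** Back-of-the-envelope throughput arithmetic for a bulk-load run. */
class LoadRate {

    /** Average throughput in MB/s for {@code sizeMb} processed in {@code seconds}. */
    static double mbPerSecond(double sizeMb, double seconds) {
        return sizeMb / seconds;
    }

    /** Seconds needed for {@code sizeMb} at a constant {@code mbPerSecond}. */
    static double projectedSeconds(double sizeMb, double mbPerSecond) {
        return sizeMb / mbPerSecond;
    }
}
```

For the reported run, `mbPerSecond(1024, 3600)` is about 0.284 MB/s, and `projectedSeconds(4 * 1024, rate)` at that rate is 14,400 seconds, i.e. four hours. Comparing that rate with a plain copy of the same file on the same disk shows how far generation is from the raw I/O ceiling.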


2014-12-24 20:40 GMT+08:00 Ryan Svihla <rs...@datastax.com>:


Re: [Cassandra] [Generation of SStableLoader slow]

Posted by Ryan Svihla <rs...@datastax.com>.
I think copying large files would be slow even with just the cp command.
Cassandra isn't doing anything especially strange here: you don't have a lot
of RAM or CPU, and I'm assuming the underlying disk is slow as well. Without
more parameters and details, it's hard to say whether there is an issue.
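One way to act on the cp comparison is to time a plain copy of the CSV on the same disk and compare it with the SSTable generation rate. A rough sketch using only `java.nio.file` follows; the class name is illustrative, and the number it reports can be inflated by the OS page cache, since `Files.copy` is not guaranteed to force data to disk.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Times a plain file copy as a rough disk-throughput baseline. */
class CopyBaseline {

    /** Copies {@code src} over {@code dst} and returns the observed MB/s. */
    static double copyMbPerSecond(Path src, Path dst) throws IOException {
        long bytes = Files.size(src);
        long start = System.nanoTime();
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        // Guard against a zero reading on very small files.
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        double megabytes = bytes / (1024.0 * 1024.0);
        return megabytes / (elapsedNanos / 1_000_000_000.0);
    }
}
```

If the copy runs orders of magnitude faster than generation, the bottleneck is more likely CPU or memory than the disk.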

On Dec 24, 2014 7:36 AM, "严超" <ya...@gmail.com> wrote:
