Posted to user@cassandra.apache.org by Lucas Di Pentima <lu...@di-pentima.com.ar> on 2010/04/17 02:13:27 UTC

Just starting to play with Cassandra: (Surely) Dumb Question

Hello all,

I'm playing with Cassandra 0.6.0-rc1 on Mac OS X, with the 'cassandra' Ruby gem.

I loaded some test data into it and was trying the gem's get() API when I noticed that if I call it like this:

db.get('SomeSCFName', 'SomeKey')

It returned only 100 subcolumns, although 'SomeKey' has approximately 150,000 subcolumns. Next I tried calling get() like this:

db.get('SomeSCFName', 'SomeKey', :count => N)

The problem is that when N is higher than approximately 50,000, I get the following error:

Thrift::TransportException: Socket: Timed out reading 4096 bytes from 127.0.0.1:9160

The same happens if I call:

db.count_columns('SomeSCFName', 'SomeKey')

...on the same 'SomeKey', but if I call count_columns() with another key that holds fewer columns, it works without problems.

My setup is:

* Cassandra 0.6.0-rc1 downloaded from the website, with the default configuration
* Ruby 1.8.7
* Cassandra gem 0.8.1
* Mac OS X 10.6.3

Any help will be appreciated!

--
Lucas Di Pentima - Santa Fe, Argentina
Jabber: lucas@di-pentima.com.ar
MSN: ldipenti75@hotmail.com

Re: Just starting to play with Cassandra: (Surely) Dumb Question

Posted by Sylvain Lebresne <sy...@yakaz.com>.
On Sat, Apr 17, 2010 at 5:38 PM, Lucas Di Pentima
<lu...@di-pentima.com.ar> wrote:
> Hello Sylvain,
>
> Thanks for the explanation. So, if I need to fetch all the columns of a big ColumnFamily, I should request a few thousand at a time, as Jonathan told me, using the start parameter until I get no more columns. Am I right?

Yes


Re: Just starting to play with Cassandra: (Surely) Dumb Question

Posted by Lucas Di Pentima <lu...@di-pentima.com.ar>.
Hello Sylvain,

On 17/04/2010, at 12:09, Sylvain Lebresne wrote:

> get_count() (which, although I don't know the Ruby gem, is most probably
> what count_columns() uses under the hood) actually reads the whole row
> and simply returns the number of columns found. The only thing you gain
> by counting columns instead of requesting them is that you don't have to
> pull all the columns over the network. Counting is therefore (roughly) as
> costly as requesting the whole row, so it is no wonder it times out in
> your case.
> 


Thanks for the explanation. So, if I need to fetch all the columns of a big ColumnFamily, I should request a few thousand at a time, as Jonathan told me, using the start parameter until I get no more columns. Am I right?


Best regards,

Re: Just starting to play with Cassandra: (Surely) Dumb Question

Posted by Sylvain Lebresne <sy...@yakaz.com>.
On Sat, Apr 17, 2010 at 4:52 PM, Lucas Di Pentima
<lu...@di-pentima.com.ar> wrote:
> Hello Jonathan,
>
> I suspected the same; that's why I tried the count_columns() call, but when I try it on a big SCF, I get the same error message:
>
> Thrift::TransportException: Socket: Timed out reading 4096 bytes from 127.0.0.1:9160
>
> Should I use count_columns(), or is there any other way to find out how many columns exist?

get_count() (which, although I don't know the Ruby gem, is most probably
what count_columns() uses under the hood) actually reads the whole row
and simply returns the number of columns found. The only thing you gain
by counting columns instead of requesting them is that you don't have to
pull all the columns over the network. Counting is therefore (roughly) as
costly as requesting the whole row, so it is no wonder it times out in
your case.

Once https://issues.apache.org/jira/browse/CASSANDRA-744 is included,
you'll be able to count the columns chunk by chunk (but it will still be
as costly as reading the row chunk by chunk, except for the network
transfer of all those columns).
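Until then, one client-side alternative is to count by paging through the row yourself. The sketch below is hypothetical: count_columns_in_pages and FakeClient are illustrative names, not part of the cassandra gem, and it assumes get() accepts :start and :count options and, like a Thrift SliceRange, treats :start as inclusive. As explained above, this still reads every column, and unlike get_count it also transfers them over the network; what it buys is that no single request has to cover the whole row.

```ruby
# Hypothetical sketch: count a wide row's columns page by page.
# Assumption: client.get(cf, key, :start => s, :count => n) returns columns
# sorted by name, beginning AT s (inclusive start, as in a Thrift SliceRange).
def count_columns_in_pages(client, column_family, key, page_size = 1000)
  # With page_size 1, every later page would contain only the resume column.
  raise ArgumentError, 'page_size must be at least 2' if page_size < 2
  total = 0
  start = ''                          # empty start means "from the first column"
  loop do
    page = client.get(column_family, key, :start => start, :count => page_size).to_a
    full = (page.size == page_size)   # only a full page can have more after it
    # After the first page, the first column returned is the one we resumed
    # from, which was already counted on the previous page.
    page.shift if start != '' && !page.empty? && page.first[0] == start
    total += page.size
    break unless full
    start = page.last[0]              # resume from the last column name seen
  end
  total
end

# Hypothetical in-memory stand-in for a Cassandra connection (illustration only).
class FakeClient
  def initialize(columns)
    @columns = columns.sort_by { |name, _| name }   # [[name, value], ...] by name
  end

  # Mimics a slice: columns whose name is >= :start, at most :count of them.
  def get(_column_family, _key, options = {})
    start = options[:start] || ''
    count = options[:count] || 100
    @columns.select { |name, _| name >= start }.first(count)
  end
end

columns = {}
1.upto(2500) { |i| columns[format('col%06d', i)] = "value #{i}" }
puts count_columns_in_pages(FakeClient.new(columns), 'SomeSCFName', 'SomeKey', 1000) # prints 2500
```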

--
Sylvain

Re: Just starting to play with Cassandra: (Surely) Dumb Question

Posted by Lucas Di Pentima <lu...@di-pentima.com.ar>.
Hello Jonathan,

I suspected the same; that's why I tried the count_columns() call, but when I try it on a big SCF, I get the same error message:

Thrift::TransportException: Socket: Timed out reading 4096 bytes from 127.0.0.1:9160

Should I use count_columns(), or is there any other way to find out how many columns exist?

Best regards

On 17/04/2010, at 01:14, Jonathan Ellis wrote:

> You're supposed to request a few hundred or a few thousand columns per call,
> then, if you need more, request the next set using the start parameter.

Re: Just starting to play with Cassandra: (Surely) Dumb Question

Posted by Jonathan Ellis <jb...@gmail.com>.
You're supposed to request a few hundred or a few thousand columns per call,
then, if you need more, request the next set using the start parameter.
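The paging loop described here can be sketched in Ruby as follows. This is a minimal sketch, not the gem's own code: it assumes the client's get() accepts :start and :count options and, like a Thrift SliceRange, returns columns sorted by name with the :start column itself included, so every page after the first re-fetches one already-seen column and drops it. each_column_page and FakeClient are hypothetical names; FakeClient is an in-memory stand-in so the example runs without a Cassandra node.

```ruby
# Hypothetical sketch: yield a wide row's columns in fixed-size pages.
# Assumption: client.get(cf, key, :start => s, :count => n) returns columns
# sorted by name, beginning AT s (inclusive start, as in a Thrift SliceRange).
def each_column_page(client, column_family, key, page_size = 1000)
  start = ''            # empty start means "from the first column"
  first_page = true
  loop do
    # Later pages ask for one extra column, because the first one returned
    # should be the column that ended the previous page.
    count = first_page ? page_size : page_size + 1
    page = client.get(column_family, key, :start => start, :count => count).to_a
    # Drop the repeated start column, but only if it is really there.
    page.shift if !first_page && !page.empty? && page.first[0] == start
    break if page.empty?
    yield page
    break if page.size < page_size   # a short page means the row is exhausted
    start = page.last[0]             # resume from the last column name seen
    first_page = false
  end
end

# Hypothetical in-memory stand-in for a Cassandra connection (illustration only).
class FakeClient
  def initialize(columns)
    @columns = columns.sort_by { |name, _| name }   # [[name, value], ...] by name
  end

  # Mimics a slice: columns whose name is >= :start, at most :count of them.
  def get(_column_family, _key, options = {})
    start = options[:start] || ''
    count = options[:count] || 100
    @columns.select { |name, _| name >= start }.first(count)
  end
end

columns = {}
1.upto(2500) { |i| columns[format('col%06d', i)] = "value #{i}" }
total = 0
each_column_page(FakeClient.new(columns), 'SomeSCFName', 'SomeKey', 1000) do |page|
  total += page.size
end
puts total # prints 2500
```

With a real connection you would pass the gem's client object instead of FakeClient; a page size of a few hundred to a few thousand keeps each request far below the roughly 50,000-column requests that timed out in the original post.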