
Posted to user@cassandra.apache.org by Tharindu Mathew <mc...@gmail.com> on 2011/09/18 21:16:41 UTC

Possibility of going OOM using get_count

Hi everyone,

I noticed this passage in the API docs:

"The method is not O(1). It takes all the columns from disk to calculate the
answer. The only benefit of the method is that you do not need to pull all
the columns over Thrift interface to count them."

Does this mean that calling this method on a row with a large number of
columns might make it go OOM?

Thanks in advance.

-- 
Regards,

Tharindu

blog: http://mackiemathew.com/

Re: Possibility of going OOM using get_count

Posted by Boris Yen <yu...@gmail.com>.

Hi Aaron,

Thanks for the explanation. I know the performance will vary when the
offset is a very large number, just as mentioned in CASSANDRA-261. Even if
users implement the offset on the client side, they suffer the same issues.
I just think it would be nice if Cassandra could provide this function
internally. Of course, this function will have its limitations, just like
any other function Cassandra has (Counter, for example).

From CASSANDRA-261, it seems Cassandra once had the offset function;
however, due to some RR issues, it was removed in CASSANDRA-286. I think
CASSANDRA-261 had the RR issue because it changed the internal mechanism in
order to provide the offset function. In contrast,
CASSANDRA-2894<https://issues.apache.org/jira/browse/CASSANDRA-2894>
only changes code in "CassandraServer", so it should not have the same
issue as CASSANDRA-261. Therefore, I was wondering if you could reconsider
putting the offset function back into Cassandra. It would be really helpful
for many users.

Regards
Boris

On Sun, Sep 25, 2011 at 12:21 PM, aaron morton <aa...@thelastpickle.com> wrote:

> The changes in get_count() are designed to stop counts for very large rows
> running out of memory as they try to hold millions of columns in memory.
>
> So if you ask to count all the cols in a row with 1M cols, it will (by
> default) read the first 1024 columns, and then the next 1024 using the last
> column read as the first column for the next page.
>
> The important part is that it is actually reading the columns. Tombstones
> mean we do not know if a column should be a member of the result set for a
> query until it is read and reconciled with all the other versions of a
> column. E.g. if 3 sstables each have a value for a column and one is a
> tombstone, then the column may or may not be deleted. We do not know until
> all 3 column versions are reconciled.
>
> get_count() is like get_slice() but we do not return the columns, just the
> count of them. Counting 1M columns still takes a long time. And finding the
> 999,980th column will also take a long time, but if you know the name of the
> 999,980th column it will be mucho faster.
>
> Some experiments I did a while ago on query plans
> http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ - cass 1.0 will
> probably invalidate this.
>
> Cheers
>
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com

Re: Possibility of going OOM using get_count

Posted by aaron morton <aa...@thelastpickle.com>.

The changes in get_count() are designed to stop counts for very large rows running out of memory as they try to hold millions of columns in memory. 

So if you ask to count all the cols in a row with 1M cols, it will (by default) read the first 1024 columns, and then the next 1024 using the last column read as the first column for the next page. 
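
To make the paging concrete, here is a minimal Python sketch of counting a wide row one page at a time. It is only an illustration, not Cassandra's code: the row is modelled as a sorted in-memory list of column names, and get_slice() and paged_count() are hypothetical helpers written for this example that only loosely mimic the Thrift calls. The 1024 page size is the default mentioned above.

```python
from bisect import bisect_left

PAGE_SIZE = 1024  # default page size mentioned in the thread


def get_slice(row, start, finish, count):
    """Return up to `count` column names in [start, finish], in sorted order.

    `row` is a sorted list of column names standing in for one row; an empty
    string for `start` or `finish` means "unbounded" on that side.
    """
    lo = bisect_left(row, start) if start else 0
    page = []
    for i in range(lo, len(row)):
        name = row[i]
        if finish and name > finish:
            break
        page.append(name)
        if len(page) == count:
            break
    return page


def paged_count(row, start="", finish="", page_size=PAGE_SIZE):
    """Count columns in [start, finish] while holding only one page at a time."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 to make progress")
    total = 0
    first_page = True
    while True:
        page = get_slice(row, start, finish, page_size)
        fetched = len(page)
        # The start bound is inclusive, so each page after the first begins
        # with the column already counted at the end of the previous page.
        if not first_page and page and page[0] == start:
            page = page[1:]
        total += len(page)
        if fetched < page_size:
            return total  # a short page means the range is exhausted
        start = page[-1]  # last column read becomes the next start column
        first_page = False
```

The same loop works as a client-side workaround: the end column stays constant, only one page of columns is held in memory at a time, but every column is still read, so the cost stays proportional to the size of the row.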

The important part is that it is actually reading the columns. Tombstones mean we do not know if a column should be a member of the result set for a query until it is read and reconciled with all the other versions of a column. E.g. if 3 sstables each have a value for a column and one is a tombstone, then the column may or may not be deleted. We do not know until all 3 column versions are reconciled.
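
A small sketch of why every version has to be read before a column can be counted. This is a simplified model, not Cassandra's implementation: each sstable is a plain dict, Version, reconcile() and count_live_columns() are names made up for this example, the newest timestamp is assumed to win, and timestamp ties are ignored.

```python
from typing import NamedTuple, Optional


class Version(NamedTuple):
    timestamp: int
    value: Optional[str]  # None marks a tombstone (a delete)


def reconcile(versions):
    """Pick the winning version of one column: the highest timestamp wins."""
    return max(versions, key=lambda v: v.timestamp)


def count_live_columns(sstables):
    """Count columns whose reconciled version is not a tombstone.

    `sstables` is a list of dicts mapping column name -> Version.
    """
    names = set().union(*sstables)  # every column name seen in any sstable
    live = 0
    for name in names:
        versions = [table[name] for table in sstables if name in table]
        if reconcile(versions).value is not None:
            live += 1
    return live
```

With three tables holding a value, a tombstone and a newer value for the same column, the column counts as live; if the tombstone were the newest version it would not. Neither answer is known until all three versions have been read.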

get_count() is like get_slice() but we do not return the columns, just the count of them. Counting 1M columns still takes a long time. And finding the 999,980th column will also take a long time, but if you know the name of the 999,980th column it will be mucho faster.

Some experiments I did a while ago on query plans http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ - cass 1.0 will probably invalidate this.

Cheers


-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23/09/2011, at 6:01 PM, Boris Yen wrote:

> 
> 
> On Fri, Sep 23, 2011 at 12:28 PM, aaron morton <aa...@thelastpickle.com> wrote:
>> Offsets have been discussed previously. IIRC the main concerns were either:
>>
>> There is no way to reliably count to start the offset, i.e. we do not lock the row
>
> In the new get_count function, Cassandra does internal paging in order to get the total count. Without locking the row, the count could still be unreliable (someone might be deleting some columns while Cassandra is counting the columns).
>
>> Or performance related, as there is no reliable way to skip 10,000 columns other than counting 10,000 columns. With a start col we can search.
>
> I am just curious: "skip 10,000 columns to get the start column" can basically be done the way Cassandra does it for the new get_count function (internal paging). I just cannot think of a reason why it is doable for get_count but cannot be done for the offset.
>
> I know the result might not be reliable and the performance might vary depending on the offset, but if Cassandra can use internal paging to get the count, it should be able to apply the same method to get the start column for the offset.
>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com

Re: Possibility of going OOM using get_count

Posted by Boris Yen <yu...@gmail.com>.

On Fri, Sep 23, 2011 at 12:28 PM, aaron morton <aa...@thelastpickle.com> wrote:

> Offsets have been discussed previously. IIRC the main concerns were
> either:
>
> There is no way to reliably count to start the offset, i.e. we do not lock
> the row
>

In the new get_count function, Cassandra does internal paging in order to
get the total count. Without locking the row, the count could still be
unreliable (someone might be deleting some columns while Cassandra is
counting the columns).

> Or performance related, as there is no reliable way to skip 10,000
> columns other than counting 10,000 columns. With a start col we can search.
>

I am just curious: "skip 10,000 columns to get the start column" can
basically be done the way Cassandra does it for the new get_count function
(internal paging). I just cannot think of a reason why it is doable for
get_count but cannot be done for the offset.

I know the result might not be reliable and the performance might vary
depending on the offset, but if Cassandra can use internal paging to get the
count, it should be able to apply the same method to get the start column
for the offset.


> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com

Re: Possibility of going OOM using get_count

Posted by aaron morton <aa...@thelastpickle.com>.

Offsets have been discussed previously. IIRC the main concerns were either:

There is no way to reliably count to start the offset, i.e. we do not lock the row

Or performance related, as there is no reliable way to skip 10,000 columns other than counting 10,000 columns. With a start col we can search.
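
A rough Python sketch of the difference, under simplifying assumptions: the row is a sorted in-memory list of (name, is_live) pairs, where is_live=False stands for a deleted column, and both functions are hypothetical helpers for this example rather than anything in Cassandra. Each returns the columns found plus how many entries it had to examine.

```python
from bisect import bisect_left


def slice_from_offset(columns, offset, count):
    """Skip `offset` live columns by reading them one by one, then take `count`.

    Deleted columns do not count toward the offset, so there is no way to
    jump straight to position N without examining everything before it.
    """
    result = []
    skipped = 0
    examined = 0
    for name, is_live in columns:
        examined += 1
        if not is_live:
            continue
        if skipped < offset:
            skipped += 1
            continue
        result.append(name)
        if len(result) == count:
            break
    return result, examined


def slice_from_name(columns, start_name, count):
    """Binary-search for `start_name`, then take `count` live columns."""
    # (start_name, False) sorts at or before any entry with that name.
    i = bisect_left(columns, (start_name, False))
    result = []
    examined = 0
    for j in range(i, len(columns)):
        name, is_live = columns[j]
        examined += 1
        if is_live:
            result.append(name)
            if len(result) == count:
                break
    return result, examined
```

On a row of 100,000 columns with some deletions, asking for 3 columns at offset 10,000 examines more than 10,000 entries, while asking for the same 3 columns by start name examines only the handful it returns.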

Cheers
  
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22/09/2011, at 8:50 PM, Boris Yen wrote:

> I was wondering if it is possible to use a similar approach to CASSANDRA-2894 to have the SlicePredicate support the offset concept? With the offset, it would be much easier to implement paging on the client side.
> 
> Boris

Re: Possibility of going OOM using get_count

Posted by Boris Yen <yu...@gmail.com>.

I was wondering if it is possible to use a similar approach to
CASSANDRA-2894<https://issues.apache.org/jira/browse/CASSANDRA-2894>
to have the SlicePredicate support the offset concept? With the offset, it
would be much easier to implement paging on the client side.

Boris

On Mon, Sep 19, 2011 at 9:45 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> Unfortunately no, because you don't know what the actual
> last-column-counted was.

Re: Possibility of going OOM using get_count

Posted by Tharindu Mathew <mc...@gmail.com>.

Yes, Aaron, that self-implemented paging is what I'm trying.

Jonathan, the last column read in the previous result is the starting column of the next iteration. The end column remains constant. This is using slice ranges. AFAIU, that should work.

Regards,

Tharindu

Sent from my iPhone

On Sep 19, 2011, at 7:15 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> Unfortunately no, because you don't know what the actual
> last-column-counted was.

Re: Possibility of going OOM using get_count

Posted by Jonathan Ellis <jb...@gmail.com>.

Unfortunately no, because you don't know what the actual
last-column-counted was.

On Mon, Sep 19, 2011 at 4:25 AM, aaron morton <aa...@thelastpickle.com> wrote:
> get_count() supports the same predicate as get_slice. So you can implement
> the paging yourself.
> Cheers
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Possibility of going OOM using get_count

Posted by aaron morton <aa...@thelastpickle.com>.

get_count() supports the same predicate as get_slice. So you can implement the paging yourself. 

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 19/09/2011, at 8:45 PM, Tharindu Mathew wrote:

> 
> 
> On Mon, Sep 19, 2011 at 12:40 PM, Benoit Perroud <be...@noisette.ch> wrote:
> The workaround for 0.7 is calling get_slice and count on client side.
> It's heavier, sure, but you will then be able to set start column
> accordingly.
> 
> I was afraid of that :(
> 
> Will follow that method. Thanks. 

Re: Possibility of going OOM using get_count

Posted by Tharindu Mathew <mc...@gmail.com>.

On Mon, Sep 19, 2011 at 12:40 PM, Benoit Perroud <be...@noisette.ch> wrote:

> The workaround for 0.7 is calling get_slice and count on client side.
> It's heavier, sure, but you will then be able to set start column
> accordingly.
>

I was afraid of that :(

Will follow that method. Thanks.

>
>
>
> 2011/9/19 Tharindu Mathew <mc...@gmail.com>:
> > Thanks Aaron and Jake for the replies.
> > Any chance of a possible workaround to use for Cassandra 0.7?
> >
> > On Mon, Sep 19, 2011 at 3:48 AM, aaron morton <aa...@thelastpickle.com>
> > wrote:
> >>
> >> Cool
> >> Thanks, A
> >> -----------------
> >> Aaron Morton
> >> Freelance Cassandra Developer
> >> @aaronmorton
> >> http://www.thelastpickle.com
> >> On 19/09/2011, at 9:55 AM, Jake Luciani wrote:
> >>
> >> This is fixed in 1.0
> >> https://issues.apache.org/jira/browse/CASSANDRA-2894
> >>
> >> On Sun, Sep 18, 2011 at 2:16 PM, Tharindu Mathew <mc...@gmail.com>
> >> wrote:
> >>>
> >>> Hi everyone,
> >>> I noticed this line in the API docs,
> >>>
> >>> The method is not O(1). It takes all the columns from disk to calculate
> >>> the answer. The only benefit of the method is that you do not need to
> pull
> >>> all the columns over Thrift interface to count them.
> >>>
> >>> Does this mean if a row has a large number of columns calling this
> method
> >>> might make it go OOM?
> >>> Thanks in advance.
> >>> --
> >>> Regards,
> >>>
> >>> Tharindu
> >>> blog: http://mackiemathew.com/
> >>
> >>
> >>
> >> --
> >> http://twitter.com/tjake
> >>
> >
> >
> >
> > --
> > Regards,
> >
> > Tharindu
> > blog: http://mackiemathew.com/
> >
>



-- 
Regards,

Tharindu

blog: http://mackiemathew.com/

Re: Possibility of going OOM using get_count

Posted by Benoit Perroud <be...@noisette.ch>.

The workaround for 0.7 is to call get_slice and count the columns on the
client side. It's heavier, sure, but you can then set the start column
accordingly and page through the row.
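
A minimal sketch of that client-side count, in Python. It assumes a
hypothetical fetch_slice(start, count) helper that wraps your client's
get_slice call and returns column names in comparator order, with an
inclusive start column (as a Thrift SliceRange behaves). The names
count_columns, fetch_slice and make_fake_row are illustrative only; the
in-memory row is there just to make the sketch runnable.

```python
from bisect import bisect_left


def count_columns(fetch_slice, page_size=1000):
    """Count the columns of one row by paging, holding one page at a time.

    fetch_slice(start, count) is a hypothetical wrapper around get_slice:
    it returns at most `count` column names >= start, in sorted order.
    An empty start means "from the first column".
    """
    if page_size < 2:
        # With an inclusive start, a page of 1 would never advance.
        raise ValueError("page_size must be at least 2")
    total = 0
    start = ""
    while True:
        page = fetch_slice(start, page_size)
        # The start column is inclusive, so every page after the first
        # begins with the last column already counted; skip it.
        if total > 0 and page and page[0] == start:
            fresh = page[1:]
        else:
            fresh = page
        total += len(fresh)
        if len(page) < page_size:
            return total  # short page: end of row reached
        start = page[-1]


def make_fake_row(names):
    """In-memory stand-in for a wide row (assumption, for testing only)."""
    names = sorted(names)

    def fetch_slice(start, count):
        i = bisect_left(names, start) if start else 0
        return names[i:i + count]

    return fetch_slice


if __name__ == "__main__":
    row = make_fake_row("c%05d" % i for i in range(2500))
    print(count_columns(row, page_size=1000))
```

Each call keeps at most page_size columns in memory on either side, at
the cost of one extra round trip per page.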



2011/9/19 Tharindu Mathew <mc...@gmail.com>:
> Thanks Aaron and Jake for the replies.
> Any chance of a possible workaround to use for Cassandra 0.7?
>
> On Mon, Sep 19, 2011 at 3:48 AM, aaron morton <aa...@thelastpickle.com>
> wrote:
>>
>> Cool
>> Thanks, A
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> On 19/09/2011, at 9:55 AM, Jake Luciani wrote:
>>
>> This is fixed in 1.0
>> https://issues.apache.org/jira/browse/CASSANDRA-2894
>>
>> On Sun, Sep 18, 2011 at 2:16 PM, Tharindu Mathew <mc...@gmail.com>
>> wrote:
>>>
>>> Hi everyone,
>>> I noticed this line in the API docs,
>>>
>>> The method is not O(1). It takes all the columns from disk to calculate
>>> the answer. The only benefit of the method is that you do not need to pull
>>> all the columns over Thrift interface to count them.
>>>
>>> Does this mean if a row has a large number of columns calling this method
>>> might make it go OOM?
>>> Thanks in advance.
>>> --
>>> Regards,
>>>
>>> Tharindu
>>> blog: http://mackiemathew.com/
>>
>>
>>
>> --
>> http://twitter.com/tjake
>>
>
>
>
> --
> Regards,
>
> Tharindu
> blog: http://mackiemathew.com/
>

Re: Possibility of going OOM using get_count

Posted by Tharindu Mathew <mc...@gmail.com>.

Thanks Aaron and Jake for the replies.

Is there a workaround I could use on Cassandra 0.7?

On Mon, Sep 19, 2011 at 3:48 AM, aaron morton <aa...@thelastpickle.com> wrote:

> Cool
>
> Thanks, A
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 19/09/2011, at 9:55 AM, Jake Luciani wrote:
>
> This is fixed in 1.0
> https://issues.apache.org/jira/browse/CASSANDRA-2894
>
>
> On Sun, Sep 18, 2011 at 2:16 PM, Tharindu Mathew <mc...@gmail.com> wrote:
>
>> Hi everyone,
>>
>> I noticed this line in the API docs,
>>
>> The method is not O(1). It takes all the columns from disk to calculate
>> the answer. The only benefit of the method is that you do not need to pull
>> all the columns over Thrift interface to count them.
>> Does this mean if a row has a large number of columns calling this method
>> might make it go OOM?
>>
>> Thanks in advance.
>>
>> --
>> Regards,
>>
>> Tharindu
>>
>> blog: http://mackiemathew.com/
>>
>>
>
>
> --
> http://twitter.com/tjake
>
>
>


-- 
Regards,

Tharindu

blog: http://mackiemathew.com/

Re: Possibility of going OOM using get_count

Posted by aaron morton <aa...@thelastpickle.com>.

Cool 

Thanks, A

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 19/09/2011, at 9:55 AM, Jake Luciani wrote:

> This is fixed in 1.0
> https://issues.apache.org/jira/browse/CASSANDRA-2894
> 
> 
> On Sun, Sep 18, 2011 at 2:16 PM, Tharindu Mathew <mc...@gmail.com> wrote:
> Hi everyone,
> 
> I noticed this line in the API docs,
> The method is not O(1). It takes all the columns from disk to calculate the answer. The only benefit of the method is that you do not need to pull all the columns over Thrift interface to count them.
> 
> Does this mean if a row has a large number of columns calling this method might make it go OOM?
> 
> Thanks in advance.
> 
> -- 
> Regards,
> 
> Tharindu
> 
> blog: http://mackiemathew.com/
> 
> 
> 
> 
> -- 
> http://twitter.com/tjake

Re: Possibility of going OOM using get_count

Posted by Jake Luciani <ja...@gmail.com>.

This is fixed in 1.0
https://issues.apache.org/jira/browse/CASSANDRA-2894


On Sun, Sep 18, 2011 at 2:16 PM, Tharindu Mathew <mc...@gmail.com> wrote:

> Hi everyone,
>
> I noticed this line in the API docs,
>
> The method is not O(1). It takes all the columns from disk to calculate the
> answer. The only benefit of the method is that you do not need to pull all
> the columns over Thrift interface to count them.
> Does this mean if a row has a large number of columns calling this method
> might make it go OOM?
>
> Thanks in advance.
>
> --
> Regards,
>
> Tharindu
>
> blog: http://mackiemathew.com/
>
>


-- 
http://twitter.com/tjake

Re: Possibility of going OOM using get_count

Posted by aaron morton <aa...@thelastpickle.com>.

Yes.

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 19/09/2011, at 7:16 AM, Tharindu Mathew wrote:

> Hi everyone,
> 
> I noticed this line in the API docs,
> The method is not O(1). It takes all the columns from disk to calculate the answer. The only benefit of the method is that you do not need to pull all the columns over Thrift interface to count them.
> 
> Does this mean if a row has a large number of columns calling this method might make it go OOM?
> 
> Thanks in advance.
> 
> -- 
> Regards,
> 
> Tharindu
> 
> blog: http://mackiemathew.com/
>