Posted to user@cassandra.apache.org by "Hiller, Dean" <De...@nrel.gov> on 2013/07/03 15:02:40 UTC

column sort order and reversed sort performance question

We loaded 5 million columns into a single row, and when accessing the first 30k and the last 30k columns we saw no performance difference. We also tried loading just 2 rows from the beginning and the end and saw no performance difference. I am sure reverse sort is there for a reason, though. In what context do you actually see a performance difference with reverse sort?

Loading 5 million columns into a single row took us a while (in fact, a bit longer than loading 10 million columns into 100 rows on a 4-node cluster). We could load more if needed, but we don't anticipate going beyond 10 million columns in a row at this point.

Thanks,
Dean

Re: column sort order and reversed sort performance question

Posted by sankalp kohli <ko...@gmail.com>.

One of the reasons for using reverse order is to skip tombstones while
doing a range query. Here is an example.

Let's say we want to read all the data that is between 10 and 60 minutes
old. If the data is stored from old to new in an sstable, then we have to
go over all the tombstones before we get to any live column. All the lazy
iterators over the columns will start by returning columns which are 60
minutes old, 59 minutes old and so on. They will keep returning
tombstones, and we will not find any live column until we reach minute 11
or 12. So this way we have to go over all the data and tombstones between
minutes 60 and 12 (if non-deleted columns are found at minute 12).

Whereas if we store the data from new to old, then when we iterate over
the columns we get the newer columns first, which in this example are the
ones that have not been deleted, so we find live columns that we can
return right away.

But if there are fewer columns than we want, then the way we store the
data does not matter, because we have to go over all the columns from 10
to 60 minutes anyway.
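
To make the example above concrete, here is a minimal sketch in Python. It
is a toy model and not Cassandra code: the row layout, the scan() helper and
the choice of which minutes are deleted are all assumptions made for
illustration. It counts how many columns a scan has to visit before it has
collected the requested number of live columns.

```python
from typing import List, Tuple

Column = Tuple[int, bool]  # (age in minutes, is_live); hypothetical layout


def scan(columns: List[Column], limit: int) -> Tuple[List[int], int]:
    """Walk columns in stored order until `limit` live columns are found.

    Returns (ages of the live columns found, number of columns visited).
    """
    found: List[int] = []
    visited = 0
    for age, is_live in columns:
        visited += 1
        if is_live:
            found.append(age)
            if len(found) == limit:
                break
    return found, visited


# Assumed data: one column per minute from 60 down to 10 minutes old.
# Everything older than 12 minutes has been deleted (is a tombstone).
old_to_new: List[Column] = [(age, age <= 12) for age in range(60, 9, -1)]
new_to_old: List[Column] = list(reversed(old_to_new))

print(scan(old_to_new, limit=1))   # ([12], 49): 48 tombstones visited first
print(scan(new_to_old, limit=1))   # ([10], 1): a live column comes first
print(scan(old_to_new, limit=10))  # ([12, 11, 10], 51): whole range visited
print(scan(new_to_old, limit=10))  # ([10, 11, 12], 51): whole range visited
```

With a limit of 1 the storage order decides whether the scan touches 49
columns or 1. With a limit larger than the number of live columns, both
orders visit all 51 columns, which matches the last point above.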


On Wed, Jul 3, 2013 at 10:11 AM, Robert Coli <rc...@eventbrite.com> wrote:

> On Wed, Jul 3, 2013 at 6:02 AM, Hiller, Dean <De...@nrel.gov> wrote:
> >
> > We loaded 5 million columns into a single row and when accessing the
> > first 30k and last 30k columns we saw no performance difference.  We tried
> > just loading 2 rows from the beginning and end and saw no performance
> > difference.  I am sure reverse sort is there for a reason though.  In what
> > context do you actually see a performance difference with reverse sort???
>
>
> http://thelastpickle.com/2011/10/03/Reverse-Comparators/
> "
> When a query does not specify a start column (and does not specify
> reversed) the server can just start reading columns from the start without
> having to worry about finding the right place to start. This is exactly
> what we can do for the Descending CF.
>
> For the regular Ascending CF we need to specify reversed, so the server
> must read the row index and work out which column is column count from the
> end of the row.
>
> There is no comparison really.
> "
>
> =Rob
>
>

Re: column sort order and reversed sort performance question

Posted by Robert Coli <rc...@eventbrite.com>.

On Wed, Jul 3, 2013 at 6:02 AM, Hiller, Dean <De...@nrel.gov> wrote:
>
> We loaded 5 million columns into a single row and when accessing the
> first 30k and last 30k columns we saw no performance difference.  We tried
> just loading 2 rows from the beginning and end and saw no performance
> difference.  I am sure reverse sort is there for a reason though.  In what
> context do you actually see a performance difference with reverse sort???

http://thelastpickle.com/2011/10/03/Reverse-Comparators/
"
When a query does not specify a start column (and does not specify
reversed) the server can just start reading columns from the start without
having to worry about finding the right place to start. This is exactly
what we can do for the Descending CF.

For the regular Ascending CF we need to specify reversed, so the server
must read the row index and work out which column is column count from the
end of the row.

There is no comparison really.
"

=Rob
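
The passage quoted above can be illustrated with a rough sketch in Python.
This is a toy model under stated assumptions, not how Cassandra is
implemented: BLOCK_SIZE, build_row(), read_first() and read_last_reversed()
are hypothetical names, and the "index" is just a list of block boundaries.
It shows that reading the newest columns from a descending row is a plain
forward read from the start, while the same request against an ascending
row needs the index to find where to start from the end.

```python
from typing import List, Tuple

BLOCK_SIZE = 4  # hypothetical number of columns per indexed block


def build_row(names: List[int], descending: bool = False) -> List[List[int]]:
    """Store column names in comparator order, split into fixed-size blocks."""
    ordered = sorted(names, reverse=descending)
    return [ordered[i:i + BLOCK_SIZE] for i in range(0, len(ordered), BLOCK_SIZE)]


def read_first(blocks: List[List[int]], count: int) -> Tuple[List[int], int]:
    """Forward slice with no start column: stream from the first block.

    Returns (columns read, index lookups), and never needs the index.
    """
    out: List[int] = []
    for block in blocks:
        for name in block:
            out.append(name)
            if len(out) == count:
                return out, 0
    return out, 0


def read_last_reversed(blocks: List[List[int]], count: int) -> Tuple[List[int], int]:
    """Reversed slice: use the index to find the last block, then walk back."""
    # Stand-in for the row index: (first name, last name) of each block.
    index = [(block[0], block[-1]) for block in blocks]
    out: List[int] = []
    lookups = 0
    for block_no in range(len(index) - 1, -1, -1):
        lookups += 1  # one index lookup per block we have to locate
        for name in reversed(blocks[block_no]):
            out.append(name)
            if len(out) == count:
                return out, lookups
    return out, lookups


timestamps = list(range(1, 11))
descending_cf = build_row(timestamps, descending=True)
ascending_cf = build_row(timestamps)

print(read_first(descending_cf, 3))         # ([10, 9, 8], 0)
print(read_last_reversed(ascending_cf, 3))  # ([10, 9, 8], 2)
```

Both calls return the same three newest columns. The difference in this
model is only the extra index work on the reversed path, which may be too
small to notice in a benchmark like the one described at the top of the
thread.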