Posted to dev@cassandra.apache.org by Jonathan Ellis <jb...@gmail.com> on 2010/07/08 01:07:17 UTC

Re: performance with a "large" number of supercolumns/columns

Hi Terje,

Sorry to not get to this sooner.  Are you still looking into this?

On Tue, Jun 22, 2010 at 12:47 PM, Terje Marthinussen
<tm...@gmail.com> wrote:
> Hi,
>
> I was looking a bit at a case we have with a column family which has 439k
> supercolumns, each supercolumn with ~3 columns, so some 1.3
> million objects in total.
>
> This takes about 9 seconds to read with thrift on first access, 4-5 seconds on
> second access.
>
> Taking a closer look at this, I noticed that roughly half of this
> time was spent in Cassandra.
>
> I will look more at this, but I thought I would just ask people here as it
> could be that someone already has a good explanation...
>
> Most of the time is spent here
>
>    public void collectReducedColumns(IColumnContainer container, Iterator<IColumn> reducedColumns, int gcBefore)
>    {
>        int liveColumns = 0;
>        AbstractType comparator = container.getComparator();
>
>        while (reducedColumns.hasNext())
>        {
>            if (liveColumns >= count)
>                break;
>
>            IColumn column = reducedColumns.next();
>            if (logger.isDebugEnabled())
>                logger.debug(String.format("collecting %s of %s: %s",
>                                           liveColumns, count, column.getString(comparator)));
>
>            if (finish.length > 0
>                && ((!reversed && comparator.compare(column.name(), finish) > 0))
>                    || (reversed && comparator.compare(column.name(), finish) < 0))
>                break;
>
>            // only count live columns towards the `count` criteria
>            if (!column.isMarkedForDelete()
>                && (!container.isMarkedForDelete()
>                    || (ClockRelationship.GREATER_THAN == column.mostRecentLiveChangeAt().compare(container.getMarkedForDeleteAt()))))
>            {
>                liveColumns++;
>            }
>
>            // but we need to add all non-gc-able columns to the result for read repair:
>            if (QueryFilter.isRelevant(column, container, gcBefore))
>                container.addColumn(column);
>        }
>    }
>
> Adding some time measuring prints revealed a few interesting things.
> 1. The first time I request the row (I request the entire row in this case),
> collectReducedColumns() is called twice. I have not had time to understand
> why yet, but there is one difference between the calls. All columns are
> returned both times, but the first call is done with MAXINT as "count" while
> the second call has the max count actually requested by the client as
> "count". The first call takes about 3.7 seconds to process, the second about
> 1.7 secs. Whatever the reason, I think we should be able to remove one of these
> calls?
>
> 2. Almost half the time is spent in "container.addColumn". This is probably
> no big surprise as there is a lot of code hidden "down there". I am however
> very curious whether it could not be significantly optimized, especially in the
> case where you have predicates which cause all columns to be included. That
> said, I have not managed to read enough Cassandra code to understand
> tombstones or all the other things which are going on in that part of the
> code.
>
> 3. A bit more of a surprise: most of the remaining 50% of the time seems to
> be spent in while (reducedColumns.hasNext()).
> That is, save System.nanoTime() at the end of the loop and after hasNext,
> sum it up, and it accounts for almost 50% of the total time. That seems quite
> weird to me.
>
> I will dig more, but I was curious if someone had answers already.
>
> Best regards,
> Terje
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: performance with a "large" number of supercolumns/columns

Posted by Jonathan Ellis <jb...@gmail.com>.
On Thu, Jul 8, 2010 at 12:18 AM, Terje Marthinussen
<tm...@gmail.com> wrote:
> 1. There is just too much code and too many "layers" involved when hasNext
> is called. I suspect this requires a re-design, and the
> google/common-collections may have to be thrown out. This would seem to be a
> pretty critical area of Cassandra code for performance, so I suspect the
> cost of building functionality specifically tuned for Cassandra should be
> worth it.

That might be interesting.  I've looked at the apache collated
iterator code [the main thing we use commons-collections for] and it
looks pretty sloppy.  Google collections (guava in trunk) otoh is
probably better than what I could come up with.

Kevin from guava suggests they will add support for collated iteration
"eventually."  See
http://code.google.com/p/guava-libraries/issues/detail?id=119 and the
linked 220.
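For reference, the collated iteration being discussed (merging several already-sorted iterators into one sorted stream, as when columns from multiple sstables/memtables are combined) can be sketched with a small heap. This is a generic illustration of the technique, not Guava's or commons-collections' implementation:

```java
import java.util.*;

// Collated ("merge") iteration over K pre-sorted iterators.
// A PriorityQueue keyed on each iterator's current head yields the
// next-smallest element across all sources in O(log K) per step.
public class CollatingIterator<T> implements Iterator<T> {
    // One source iterator plus its current head element.
    private static final class Source<T> {
        final Iterator<T> it;
        T head;
        Source(Iterator<T> it) { this.it = it; this.head = it.next(); }
    }

    private final PriorityQueue<Source<T>> queue;

    public CollatingIterator(Comparator<T> cmp, List<Iterator<T>> sources) {
        queue = new PriorityQueue<>(Math.max(1, sources.size()),
                                    (a, b) -> cmp.compare(a.head, b.head));
        for (Iterator<T> it : sources)
            if (it.hasNext())
                queue.add(new Source<>(it));
    }

    public boolean hasNext() { return !queue.isEmpty(); }

    public T next() {
        Source<T> s = queue.poll();
        T result = s.head;
        // Refill the source's head and re-insert it if anything remains.
        if (s.it.hasNext()) { s.head = s.it.next(); queue.add(s); }
        return result;
    }

    public static void main(String[] args) {
        List<Iterator<Integer>> srcs = Arrays.asList(
            Arrays.asList(1, 4, 7).iterator(),
            Arrays.asList(2, 5, 8).iterator(),
            Arrays.asList(3, 6, 9).iterator());
        CollatingIterator<Integer> merged =
            new CollatingIterator<>(Comparator.naturalOrder(), srcs);
        List<Integer> out = new ArrayList<>();
        merged.forEachRemaining(out::add);
        System.out.println(out); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

Each step costs O(log K) for K source iterators, which is roughly what the library implementations do as well; the overhead under discussion comes from the extra wrapper layers around this core, not the algorithm itself.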

> 2. From what I understand, Columns from all SS/MemTables are merged before
> considering tombstones and applying the slice predicates.

Partly -- for slicing, we deserialize an index block at a time (from
each sstable) until the slice predicate is satisfied (getNextBlock
in ColumnGroupReader).  For reversed queries this is unavoidable, but
it could be optimized better for forward queries.

For column name predicates we deserialize exactly the requested columns.

It should be possible to optimize both better wrt tombstones (instead
of throwing out "uninteresting" columns in a final pass, pass in the
deletion time they have to "beat" to be relevant to the
IColumnIterators).
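A rough sketch of that idea (the Column type and names here are hypothetical stand-ins, not Cassandra's API): push the container's deletion timestamp into the iterator so shadowed columns are dropped as they stream past, instead of being thrown out in a final pass:

```java
import java.util.*;

// Sketch: wrap a column iterator with the deletion timestamp it has to
// "beat". Columns older than the container tombstone never reach the
// caller, so no final cleanup pass is needed.
public class TombstoneAwareIterator implements Iterator<TombstoneAwareIterator.Column> {
    // Stand-in for IColumn; just a name and a last-modified timestamp.
    public static final class Column {
        public final String name;
        public final long timestamp;
        public Column(String name, long timestamp) { this.name = name; this.timestamp = timestamp; }
    }

    private final Iterator<Column> source;
    private final long markedForDeleteAt; // container tombstone to beat
    private Column next;

    public TombstoneAwareIterator(Iterator<Column> source, long markedForDeleteAt) {
        this.source = source;
        this.markedForDeleteAt = markedForDeleteAt;
        advance();
    }

    // Pull from the source until a column survives the tombstone.
    private void advance() {
        next = null;
        while (source.hasNext()) {
            Column c = source.next();
            if (c.timestamp > markedForDeleteAt) { next = c; break; }
        }
    }

    public boolean hasNext() { return next != null; }

    public Column next() {
        Column result = next;
        advance();
        return result;
    }

    public static void main(String[] args) {
        List<Column> cols = Arrays.asList(
            new Column("a", 5), new Column("b", 20), new Column("c", 8), new Column("d", 30));
        TombstoneAwareIterator it = new TombstoneAwareIterator(cols.iterator(), 10L);
        while (it.hasNext())
            System.out.println(it.next().name); // prints b, then d
    }
}
```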

> While this may make things somewhat hairy codewise, JVM performance/memory
> consumption tend to not be happy in scenarios with heavy object creation,
> and we get a lot of objects here.

Agreed.

> 3. It would seem like there might be good possibilities to do a fair bit of
> optimization for the case where a complete set of columns are fetched
> (slices with null for start/end predicates), but I am not currently sure how
much time it would save and if it would be worth the work and added
complexity to the code.

As I commented on
https://issues.apache.org/jira/browse/CASSANDRA-1207, one piece of
low-hanging fruit would be to allow skipping either the column index
or bloom filter for name-based or slice-based queries, respectively.

> I may make an attempt at 1. but I don't feel entirely like I understand
> enough of the Cassandra code yet to do any of the above at the moment so it
> may take a little while.

Feel free to catch us on IRC in #cassandra-dev if you want quicker answers. :)

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: performance with a "large" number of supercolumns/columns

Posted by Terje Marthinussen <tm...@gmail.com>.
Hi,

yes, I have been looking a bit at this, but not as much as I wanted to.

I kind of forgot about the whole double call on first access (probably
something related to caching?) since even with that fixed, the performance is
not good enough for what I need.

The hasNext() call causes interactions with both Apache commons-collections
and google-collections, and simply speaking, the amount of code which ends up
being executed for the hasNext()/next() part of this loop is a bit
ridiculous.
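The kind of measurement described above amounts to a harness like the following (a generic sketch, not the actual instrumentation used):

```java
import java.util.*;

// Accumulate System.nanoTime() around hasNext()/next() to see how much
// of a full scan is spent inside the iterator machinery itself.
public class IteratorTimer {
    public static long timeIteration(Iterator<?> it) {
        long spent = 0;
        while (true) {
            long t0 = System.nanoTime();
            boolean more = it.hasNext();
            spent += System.nanoTime() - t0;
            if (!more) break;
            long t1 = System.nanoTime();
            it.next();
            spent += System.nanoTime() - t1;
        }
        return spent; // nanoseconds spent inside hasNext()/next()
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) data.add(i);
        System.out.printf("iterator time: %d ns%n", timeIteration(data.iterator()));
    }
}
```

Note that the nanoTime() calls themselves add overhead, so numbers from a harness like this are only indicative of where time goes, not exact.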

I started to optimize some code in commons-collections and got a 10-20%
improvement (for my specific test case), but as I started understanding both
the collection code and the Cassandra code better, it started dawning on me
that:

1. There is just too much code and too many "layers" involved when hasNext
is called. I suspect this requires a re-design, and the
google/common-collections may have to be thrown out. This would seem to be a
pretty critical area of Cassandra code for performance, so I suspect the
cost of building functionality specifically tuned for Cassandra should be
worth it.

2. From what I understand, Columns from all SS/MemTables are merged before
considering tombstones and applying the slice predicates. To reduce
deserialization and Column object creation, maybe some of this filtering and
column merging could/should be done before Column objects are even created?


While this may make things somewhat hairy codewise, JVM performance/memory
consumption tend to not be happy in scenarios with heavy object creation,
and we get a lot of objects here.
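One way to picture that (with a toy serialized layout, not Cassandra's real on-disk format): test the slice predicate against the raw serialized name first, and only build an object for entries that pass, so rejected columns are never materialized:

```java
import java.util.*;
import java.util.function.Predicate;

// Sketch of filtering before object creation: the predicate peeks at
// the serialized name, and a Column object is built only for matches.
// The String[] "serialized" entries are a toy stand-in for raw bytes.
public class EarlyFilter {
    public static final class Column {
        public final String name, value;
        public Column(String name, String value) { this.name = name; this.value = value; }
    }

    public static List<Column> deserializeMatching(List<String[]> raw,
                                                   Predicate<String> namePredicate) {
        List<Column> out = new ArrayList<>();
        for (String[] entry : raw) {
            // Peek only at the name; rejected entries cost no allocation.
            if (namePredicate.test(entry[0]))
                out.add(new Column(entry[0], entry[1]));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> raw = Arrays.asList(
            new String[]{"a", "1"}, new String[]{"b", "2"}, new String[]{"c", "3"});
        // Slice predicate: names in [b, c]
        List<Column> cols = deserializeMatching(raw,
            n -> n.compareTo("b") >= 0 && n.compareTo("c") <= 0);
        for (Column c : cols)
            System.out.println(c.name); // prints b, then c
    }
}
```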

3. It would seem like there might be good possibilities to do a fair bit of
optimization for the case where a complete set of columns are fetched
(slices with null for start/end predicates), but I am not currently sure how
much time it would save and if it would be worth the work and added
complexity to the code.

I may make an attempt at 1. but I don't feel entirely like I understand
enough of the Cassandra code yet to do any of the above at the moment so it
may take a little while.

Regards,
Terje

On Thu, Jul 8, 2010 at 8:07 AM, Jonathan Ellis <jb...@gmail.com> wrote:

> Hi Terje,
>
> Sorry to not get to this sooner.  Are you still looking into this?
>
> On Tue, Jun 22, 2010 at 12:47 PM, Terje Marthinussen
> <tm...@gmail.com> wrote:
> > Hi,
> >
> > I was looking a bit on a case we have with columnfamily which has 439k
> > supercolumns, each supercolumn with ~3 columns each so a total of some
> 1.3
> > million objects in total.
> >
> > This takes about 9 second to read with thrift on first access, 4-5 second
> on
> > second access.
> >
> > I took a little closer look at this, I noticed that roughly half of this
> > time was spend in cassandra.
> >
> > I will look more at this, but I thought I would just ask people here as
> it
> > could be that someone already had good explanations...
> >
> > Most of the time is spent here
> >
> >    public void collectReducedColumns(IColumnContainer container,
> > Iterator<IColumn> reducedColumns, int gcBefore)
> >    {
> >        int liveColumns = 0;
> >        AbstractType comparator = container.getComparator();
> >
> >        while (reducedColumns.hasNext())
> >        {
> >            if (liveColumns >= count)
> >                break;
> >
> >            IColumn column = reducedColumns.next();
> >            if (logger.isDebugEnabled())
> >                logger.debug(String.format("collecting %s of %s: %s",
> >                                           liveColumns, count,
> > column.getString(comparator)));
> >
> >            if (finish.length > 0
> >                && ((!reversed && comparator.compare(column.name(),
> > finish) > 0))
> >                    || (reversed && comparator.compare(column.name(),
> > finish) < 0))
> >                break;
> >
> >            // only count live columns towards the `count` criteria
> >            if (!column.isMarkedForDelete()
> >                && (!container.isMarkedForDelete()
> >                    || (ClockRelationship.GREATER_THAN ==
> >
> column.mostRecentLiveChangeAt().compare(container.getMarkedForDeleteAt()))))
> >            {
> >                liveColumns++;
> >            }
> >
> >            // but we need to add all non-gc-able columns to the
> > result for read repair:
> >            if (QueryFilter.isRelevant(column, container, gcBefore))
> >                container.addColumn(column);
> >        }
> >    }
> >
> > Adding some time measuring print revealed a few interesting this.
> > 1. First time I request the row (I request the entire row in this case),
> > collectReducedColumns() is called twice. I have not had time to
> understand
> > why yet, but there is one difference between the calls. All columns are
> > returned both times, but the first call is done with MAXINT as "count"
> while
> > the second call has the maxcount actually requested by the client as
> > "count". The first call takes about 3.7 seconds to process, the second
> about
> > 1.7 secs. Whatever reason, I think we should be able to remove one of
> these
> > calls?
> >
> > 2. Almost half the time is spent in "container.addColumn". This is
> probably
> > no big surprise as there is a lot of code hidden "down there". I am
> however
> > very curious if it could not be significantly optimized, especially in
> the
> > case where you have predicates which cases all columns to be included.
> That
> > said, I have not manage to read enough cassandra code to understand
> > tombstones or all the other things which is going on in that part of the
> > code.
> >
> > 3. A bit more of a surprise, most of the remaining 50% of the time seems
> to
> > occur at while (reducedColumns.hasNext())
> > That is, save system.nanoTime() at the end of the loop and after hasNext
> and
> > sum up and it accounts for almost 50% of the total time. That seems quite
> > weird to me.
> >
> > I will dig more, but I was curious if someone had answers already.
> >
> > Best regards,
> > Terje
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>