Posted to user@accumulo.apache.org by pdread <pa...@siginttech.com> on 2014/09/11 18:06:52 UTC

Compaction slowing queries

We have 100+ tablet servers with approx. 860 tablets per server, and we ingest
approx. 300K+ docs/day. The problem started recently: queries during a minor or
major compaction take 100+ seconds, versus about 2 seconds when no compaction
is running. Everyone on the cluster is affected, MapReduce jobs and batch
scanners alike.

One table has as many as 65K tablets.

In the hope of reducing compactions, yesterday we changed the following on the
2 tables that appeared to cause most of the compactions:

compaction.ratio from 3 to 5
table.file.max from 15 to 45
split.threshold from 725M to 2G.
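
For reference, the equivalent Accumulo shell commands would look roughly like
this (full property names from the Accumulo docs; the table name is a
placeholder):

```shell
config -t mytable -s table.compaction.major.ratio=5
config -t mytable -s table.file.max=45
config -t mytable -s table.split.threshold=2G
```

Note that raising table.split.threshold only affects future splits; existing
tablets are not merged back together automatically (the shell's merge command
exists for that), which would explain a tablet count that stays put.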

tserver heaps are set to 3G; top shows 6G resident and 7G virtual for the one I checked.

The odd thing is we expected the number of tablets to change and they did not.
The only change was that the number of compactions went up while their
duration dropped by about half. Queries during off-peak times did not seem to
change.

One more thing: we only store docs < 64M in Accumulo; otherwise they are
written directly to HDFS.

The question would be: is there a way to reduce the compaction frequency
and/or duration?

Thanks in advance.

Paul



--
View this message in context: http://apache-accumulo.1065345.n5.nabble.com/Compaction-slowing-queries-tp11278.html
Sent from the Users mailing list archive at Nabble.com.

Re: Compaction slowing queries

Posted by Adam Fuchs <af...@apache.org>.
You can change compression codecs at any time on a per-table basis. This
only affects how new files are written. Existing files will still be read
the same way. See the table.file.compress.type parameter.

One caveat is that you need to make sure your codec is supported before
switching to it, or compactions will start failing. You might want to try it
on a test table first, testing insert and flush operations after configuring
the codec.
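
A quick smoke test along those lines in the Accumulo shell might look like
this (table name and codec choice are placeholders; the codec must actually be
available on the tservers for the flush to succeed):

```shell
createtable codectest
config -t codectest -s table.file.compress.type=snappy
insert row fam qual value
flush -t codectest -w
scan
deletetable -f codectest
```

If the flush completes and the scan returns the row, new files are being
written with the new codec.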

Adam

Re: Compaction slowing queries

Posted by pdread <pa...@siginttech.com>.
Adam


Quick question if I may: regarding your comment about compression (#2), are
these other compression codecs compatible with gzip? We have tens of millions
of docs already loaded and of course do not want to reload.

I will try #1, say 6 threads for minor and 12 threads for major. I checked,
and the servers have 24 CPUs and the average load is nil.

Your suggestion #3 will not work, since we would have to re-index all of our
docs, which is not going to happen.

Thanks

Paul






Re: Compaction slowing queries

Posted by Adam Fuchs <af...@apache.org>.
Paul,

Here are a few suggestions:

1. Reduce the number of concurrent compaction threads
(tserver.compaction.major.concurrent.max, and
tserver.compaction.minor.concurrent.max). You probably want to lean
towards twice as many major compaction threads as minor, but that
somewhat depends on how bursty your ingest rate is. The total number
of threads should leave plenty of cores for query processing.
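
In the Accumulo shell, that tuning would look roughly like this (example
values for a 24-core box, following the 2:1 major:minor ratio above; adjust to
your own load):

```shell
config -s tserver.compaction.major.concurrent.max=8
config -s tserver.compaction.minor.concurrent.max=4
```

That caps compactions at 12 threads, leaving the other 12 cores free for
query processing.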

2. Look into using a different compression codec. Snappy or LZ4 can
support much higher throughput than the default of gzip, although
the compression ratio will not be as good.

3. Consider a key choice that limits the number of actively ingesting
tablets. Writing across all ~100k tablets means they will all be
actively compacting, but if you can arrange your keys such that only
~1k tablets are being actively written to, then you can significantly
cut your expected write amplification (i.e. the number of major
compactions needed). This is because minor compactions will be larger,
so each tablet accumulates fewer, bigger files and needs fewer major
compaction passes.
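
A minimal sketch of this idea (the bucketing scheme here is an assumption for
illustration, not something from the thread): prefix row keys with a coarse
time bucket, so that at any moment writes land only in the tablets covering
the current bucket.

```python
from datetime import datetime, timezone

def row_key(doc_id: str, ts: datetime, bucket_fmt: str = "%Y%m%d") -> str:
    """Prefix the key with a coarse time bucket so concurrent ingest
    sorts into the small range of tablets covering that bucket."""
    return f"{ts.strftime(bucket_fmt)}_{doc_id}"

# Docs ingested on the same day share a prefix, so they cluster into
# the same few tablets instead of spreading across all ~100k.
a = row_key("doc-001", datetime(2014, 9, 11, tzinfo=timezone.utc))
b = row_key("doc-002", datetime(2014, 9, 11, tzinfo=timezone.utc))
c = row_key("doc-003", datetime(2014, 9, 12, tzinfo=timezone.utc))
```

The trade-off is that queries which previously relied on the old key order
would need the bucket prefix too, which is why this only suits data that can
be re-keyed.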

Cheers,
Adam

