Posted to user@hbase.apache.org by Girish Joshi <gj...@groupon.com.INVALID> on 2015/11/01 19:24:27 UTC

Slow reads coinciding with higher compaction time avg time

Hello

In my HBase cluster, I have consistently observed the following over
several days:

- There is a spike in the compaction time avg time metric. At the same time,
swap bytes in and swap bytes out also show higher values.
- Around the same time, the FS PRead and FS Read latencies increase, as do
the latencies of clients doing random reads.

My HBase cluster consists of 16 nodes, is set up with replication to
another cluster of 16 nodes, and has the following workload:

- Around 4 tables have a lot of write activity (around 500k writes per
second on the m1/m15 moving average). 2 of these tables have atomic
counter columns that keep track of some analytics data and are incremented
with every write.

- 2 tables receive bulk-uploaded data periodically (around once a day).

- We expect around 100k reads per second, mainly from the tables which have
bulk-uploaded data and the one which has counter columns. The read
latencies (p99) spike to around 1000-5000 ms when the compaction time avg
time metric above increases. At other times, they are below 100 ms.

I have set hbase.hregion.majorcompaction to 0 on the region servers; I plan
to set it to 0 on the master nodes too, so that I can rule out time-triggered
major compactions as the problem. But I suspect that there are a lot of
minor compactions, and that these are leading to major compactions at the
time of the spikes.
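
To confirm which value a given node actually has configured, the property can be read from that node's hbase-site.xml. The following is a minimal sketch, assuming the usual Hadoop-style `<configuration>/<property>` layout. The helper name `read_hbase_property` and the sample XML are illustrative and not taken from the cluster described here.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_hbase_property(xml_text: str, name: str) -> Optional[str]:
    """Return the value of property `name`, or None if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Illustrative file contents; a real hbase-site.xml would be read from disk.
SAMPLE_SITE_XML = """
<configuration>
  <property>
    <name>hbase.hregion.majorcompaction</name>
    <value>0</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    value = read_hbase_property(SAMPLE_SITE_XML, "hbase.hregion.majorcompaction")
    # A value of 0 disables time-based major compactions.
    print("hbase.hregion.majorcompaction =", value)
```

A return value of None means the file does not override the property, so the node is running with the built-in default.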

*Any suggestions on how to avoid this situation of read latency spikes and
have better read performance?*

Thanks,

Girish.

Re: Slow reads coinciding with higher compaction time avg time

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.

How much memory do you have on this server, what is running on it, and how
much memory did you give to each process?

Also, what is your swappiness level?

http://askubuntu.com/questions/103915/how-do-i-configure-swappiness
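
On Linux, the current swappiness level is exposed in /proc/sys/vm/swappiness. As a rough sketch (the function names `parse_swappiness` and `read_swappiness` are made up for illustration), it could be checked on each node like this:

```python
from pathlib import Path


def parse_swappiness(text: str) -> int:
    """Parse the contents of /proc/sys/vm/swappiness into an integer."""
    value = int(text.strip())
    if value < 0:
        raise ValueError(f"unexpected swappiness value: {value}")
    return value


def read_swappiness(path: str = "/proc/sys/vm/swappiness") -> int:
    """Read the current vm.swappiness setting on a Linux host."""
    return parse_swappiness(Path(path).read_text())


if __name__ == "__main__":
    try:
        print("vm.swappiness =", read_swappiness())
    except OSError as exc:
        # Not a Linux host, or /proc is not readable.
        print("could not read swappiness:", exc)
```

Parsing is kept separate from the file read so the same function can be used on output collected from remote nodes.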

2015-11-02 19:20 GMT-05:00 Girish Joshi <gj...@groupon.com.invalid>:

> Thanks. Do you have any specific suggestions to avoid swapping during hbase
> compactions.
>
> Thanks,
>
> Girish.

Re: Slow reads coinciding with higher compaction time avg time

Posted by Girish Joshi <gj...@groupon.com.INVALID>.

Thanks everyone. I have 192G of RAM on the HBase machines in the cluster. Of
that, around 100-120G is used by user processes and the rest is used for
caching. The swap graphs I have show swapping at a rate of 100 million
kb/second.

I have disabled swappiness on my machines, but I still have the same issue
of latency spikes during swapping.
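
To see whether a particular process (for example a region server JVM) has pages in swap, Linux reports per-process `VmRSS` and `VmSwap` values in /proc/<pid>/status. The sketch below is only an illustration: the helper names are hypothetical, and the PID to inspect is something you would supply yourself.

```python
import os
from pathlib import Path
from typing import Dict


def parse_status_kb(status_text: str) -> Dict[str, int]:
    """Return VmRSS and VmSwap (in kB) from /proc/<pid>/status text."""
    result = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmRSS", "VmSwap"):
            # Lines look like "VmSwap:      1024 kB".
            result[key] = int(rest.split()[0])
    return result


def process_memory(pid: int) -> Dict[str, int]:
    """Read resident and swapped memory for one process on a Linux host."""
    return parse_status_kb(Path(f"/proc/{pid}/status").read_text())


if __name__ == "__main__":
    try:
        # Uses this script's own PID as a stand-in for a region server PID.
        print(process_memory(os.getpid()))
    except OSError as exc:
        print("could not read process status:", exc)
```

A non-zero `VmSwap` for the JVM would indicate that part of its memory has been swapped out.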

Thanks,

Girish.

On Mon, Nov 2, 2015 at 8:07 PM, Yu Li <ca...@gmail.com> wrote:

> Second Vlad. Swapping is necessary in many situations, but as JVM does not
> behave well under swapping, HBase may run into trouble if swapped. Search
> for "swapping" in hbase book <http://hbase.apache.org/book.html> and you
> could see the same suggestion.
>
> Best Regards,
> Yu

Re: Slow reads coinciding with higher compaction time avg time

Posted by Yu Li <ca...@gmail.com>.

I second Vlad. Swapping is necessary in many situations, but since the JVM
does not behave well under swapping, HBase may run into trouble if it is
swapped out. Search for "swapping" in the HBase book
<http://hbase.apache.org/book.html> and you will find the same suggestion.

Best Regards,
Yu

On 3 November 2015 at 11:59, Vladimir Rodionov <vl...@gmail.com>
wrote:

> >> Do you have any specific suggestions to avoid swapping during hbase
> compactions.
>
> You can google "disable swap on linux", make sure that you do
> not overprovision your system's RAM (too many processes running and
> consuming all physical memory), monitor swap usage with vmstat.
>
> -Vlad

Re: Slow reads coinciding with higher compaction time avg time

Posted by Vladimir Rodionov <vl...@gmail.com>.

>> Do you have any specific suggestions to avoid swapping during hbase
>> compactions.

You can google "disable swap on linux". Also make sure that you do not
overprovision your system's RAM (too many processes running and consuming
all physical memory), and monitor swap usage with vmstat.
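
For scripted monitoring, the kernel's cumulative swap counters can also be sampled directly. This is a sketch rather than a replacement for vmstat: it assumes a Linux host where /proc/vmstat exposes the `pswpin` and `pswpout` page counters, and the helper names and the one-second interval are arbitrary choices for illustration.

```python
import time
from pathlib import Path
from typing import Dict


def parse_swap_counters(vmstat_text: str) -> Dict[str, int]:
    """Extract the cumulative pswpin/pswpout page counters from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before: Dict[str, int], after: Dict[str, int],
               interval_s: float) -> Dict[str, float]:
    """Pages swapped in/out per second between two counter samples."""
    return {key: (after[key] - before[key]) / interval_s
            for key in ("pswpin", "pswpout")}


if __name__ == "__main__":
    try:
        first = parse_swap_counters(Path("/proc/vmstat").read_text())
        time.sleep(1)
        second = parse_swap_counters(Path("/proc/vmstat").read_text())
        print(swap_rates(first, second, 1.0))
    except (OSError, KeyError) as exc:
        # /proc/vmstat is missing or does not expose the swap counters.
        print("swap counters unavailable:", exc)
```

Rates that stay at zero outside compaction windows and rise during them would match the pattern described at the start of this thread.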

-Vlad

On Mon, Nov 2, 2015 at 4:20 PM, Girish Joshi <gj...@groupon.com.invalid>
wrote:

> Thanks. Do you have any specific suggestions to avoid swapping during hbase
> compactions.
>
> Thanks,
>
> Girish.

Re: Slow reads coinciding with higher compaction time avg time

Posted by Girish Joshi <gj...@groupon.com.INVALID>.

Thanks. Do you have any specific suggestions to avoid swapping during HBase
compactions?

Thanks,

Girish.

On Sun, Nov 1, 2015 at 6:25 PM, Vladimir Rodionov <vl...@gmail.com>
wrote:

> >>- There is a spike in compaction time avg time metric. At the same time
> the
> >>swap bytes in and swap bytes out also have higher value.
>
> Swapping is bad. You have to avoid it.
>
> -Vlad

Re: Slow reads coinciding with higher compaction time avg time

Posted by Vladimir Rodionov <vl...@gmail.com>.

>>- There is a spike in compaction time avg time metric. At the same time
>>the swap bytes in and swap bytes out also have higher value.

Swapping is bad. You have to avoid it.

-Vlad
