Posted to issues@hbase.apache.org by "Tobi Vollebregt (JIRA)" <ji...@apache.org> on 2014/12/04 03:51:12 UTC

[jira] [Updated] (HBASE-12632) ThrottledInputStream/ExportSnapshot does not throttle

     [ https://issues.apache.org/jira/browse/HBASE-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tobi Vollebregt updated HBASE-12632:
------------------------------------
    Description: 
I just transferred a ton of data using ExportSnapshot with bandwidth throttling from one Hadoop cluster to another Hadoop cluster, and discovered that ThrottledInputStream does not limit bandwidth.

The problem is that ThrottledInputStream sleeps once, for a fixed time (50 ms), at the start of each read call, disregarding the actual amount of data read.
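This single-sleep behavior can be modeled without real sleeps by replacing Thread.sleep with a simulated clock. The sketch below is a standalone model with hypothetical names, not the actual HBase class; it shows that with large reads (e.g. a buffer in the hundreds of MB) one 50 ms sleep per read leaves the observed rate hundreds of times above a 10 MB/s limit:

{code:java}
public class BrokenThrottleModel {
  static final long SLEEP_DURATION_MS = 50;
  final long maxBytesPerSec;
  long bytesRead = 0;  // total bytes handed to the caller
  long clockMs = 1;    // simulated elapsed time (starts at 1 to avoid division by zero)

  BrokenThrottleModel(long maxBytesPerSec) { this.maxBytesPerSec = maxBytesPerSec; }

  long getBytesPerSec() { return bytesRead * 1000 / clockMs; }

  /** One read of n bytes: sleep AT MOST ONCE, then read anyway (the current behavior). */
  void read(long n) {
    if (getBytesPerSec() > maxBytesPerSec) {
      clockMs += SLEEP_DURATION_MS;  // stands in for a single Thread.sleep(50)
    }
    bytesRead += n;
  }

  public static void main(String[] args) {
    BrokenThrottleModel t = new BrokenThrottleModel(10L * 1024 * 1024); // 10 MB/s limit
    for (int i = 0; i < 100; i++) t.read(256L * 1024 * 1024);           // 256 MB per read
    // Observed rate is ~5.4e9 bytes/s, i.e. hundreds of times the 10 MB/s limit.
    System.out.println(t.getBytesPerSec());
  }
}
{code}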

ExportSnapshot defaults to a buffer size at least as big as the default block size of the outputFs:

{code:java}
      // Use the default block size of the outputFs if bigger
      int defaultBlockSize = Math.max((int) outputFs.getDefaultBlockSize(), BUFFER_SIZE);
      bufferSize = conf.getInt(CONF_BUFFER_SIZE, defaultBlockSize);
      LOG.info("Using bufferSize=" + StringUtils.humanReadableInt(bufferSize));
{code}

In my case, this was 256 MB.

Hence, the ExportSnapshot mapper will attempt to read up to 256 MB at a time, sleeping only 50 ms per call. Thus, in the worst case where each read call fills the 256 MB buffer in negligible time, ThrottledInputStream cannot reduce the bandwidth below (256 MB) / (50 ms) ≈ 5 GB/s.

Even in a more realistic case where each read call returns about 1 MB, it still cannot throttle the bandwidth below (1 MB) / (50 ms) = 20 MB/s.
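In general, since each read call incurs at most one 50 ms sleep, the achievable floor on the throttled rate is bytesPerRead / SLEEP_DURATION_MS. A hypothetical helper (not part of the HBase code) makes the two numbers above concrete:

{code:java}
public class ThrottleFloor {
  static final long SLEEP_DURATION_MS = 50;

  /** Minimum achievable rate in bytes/sec when each read sleeps at most once. */
  static long minRateBytesPerSec(long bytesPerRead) {
    return bytesPerRead * 1000 / SLEEP_DURATION_MS;
  }

  public static void main(String[] args) {
    // 256 MB per read: floor is 5368709120 bytes/s, i.e. 5 GB/s.
    System.out.println(minRateBytesPerSec(256L * 1024 * 1024));
    // 1 MB per read: floor is 20971520 bytes/s, i.e. 20 MB/s.
    System.out.println(minRateBytesPerSec(1024L * 1024));
  }
}
{code}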

The issue is exacerbated by the fact that the limit must be set low to begin with, since the total bandwidth per host also depends on the number of mapper slots.

A simple fix is to change the if in throttle() to a while, so that it keeps sleeping in 50 ms increments until the rate finally drops below the limit:

{code:java}
  private void throttle() throws IOException {
    while (getBytesPerSec() > maxBytesPerSec) {
      try {
        Thread.sleep(SLEEP_DURATION_MS);
        totalSleepTime += SLEEP_DURATION_MS;
      } catch (InterruptedException e) {
        throw new IOException("Thread aborted", e);
      }
    }
  }
{code}
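The while-loop version can be sanity-checked without real sleeps by simulating the clock. The sketch below is a standalone model with hypothetical names, not the HBase class (and it tallies bytes before throttling rather than after, purely to keep the model deterministic); it shows the observed rate converging to the configured limit regardless of read size:

{code:java}
public class ThrottleModel {
  static final long SLEEP_DURATION_MS = 50;
  final long maxBytesPerSec;
  long bytesRead = 0;  // total bytes handed to the caller
  long clockMs = 1;    // simulated elapsed time (starts at 1 to avoid division by zero)

  ThrottleModel(long maxBytesPerSec) { this.maxBytesPerSec = maxBytesPerSec; }

  long getBytesPerSec() { return bytesRead * 1000 / clockMs; }

  /** One read of n bytes: keep sleeping until the observed rate is under the limit. */
  void read(long n) {
    bytesRead += n;
    while (getBytesPerSec() > maxBytesPerSec) {
      clockMs += SLEEP_DURATION_MS;  // stands in for Thread.sleep(SLEEP_DURATION_MS)
    }
  }

  public static void main(String[] args) {
    ThrottleModel t = new ThrottleModel(10L * 1024 * 1024);    // 10 MB/s limit
    for (int i = 0; i < 100; i++) t.read(256L * 1024 * 1024); // 256 MB per read
    // With the while loop, the observed rate ends at or just under the limit.
    System.out.println(t.getBytesPerSec() <= 10L * 1024 * 1024); // true
  }
}
{code}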

This issue affects the ThrottledInputStream in Hadoop as well.

Another way to see this: for big enough buffer sizes, ThrottledInputStream effectively throttles only the number of read calls, capping it at 20 per second, while disregarding the number of bytes each call returns.


> ThrottledInputStream/ExportSnapshot does not throttle
> -----------------------------------------------------
>
>                 Key: HBASE-12632
>                 URL: https://issues.apache.org/jira/browse/HBASE-12632
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.99.2
>            Reporter: Tobi Vollebregt
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)