Posted to common-dev@hadoop.apache.org by "Raju Bairishetti (JIRA)" <ji...@apache.org> on 2014/10/15 12:15:34 UTC

[jira] [Created] (HADOOP-11205) ThrottledInputStream should return the actual bandwidth (read rate)

Raju Bairishetti created HADOOP-11205:
-----------------------------------------

             Summary: ThrottledInputStream should return the actual bandwidth (read rate)
                 Key: HADOOP-11205
                 URL: https://issues.apache.org/jira/browse/HADOOP-11205
             Project: Hadoop Common
          Issue Type: Bug
          Components: tools/distcp
            Reporter: Raju Bairishetti
            Assignee: Raju Bairishetti


Currently, ThrottledInputStream does not return the actual read rate. As a result, the stream spends much of its time idle.

Behavior: Before reading a chunk of bytes (or a single byte) from the buffer, it first checks whether the current bandwidth (number of bytes per second) exceeds maxBandwidth. If the read rate exceeds maxBandwidth, it sleeps for 50 ms and resumes after the sleep period.

Ex: Assume maxBandwidth = 1 MB/s and read rate = 1 MB/s (i.e. reading 1M bytes per second).

In the above case, reading 1.5 MB in 1.5 sec does not exceed the max bandwidth, yet the stream still goes to sleep. It computes the read rate as 1.5 MB/s (bytes read / elapsed time = 1.5/1, because the elapsed time is truncated to whole seconds: 1500 ms / 1000 = 1) instead of 1 MB/s (i.e. 1.5/1.5).

Example:
It does not sleep during the first second, because the number of bytes read in that time is less than maxBandwidth.
When it reads 1M + 1 byte/chunk, it checks the read rate against maxBandwidth.
When it reads 1M + 2 bytes/chunk, it sleeps for 50 ms because the computed read rate is > 1 MB/s.
When it reads 1M + 3 bytes/chunk, it again sleeps for 50 ms because the computed read rate is > 1 MB/s.
...
Even when it has read 1.5 MB in 1.5 sec, it still sleeps, because the computed read rate is 1.5 MB/s (1.5/1) rather than 1 MB/s (1.5/1.5).

Cons: it alternates between reading for about 1 sec and sleeping for almost 1 sec.
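
The stall described above can be reproduced with a small simulation. This is a minimal sketch, not the actual ThrottledInputStream code: it uses a virtual clock instead of System.currentTimeMillis(), and the class name, the chunk size (1000 bytes read per millisecond) and 1 MB = 1,000,000 bytes are assumptions chosen for illustration. It compares the whole-second rate calculation with a millisecond-based one for a source that delivers exactly maxBandwidth.

```java
final class ThrottleSimulation {
    // Assumed values, for illustration only.
    static final long MAX_BYTES_PER_SEC = 1_000_000L; // 1 MB/s, taking 1 MB = 10^6 bytes
    static final long SLEEP_MILLIS = 50L;             // sleep step described in this issue
    static final long CHUNK_BYTES = 1_000L;           // hypothetical chunk size
    static final long CHUNK_MILLIS = 1L;              // hypothetical time to read one chunk

    // Rate with elapsed time truncated to whole seconds (current behavior).
    static long truncatedRate(long bytesRead, long elapsedMillis) {
        long elapsedSecs = elapsedMillis / 1000;
        return elapsedSecs == 0 ? bytesRead : bytesRead / elapsedSecs;
    }

    // Rate computed from elapsed milliseconds (proposed behavior).
    static long millisRate(long bytesRead, long elapsedMillis) {
        return elapsedMillis <= 1000 ? bytesRead : (bytesRead * 1000) / elapsedMillis;
    }

    // Runs the throttle loop on a virtual clock and returns the total time spent sleeping.
    static long simulateSleepMillis(boolean useTruncatedRate, long durationMillis) {
        long now = 0;
        long bytesRead = 0;
        long slept = 0;
        while (now < durationMillis) {
            long rate = useTruncatedRate
                ? truncatedRate(bytesRead, now)
                : millisRate(bytesRead, now);
            if (rate > MAX_BYTES_PER_SEC) {
                // Over the limit: sleep instead of reading.
                now += SLEEP_MILLIS;
                slept += SLEEP_MILLIS;
            } else {
                // Under the limit: read one chunk.
                bytesRead += CHUNK_BYTES;
                now += CHUNK_MILLIS;
            }
        }
        return slept;
    }

    public static void main(String[] args) {
        System.out.println("whole seconds: slept " + simulateSleepMillis(true, 2000) + " ms");
        System.out.println("milliseconds:  slept " + simulateSleepMillis(false, 2000) + " ms");
    }
}
```

Under these assumptions, the whole-second calculation idles for 1000 ms of the first 2000 ms, while the millisecond-based calculation never sleeps.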

The getBytesPerSec() method does not return the actual bandwidth, because it truncates the elapsed time to whole seconds.
Current code:
{code}
public long getBytesPerSec() {
  long elapsed = (System.currentTimeMillis() - startTime) / 1000;
  if (elapsed == 0) {
    return bytesRead;
  } else {
    return bytesRead / elapsed;
  }
}
{code}
We should fix the getBytesPerSec() method:
{code}
public long getBytesPerSec() {
  // MILLISECONDS_IN_SEC is assumed to be a constant equal to 1000.
  long elapsedTimeInMilliSecs = System.currentTimeMillis() - startTime;
  if (elapsedTimeInMilliSecs <= MILLISECONDS_IN_SEC) {
    return bytesRead;
  } else {
    return (bytesRead * MILLISECONDS_IN_SEC) / elapsedTimeInMilliSecs;
  }
}
{code}
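
To make the difference concrete, here is a minimal standalone sketch that applies both calculations to the 1.5 MB in 1.5 sec case from the description. The class and method names are hypothetical, the elapsed time is passed in as a parameter instead of being read from the clock, and 1 MB is taken as 1,000,000 bytes for simplicity.

```java
final class RateComparison {
    static final long MILLISECONDS_IN_SEC = 1000L;

    // Mirrors the current code: elapsed time is truncated to whole seconds.
    static long currentRate(long bytesRead, long elapsedMillis) {
        long elapsed = elapsedMillis / 1000;
        if (elapsed == 0) {
            return bytesRead;
        }
        return bytesRead / elapsed;
    }

    // Mirrors the proposed code: elapsed time is kept in milliseconds.
    static long proposedRate(long bytesRead, long elapsedMillis) {
        if (elapsedMillis <= MILLISECONDS_IN_SEC) {
            return bytesRead;
        }
        return (bytesRead * MILLISECONDS_IN_SEC) / elapsedMillis;
    }

    public static void main(String[] args) {
        long bytesRead = 1_500_000L; // 1.5 MB, assuming 1 MB = 10^6 bytes
        long elapsedMillis = 1_500L; // 1.5 sec
        System.out.println(currentRate(bytesRead, elapsedMillis));  // prints 1500000
        System.out.println(proposedRate(bytesRead, elapsedMillis)); // prints 1000000
    }
}
```

With a 1,000,000 bytes/sec limit, only the first result exceeds maxBandwidth, which is why the current code sleeps even though the true rate is at the limit.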




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)