Posted to issues@hbase.apache.org by "Doug Meil (JIRA)" <ji...@apache.org> on 2011/07/27 14:11:09 UTC

[jira] [Created] (HBASE-4143) HTable.doPut(List) should check the writebuffer lengh every so often

HTable.doPut(List) should check the writebuffer lengh every so often
--------------------------------------------------------------------

                 Key: HBASE-4143
                 URL: https://issues.apache.org/jira/browse/HBASE-4143
             Project: HBase
          Issue Type: Improvement
            Reporter: Doug Meil
            Priority: Minor



This came up on a dist-list conversation between Andy P., Ted Yu, and myself.  Andy noted that extremely large lists passed into put(List) can cause issues, because doPut adds every Put in the list to the write buffer before it checks the buffer size even once.  Ted suggested having doPut check the write-buffer length every so often (every 5-10 records?) so that the flush doesn't happen only at the end, and I think that's a good idea.

  public void put(final List<Put> puts) throws IOException {
    doPut(puts);
  }

  private void doPut(final List<Put> puts) throws IOException {
    for (Put put : puts) {
      validatePut(put);
      writeBuffer.add(put);
      currentWriteBufferSize += put.heapSize();
    }
    // The buffer size is checked only here, after the whole list has been buffered.
    if (autoFlush || currentWriteBufferSize > writeBufferSize) {
      flushCommits();
    }
  }
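To make the problem concrete, here is a small, self-contained Java model of the loop above. It is a sketch, not HBase code: each Put is represented only by its heap size as a long, flushCommits() merely counts and resets, and the class, field, and method names are hypothetical. It shows that with the only check placed after the loop, the amount buffered before the first flush is bounded by the length of the list, not by writeBufferSize.

```java
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical model of the original doPut loop. Each Put is represented only
 * by its heap size; flushCommits() merely resets the buffer and counts flushes.
 */
class OriginalDoPutModel {
  private final long writeBufferSize;     // configured buffer limit, in bytes
  private final boolean autoFlush;
  private long currentWriteBufferSize = 0;
  long peakBufferedBytes = 0;             // most bytes ever held before a flush
  int flushCount = 0;

  OriginalDoPutModel(long writeBufferSize, boolean autoFlush) {
    this.writeBufferSize = writeBufferSize;
    this.autoFlush = autoFlush;
  }

  // Same shape as the original: buffer the whole list, then check once.
  void doPut(List<Long> putSizes) {
    for (long size : putSizes) {
      currentWriteBufferSize += size;
      peakBufferedBytes = Math.max(peakBufferedBytes, currentWriteBufferSize);
    }
    if (autoFlush || currentWriteBufferSize > writeBufferSize) {
      flushCommits();
    }
  }

  // Stand-in for the real flushCommits(), which would send the buffered Puts.
  private void flushCommits() {
    flushCount++;
    currentWriteBufferSize = 0;
  }

  public static void main(String[] args) {
    // Illustrative numbers: 100,000 puts of 100 bytes against a 2 MB limit.
    OriginalDoPutModel table = new OriginalDoPutModel(2L * 1024 * 1024, false);
    table.doPut(Collections.nCopies(100_000, 100L));
    // prints "flushes=1, peak buffered bytes=10000000"
    System.out.println("flushes=" + table.flushCount
        + ", peak buffered bytes=" + table.peakBufferedBytes);
  }
}
```

In this model all 10,000,000 bytes are held at once even though the limit is about 2 MB; the single flush then has to send all of them.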

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4143) HTable.doPut(List) should check the writebuffer length every so often

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4143:
--------------------------

    Summary: HTable.doPut(List) should check the writebuffer length every so often  (was: HTable.doPut(List) should check the writebuffer lengh every so often)


        

[jira] [Commented] (HBASE-4143) HTable.doPut(List) should check the writebuffer length every so often

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072119#comment-13072119 ] 

Ted Yu commented on HBASE-4143:
-------------------------------

Integrated to branch and TRUNK.

Thanks for the patch Doug.
Thanks for the review Andrew.


        

[jira] [Commented] (HBASE-4143) HTable.doPut(List) should check the writebuffer length every so often

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072622#comment-13072622 ] 

Gary Helmling commented on HBASE-4143:
--------------------------------------

{quote}
If a single Put isn't batched, I wonder why the user would want to batch a list of {@link Put}s.
{quote}

The current semantics are: if autoflush is enabled, writes are automatically flushed for you at the end of put(List<Put>); if disabled, then writes are buffered until the write buffer is full.

I don't really want to change that as part of this issue.  As I see it, the intent here is just to respect the write-buffer sizing when batching.  I just want to fix an anti-pattern we introduced through an implementation detail of the original fix.


        

[jira] [Updated] (HBASE-4143) HTable.doPut(List) should check the writebuffer lengh every so often

Posted by "Doug Meil (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Meil updated HBASE-4143:
-----------------------------

    Attachment: client_HBASE_4143.patch


        

[jira] [Commented] (HBASE-4143) HTable.doPut(List) should check the writebuffer lengh every so often

Posted by "Doug Meil (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072003#comment-13072003 ] 

Doug Meil commented on HBASE-4143:
----------------------------------

I had this vision of people at some point wanting this check to be configurable per HTable, but I think it's better to start with a constant.

Changed.


        

[jira] [Commented] (HBASE-4143) HTable.doPut(List) should check the writebuffer lengh every so often

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072048#comment-13072048 ] 

Andrew Purtell commented on HBASE-4143:
---------------------------------------

+1 on second patch.

I made only a documentation change because I wasn't convinced that really large RPCs have no place, but if the consensus is to change put(List<Put>), that makes sense to me.


        

[jira] [Assigned] (HBASE-4143) HTable.doPut(List) should check the writebuffer lengh every so often

Posted by "Doug Meil (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Meil reassigned HBASE-4143:
--------------------------------

    Assignee: Doug Meil


        

[jira] [Updated] (HBASE-4143) HTable.doPut(List) should check the writebuffer length every so often

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4143:
--------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)


        

[jira] [Commented] (HBASE-4143) HTable.doPut(List) should check the writebuffer length every so often

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072613#comment-13072613 ] 

Ted Yu commented on HBASE-4143:
-------------------------------

Looking at javadoc for flushCommits():
{code}
   * This method gets called once automatically for every {@link Put} or batch
   * of {@link Put}s (when <code>put(List<Put>)</code> is used) when
   * {@link #isAutoFlush} is {@code true}.
{code}
If a single Put isn't batched, I wonder why the user would want to batch a list of {@link Put}s.


        

[jira] [Commented] (HBASE-4143) HTable.doPut(List) should check the writebuffer length every so often

Posted by "Doug Meil (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072771#comment-13072771 ] 

Doug Meil commented on HBASE-4143:
----------------------------------

re:  "This effectively disables the ability to do batching."

There is already a client method called 'batch'.  I think it should be encouraged as the preferred batching method for callers who want a "do exactly what I say" approach.  Otherwise, put(Put) and put(List) should obey the writeBuffer rules.  I'm cool with the patch, though.




        

[jira] [Reopened] (HBASE-4143) HTable.doPut(List) should check the writebuffer length every so often

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Helmling reopened HBASE-4143:
----------------------------------


The fix committed for this issue unfortunately causes a new performance problem when autoflush==true.

In this case, the effect is to call flushCommits() every 10 items in the List<Put>.  This effectively disables the ability to do batching.

For the check internal to the for loop over List<Put>, we should only be checking if currentWriteBufferSize > writeBufferSize.  The check on autoflush should only come after the for loop.
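The effect of where the autoflush test lives can be sketched with a small model. This is a hypothetical illustration, not either patch: put sizes are longs standing in for Put.heapSize(), CHECK_INTERVAL mirrors the "every 10 Puts" interval, and the autoFlushCheckedInLoop flag switches between the problematic placement (autoflush tested inside the loop) and the one proposed above (size only inside the loop, autoflush once after it).

```java
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical model comparing where the autoFlush test lives when doPut
 * checks the write buffer every CHECK_INTERVAL puts. No HBase classes are used.
 */
class AutoFlushPlacementModel {
  private static final int CHECK_INTERVAL = 10;

  private final long writeBufferSize;
  private final boolean autoFlush;
  private long currentWriteBufferSize = 0;
  int flushCount = 0;

  AutoFlushPlacementModel(long writeBufferSize, boolean autoFlush) {
    this.writeBufferSize = writeBufferSize;
    this.autoFlush = autoFlush;
  }

  void doPut(List<Long> putSizes, boolean autoFlushCheckedInLoop) {
    int n = 0;
    for (long size : putSizes) {
      currentWriteBufferSize += size;
      n++;
      if (n % CHECK_INTERVAL == 0) {
        boolean full = currentWriteBufferSize > writeBufferSize;
        // Problematic variant: with autoFlush on, this flushes every CHECK_INTERVAL puts.
        if (full || (autoFlushCheckedInLoop && autoFlush)) {
          flushCommits();
        }
      }
    }
    // autoFlush is honoured once, after the whole list has been buffered.
    if (autoFlush || currentWriteBufferSize > writeBufferSize) {
      flushCommits();
    }
  }

  private void flushCommits() {
    if (currentWriteBufferSize == 0) {
      return; // in this model an empty buffer sends nothing, so it is not counted
    }
    flushCount++;
    currentWriteBufferSize = 0;
  }

  public static void main(String[] args) {
    List<Long> puts = Collections.nCopies(1000, 100L);
    AutoFlushPlacementModel inLoop = new AutoFlushPlacementModel(2L * 1024 * 1024, true);
    inLoop.doPut(puts, true);
    AutoFlushPlacementModel afterLoop = new AutoFlushPlacementModel(2L * 1024 * 1024, true);
    afterLoop.doPut(puts, false);
    // prints "autoFlush in loop: 100 flushes; size-only in loop: 1 flush(es)"
    System.out.println("autoFlush in loop: " + inLoop.flushCount
        + " flushes; size-only in loop: " + afterLoop.flushCount + " flush(es)");
  }
}
```

With autoflush on and a list that fits in the buffer, the in-loop autoflush test turns one batched flush into one flush per 10 puts, which is the regression described above; the size-only check keeps a single flush.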


        

[jira] [Commented] (HBASE-4143) HTable.doPut(List) should check the writebuffer lengh every so often

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071994#comment-13071994 ] 

Ted Yu commented on HBASE-4143:
-------------------------------

{code}
+  private int doPutWbCheck = 10;    // i.e., doPut checks the writebuffer every X Puts.
{code}
The above should be declared static final and spelled in upper-case.
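A minimal sketch of what that suggestion looks like in Java. The interval is fixed, class-wide configuration, so a single immutable constant replaces a mutable per-instance field. The constant name DOPUT_WB_CHECK, the class, and the isCheckPoint helper are illustrative assumptions, not necessarily what was committed.

```java
/**
 * Hypothetical sketch: the check interval as a static final constant named in
 * upper case, instead of a mutable per-instance int field.
 */
class WriteBufferCheckInterval {
  // Instead of: private int doPutWbCheck = 10;
  private static final int DOPUT_WB_CHECK = 10; // doPut checks the write buffer every X Puts

  /** Returns true when the putsSoFar-th Put (1-based) is a point at which to check the buffer. */
  static boolean isCheckPoint(int putsSoFar) {
    return putsSoFar % DOPUT_WB_CHECK == 0;
  }

  public static void main(String[] args) {
    for (int n = 1; n <= 30; n++) {
      if (isCheckPoint(n)) {
        System.out.println("check write buffer after put " + n);
      }
    }
  }
}
```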


        

[jira] [Commented] (HBASE-4143) HTable.doPut(List) should check the writebuffer lengh every so often

Posted by "Doug Meil (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071982#comment-13071982 ] 

Doug Meil commented on HBASE-4143:
----------------------------------

Changed HTable.doPut to periodically check the write-buffer length inside the loop, and updated the javadoc comment in HTableInterface.

I ran TestHTableUtil locally because that uses the put(List) method, and the tests passed.


        

[jira] [Updated] (HBASE-4143) HTable.doPut(List) should check the writebuffer lengh every so often

Posted by "Doug Meil (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Meil updated HBASE-4143:
-----------------------------

    Description: 
This came up on a dist-list conversation between Andy P., Ted Yu, and myself.  Andy noted that extremely large lists passed into put(List) can cause issues.  Ted suggested that having doPut check the write-buffer length every so often (5-10 records?) so the flush doesn't happen only at the end, and I think that's good idea.

 public void put(final List<Put> puts) throws IOException {
    doPut(puts);
  }
  private void doPut(final List<Put> puts) throws IOException {
    for (Put put : puts) {
      validatePut(put);
      writeBuffer.add(put);
      currentWriteBufferSize += put.heapSize();
    }
    if (autoFlush || currentWriteBufferSize > writeBufferSize) {
      flushCommits();
    }
  }

Once this change is made, remove the comment in HBASE-4142 about large lists being a performance problem.

  was:

This came up on a dist-list conversation between Andy P., Ted Yu, and myself.  Andy noted that extremely large lists passed into put(List) can cause issues.  Ted suggested that having doPut check the write-buffer length every so often (5-10 records?) so the flush doesn't happen only at the end, and I think that's good idea.

 public void put(final List<Put> puts) throws IOException {
    doPut(puts);
  }
  private void doPut(final List<Put> puts) throws IOException {
    for (Put put : puts) {
      validatePut(put);
      writeBuffer.add(put);
      currentWriteBufferSize += put.heapSize();
    }
    if (autoFlush || currentWriteBufferSize > writeBufferSize) {
      flushCommits();
    }
  }



        

[jira] [Updated] (HBASE-4143) HTable.doPut(List) should check the writebuffer length every so often

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Helmling updated HBASE-4143:
---------------------------------

    Attachment: HBASE-4143_update.patch

Updated the in-loop check so that it only compares against the write-buffer size, and so that checking continues even if the first check is false (previously "n" was not reset, so if the first write-buffer check came back negative, no subsequent checks would ever run).
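The counter problem can be reproduced with a small model. The "n == 10, reset only after a flush" form is an assumed reconstruction of the kind of logic described above, not the code of either patch; the class and field names are hypothetical, put sizes are longs standing in for Put.heapSize(), and only the in-loop check is modelled (the final flush after the loop is omitted).

```java
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical model of the in-loop counter bug: an equality test on a
 * counter that is reset only after a flush stops firing once the first check
 * finds the buffer not yet full.
 */
class CounterResetModel {
  private static final int CHECK_INTERVAL = 10;

  private final long writeBufferSize;
  private final boolean resetOnlyOnFlush; // true = the buggy pattern
  private long buffered = 0;
  int checks = 0;
  int flushes = 0;
  long peakBuffered = 0;

  CounterResetModel(long writeBufferSize, boolean resetOnlyOnFlush) {
    this.writeBufferSize = writeBufferSize;
    this.resetOnlyOnFlush = resetOnlyOnFlush;
  }

  void doPut(List<Long> putSizes) {
    int n = 0;
    for (long size : putSizes) {
      buffered += size;
      peakBuffered = Math.max(peakBuffered, buffered);
      n++;
      // Buggy: "n == CHECK_INTERVAL" only holds again if n was reset, and n is
      // reset only after a flush. Fixed: a modulo test fires every CHECK_INTERVAL puts.
      boolean due = resetOnlyOnFlush ? n == CHECK_INTERVAL : n % CHECK_INTERVAL == 0;
      if (due) {
        checks++;
        if (buffered > writeBufferSize) {
          flushes++;
          buffered = 0; // stand-in for flushCommits()
          if (resetOnlyOnFlush) {
            n = 0;
          }
        }
      }
    }
  }

  public static void main(String[] args) {
    List<Long> puts = Collections.nCopies(1000, 100L);
    CounterResetModel buggy = new CounterResetModel(5000, true);
    buggy.doPut(puts);
    CounterResetModel fixed = new CounterResetModel(5000, false);
    fixed.doPut(puts);
    // prints "buggy: checks=1, peak=100000; fixed: checks=100, peak=6000"
    System.out.println("buggy: checks=" + buggy.checks + ", peak=" + buggy.peakBuffered
        + "; fixed: checks=" + fixed.checks + ", peak=" + fixed.peakBuffered);
  }
}
```

In the buggy form the first check (after 1,000 bytes) finds the buffer under its 5,000-byte limit, n moves past 10, and the whole list ends up buffered; with the modulo test the buffer never grows much beyond the limit.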

> HTable.doPut(List) should check the writebuffer length every so often
> ---------------------------------------------------------------------
>
>                 Key: HBASE-4143
>                 URL: https://issues.apache.org/jira/browse/HBASE-4143
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Assignee: Doug Meil
>            Priority: Minor
>         Attachments: HBASE-4143_update.patch, client_HBASE_4143.patch

[jira] [Commented] (HBASE-4143) HTable.doPut(List) should check the writebuffer lengh every so often

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072008#comment-13072008 ] 

Ted Yu commented on HBASE-4143:
-------------------------------

+1 on second patch.

> HTable.doPut(List) should check the writebuffer lengh every so often
> --------------------------------------------------------------------
>
>                 Key: HBASE-4143
>                 URL: https://issues.apache.org/jira/browse/HBASE-4143
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Assignee: Doug Meil
>            Priority: Minor
>         Attachments: client_HBASE_4143.patch

[jira] [Resolved] (HBASE-4143) HTable.doPut(List) should check the writebuffer length every so often

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu resolved HBASE-4143.
---------------------------

    Resolution: Fixed

> HTable.doPut(List) should check the writebuffer length every so often
> ---------------------------------------------------------------------
>
>                 Key: HBASE-4143
>                 URL: https://issues.apache.org/jira/browse/HBASE-4143
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Assignee: Doug Meil
>            Priority: Minor
>         Attachments: HBASE-4143_update.patch, client_HBASE_4143.patch

[jira] [Updated] (HBASE-4143) HTable.doPut(List) should check the writebuffer lengh every so often

Posted by "Doug Meil (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Meil updated HBASE-4143:
-----------------------------

    Attachment: client_HBASE_4143.patch

> HTable.doPut(List) should check the writebuffer lengh every so often
> --------------------------------------------------------------------
>
>                 Key: HBASE-4143
>                 URL: https://issues.apache.org/jira/browse/HBASE-4143
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Assignee: Doug Meil
>            Priority: Minor
>         Attachments: client_HBASE_4143.patch, client_HBASE_4143.patch

[jira] [Updated] (HBASE-4143) HTable.doPut(List) should check the writebuffer lengh every so often

Posted by "Doug Meil (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Meil updated HBASE-4143:
-----------------------------

    Status: Patch Available  (was: Open)

> HTable.doPut(List) should check the writebuffer lengh every so often
> --------------------------------------------------------------------
>
>                 Key: HBASE-4143
>                 URL: https://issues.apache.org/jira/browse/HBASE-4143
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Assignee: Doug Meil
>            Priority: Minor
>         Attachments: client_HBASE_4143.patch

[jira] [Updated] (HBASE-4143) HTable.doPut(List) should check the writebuffer lengh every so often

Posted by "Doug Meil (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Meil updated HBASE-4143:
-----------------------------

    Attachment:     (was: client_HBASE_4143.patch)

> HTable.doPut(List) should check the writebuffer lengh every so often
> --------------------------------------------------------------------
>
>                 Key: HBASE-4143
>                 URL: https://issues.apache.org/jira/browse/HBASE-4143
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Assignee: Doug Meil
>            Priority: Minor
>         Attachments: client_HBASE_4143.patch

[jira] [Commented] (HBASE-4143) HTable.doPut(List) should check the writebuffer length every so often

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072623#comment-13072623 ] 

Ted Yu commented on HBASE-4143:
-------------------------------

I applied Gary's patch to branch and TRUNK.
Looking at the check:
{code}
      if (n % DOPUT_WB_CHECK == 0 && currentWriteBufferSize > writeBufferSize) {
{code}
the effect is that if the write buffer size is exceeded at Put #11, we would wait until Put #20 (assuming there are at least 20 Puts) to flush.
I guess that is okay, since we could have chosen 20 as DOPUT_WB_CHECK in the first place.

I will wait for a day before resolving this JIRA.

Thanks to Gary for his sharp eyes and sharp mind.
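Ted's Put #11 / Put #20 example above can be checked with a short simulation. Because the buffer is only examined when n is a multiple of DOPUT_WB_CHECK, the flush can trail the put that first pushes the buffer over its limit by up to DOPUT_WB_CHECK - 1 puts. This sketch assumes an interval of 10 (implied by the example, not stated in the thread) and models puts as byte sizes; the class and method names are hypothetical.

```java
/**
 * Hypothetical helper for reasoning about flush delay (not the HBase source).
 * Indices are 1-based to match "Put #11" / "Put #20" in the discussion.
 */
final class FlushDelaySketch {
    /** Index of the first put after which the buffer exceeds its limit, or -1. */
    static int firstExceedIndex(long[] putSizes, long writeBufferSize) {
        long current = 0;
        for (int i = 0; i < putSizes.length; i++) {
            current += putSizes[i];
            if (current > writeBufferSize) {
                return i + 1;
            }
        }
        return -1;
    }

    /** Index of the put at which the periodic in-loop check first flushes, or -1. */
    static int firstFlushIndex(long[] putSizes, long writeBufferSize, int interval) {
        long current = 0;
        for (int n = 1; n <= putSizes.length; n++) {
            current += putSizes[n - 1];
            // Same shape as the committed check: only every `interval` puts.
            if (n % interval == 0 && current > writeBufferSize) {
                return n;
            }
        }
        return -1; // no in-loop flush; the end-of-list check in doPut takes over
    }
}
```

With twenty 100-byte puts and a 1,050-byte limit, the buffer first exceeds its limit at put 11, but the periodic check does not flush until put 20. With only 15 puts there is no in-loop flush at all and the end-of-list check handles it, which is why the example assumes at least 20 Puts.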

> HTable.doPut(List) should check the writebuffer length every so often
> ---------------------------------------------------------------------
>
>                 Key: HBASE-4143
>                 URL: https://issues.apache.org/jira/browse/HBASE-4143
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Assignee: Doug Meil
>            Priority: Minor
>         Attachments: HBASE-4143_update.patch, client_HBASE_4143.patch

[jira] [Commented] (HBASE-4143) HTable.doPut(List) should check the writebuffer length every so often

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072618#comment-13072618 ] 

Andrew Purtell commented on HBASE-4143:
---------------------------------------

+1 on the update patch. Thanks for spotting the actual bug. Originally I just wanted a documentation update so the batching behavior (and large RPCs if the user wishes) could be preserved, so I'm happy with that change as well.

> HTable.doPut(List) should check the writebuffer length every so often
> ---------------------------------------------------------------------
>
>                 Key: HBASE-4143
>                 URL: https://issues.apache.org/jira/browse/HBASE-4143
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Assignee: Doug Meil
>            Priority: Minor
>         Attachments: HBASE-4143_update.patch, client_HBASE_4143.patch

[jira] [Commented] (HBASE-4143) HTable.doPut(List) should check the writebuffer length every so often

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072652#comment-13072652 ] 

Hudson commented on HBASE-4143:
-------------------------------

Integrated in HBase-TRUNK #2061 (See [https://builds.apache.org/job/HBase-TRUNK/2061/])
    HBASE-4143 HTable.doPut(List) should check the writebuffer length every so often
           addendum by Gary H

tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java


> HTable.doPut(List) should check the writebuffer length every so often
> ---------------------------------------------------------------------
>
>                 Key: HBASE-4143
>                 URL: https://issues.apache.org/jira/browse/HBASE-4143
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Assignee: Doug Meil
>            Priority: Minor
>         Attachments: HBASE-4143_update.patch, client_HBASE_4143.patch

[jira] [Commented] (HBASE-4143) HTable.doPut(List) should check the writebuffer lengh every so often

Posted by "Doug Meil (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072065#comment-13072065 ] 

Doug Meil commented on HBASE-4143:
----------------------------------

Andy, I'm glad you made that doc change, because it exposed an edge case in that method! Thanks.

> HTable.doPut(List) should check the writebuffer lengh every so often
> --------------------------------------------------------------------
>
>                 Key: HBASE-4143
>                 URL: https://issues.apache.org/jira/browse/HBASE-4143
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Assignee: Doug Meil
>            Priority: Minor
>         Attachments: client_HBASE_4143.patch