Posted to dev@hbase.apache.org by "Kay Kay (JIRA)" <ji...@apache.org> on 2010/02/10 02:56:28 UTC

[jira] Created: (HBASE-2208) TableServers # processBatchOfRows - converts from List to [ ] - Expensive copy

TableServers #  processBatchOfRows   -  converts from List to [ ]  - Expensive copy 
------------------------------------------------------------------------------------

                 Key: HBASE-2208
                 URL: https://issues.apache.org/jira/browse/HBASE-2208
             Project: Hadoop HBase
          Issue Type: Improvement
            Reporter: Kay Kay


With autoFlush set to false and a large write buffer on HTable, when we write bulk puts, TableServers#processBatchOfRows converts the input (List) to an array ([ ]) before sending it down the wire.

With a write buffer as large as 20 MB, that becomes an expensive copy when we do list.toArray(new T[ ]).

Maybe we should change the wire protocol to support List as well, and then revisit this to prevent the bulk copy?

{code}
Batch b = new Batch(this) {
        @Override
        int doCall(final List<Row> currentList, final byte [] row,
          final byte [] tableName)
        throws IOException, RuntimeException {
          // The copy in question: the buffered List is converted to a Put[]
          // before the RPC call.
          final Put [] puts = currentList.toArray(PUT_ARRAY_TYPE);
          return getRegionServerWithRetries(new ServerCallable<Integer>(this.c,
              tableName, row) {
            public Integer call() throws IOException {
              return server.put(location.getRegionInfo().getRegionName(), puts);
            }
          });
        }
};
{code}
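
For reference, here is a minimal, self-contained sketch of what the toArray call quoted above does. It is not HBase code: FakePut and ToArrayCopyDemo are hypothetical stand-ins, and the sizes are made up. List.toArray(T[]) allocates a new array and copies one reference per element; the objects themselves are shared rather than duplicated, so the extra work grows with the number of buffered puts. How costly that is in practice would still need measuring.

```java
import java.util.ArrayList;
import java.util.List;

public class ToArrayCopyDemo {
    // Hypothetical stand-in for HBase's Put; it only carries a payload.
    static final class FakePut {
        final byte[] payload;
        FakePut(int size) { this.payload = new byte[size]; }
    }

    // Zero-length array used only to tell toArray the element type,
    // mirroring the PUT_ARRAY_TYPE constant in the quoted snippet.
    private static final FakePut[] PUT_ARRAY_TYPE = new FakePut[0];

    // Allocates a new FakePut[buffered.size()] and copies each reference into it.
    static FakePut[] toArrayCopy(List<FakePut> buffered) {
        return buffered.toArray(PUT_ARRAY_TYPE);
    }

    public static void main(String[] args) {
        List<FakePut> buffered = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            buffered.add(new FakePut(1024));
        }
        FakePut[] puts = toArrayCopy(buffered);

        System.out.println(puts.length);                  // 1000
        // Same object in both containers: only the reference was copied.
        System.out.println(puts[0] == buffered.get(0));   // true
    }
}
```

Passing the List straight through to the RPC call, as proposed, would skip the extra array allocation and the per-element reference copy.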

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2208) TableServers # processBatchOfRows - converts from List to [ ] - Expensive copy

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831853#action_12831853 ] 

stack commented on HBASE-2208:
------------------------------

No, just that you'll be changing the interface. To do that you need to up the RPC version. Upping the RPC version can only happen in a major release, i.e. 0.21.0.
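
As a rough illustration of why an interface change forces a version bump, here is a minimal sketch of a version handshake. It is not HBase's actual RPC code: RpcVersionCheckDemo, checkVersion, and the version numbers are all hypothetical. The idea is only that a client and server that disagree about the interface must fail fast instead of exchanging calls whose signatures no longer match.

```java
public class RpcVersionCheckDemo {
    // Hypothetical version numbers; not the values HBase uses.
    static final long OLD_VERSION = 1L; // interface takes Put[]
    static final long NEW_VERSION = 2L; // interface takes List<Put>

    // Rejects the connection when the two sides were built against
    // different versions of the interface.
    static void checkVersion(long clientVersion, long serverVersion) {
        if (clientVersion != serverVersion) {
            throw new IllegalStateException("RPC version mismatch: client="
                + clientVersion + ", server=" + serverVersion);
        }
    }

    public static void main(String[] args) {
        checkVersion(NEW_VERSION, NEW_VERSION); // matching versions pass
        try {
            checkVersion(OLD_VERSION, NEW_VERSION); // old client, new server
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```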



[jira] Commented: (HBASE-2208) TableServers # processBatchOfRows - converts from List to [ ] - Expensive copy

Posted by "Kay Kay (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831816#action_12831816 ] 

Kay Kay commented on HBASE-2208:
--------------------------------

Syntactic sugar aside, this issue comes down to the fundamental limitation that List<T> is not allowed across the wire. HBASE-2209 tracks that separately.




[jira] Commented: (HBASE-2208) TableServers # processBatchOfRows - converts from List to [ ] - Expensive copy

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832635#action_12832635 ] 

stack commented on HBASE-2208:
------------------------------

FYI: hbase-2209 was committed.



[jira] Commented: (HBASE-2208) TableServers # processBatchOfRows - converts from List to [ ] - Expensive copy

Posted by "Kay Kay (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831857#action_12831857 ] 

Kay Kay commented on HBASE-2208:
--------------------------------

{quote}
Upping rpc version can only happen over in a major release, i.e. 0.21.0
{quote}

Oh yes. I did not mean for this to go into a minor release, since I am aware it would break wire protocol compatibility. Let me change the targeted version on the JIRA as well.



[jira] Commented: (HBASE-2208) TableServers # processBatchOfRows - converts from List to [ ] - Expensive copy

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831842#action_12831842 ] 

stack commented on HBASE-2208:
------------------------------

That'd be fine in 0.21



[jira] Updated: (HBASE-2208) TableServers # processBatchOfRows - converts from List to [ ] - Expensive copy

Posted by "Kay Kay (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kay Kay updated HBASE-2208:
---------------------------

    Fix Version/s: 0.21.0



[jira] Commented: (HBASE-2208) TableServers # processBatchOfRows - converts from List to [ ] - Expensive copy

Posted by "Kay Kay (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832639#action_12832639 ] 

Kay Kay commented on HBASE-2208:
--------------------------------

Thanks, stack, for committing HBASE-2209. I will revisit this with the signature changes to prevent the copy. Sorry for the separate JIRA; I just get nervous about big patches.



[jira] Commented: (HBASE-2208) TableServers # processBatchOfRows - converts from List to [ ] - Expensive copy

Posted by "Kay Kay (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831849#action_12831849 ] 

Kay Kay commented on HBASE-2208:
--------------------------------

Did you mean that Avro or something similar will replace the RPC, as discussed by Ryan on IRC?

I was looking at trunk.



[jira] Updated: (HBASE-2208) TableServers # processBatchOfRows - converts from List to [ ] - Expensive copy

Posted by "Kay Kay (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kay Kay updated HBASE-2208:
---------------------------


*not backward compatible* 

