Posted to issues@hbase.apache.org by "Jacques (JIRA)" <ji...@apache.org> on 2012/05/12 19:10:48 UTC

[jira] [Created] (HBASE-5993) Add a no-read Append

Jacques created HBASE-5993:
------------------------------

             Summary: Add a no-read Append
                 Key: HBASE-5993
                 URL: https://issues.apache.org/jira/browse/HBASE-5993
             Project: HBase
          Issue Type: Improvement
          Components: regionserver
    Affects Versions: 0.94.0
            Reporter: Jacques


HBASE-4102 added an atomic append.  For high-performance situations, it would be helpful to be able to do appends that don't actually require a read of the existing value.  This would be useful in building a growing set of values.  Our original use case was implementing a form of search in HBase where a cell would contain a list of document ids associated with a particular keyword.  However, it seems like it would also provide substantial performance improvements for most Append scenarios.

Within the client API, the simplest way to implement this would be to leverage the existing Append API.  If the Append is marked as setReturnResults(false), use the no-read code path.  If result return is requested, use the existing Append implementation.



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5993) Add a no-read Append

Posted by "Jacques (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285800#comment-13285800 ] 

Jacques commented on HBASE-5993:
--------------------------------

The implementations of HBASE-4218, HBASE-4676 and HBASE-6093 reduce the storage overhead of the multicolumn approach to solving this problem.  Network bandwidth and processing overhead will still exist.  Using encoding schemes to solve this problem is nice because the changes are constrained as opposed to cross-cutting.  That being said, it seems a bit like boiling the ocean to make a cup of tea.  Let me put a design doc together and then we can reevaluate.  My intuition is that this type of functionality could open up a new set of use cases for HBase.
                

        

[jira] [Commented] (HBASE-5993) Add a no-read Append

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283144#comment-13283144 ] 

Lars Hofhansl commented on HBASE-5993:
--------------------------------------

Then I do not understand what we are proposing here. An Append that does not read the existing value is a Put, no?

Maybe a patch will make it clear to me.
                

        

[jira] [Commented] (HBASE-5993) Add a no-read Append

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279281#comment-13279281 ] 

stack commented on HBASE-5993:
------------------------------

@Lars Yeah, in the server, a new cell would be made of the current value plus that proffered in the incoming append
                

        

[jira] [Commented] (HBASE-5993) Add a no-read Append

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285493#comment-13285493 ] 

Lars Hofhansl commented on HBASE-5993:
--------------------------------------

Honestly, I still do not understand what Jacques is proposing. In order to append to something you'd have to read that something first. HBase has no in-place updates (for a good reason). So one could:
# Replace the KV if it is still in the memstore.
# Store incremental changes (somewhere?) and combine upon read from HBase.

                

        

[jira] [Commented] (HBASE-5993) Add a no-read Append

Posted by "Jacques (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285764#comment-13285764 ] 

Jacques commented on HBASE-5993:
--------------------------------

The reason this can make sense is data overhead.  In a case where we are capturing a large number of small values, the KeyValue overhead is substantial.  The original use case is one where I'm adding to a list of documents that contain a certain term (search index).  Let's say that each document number is a four-byte int.  Right now there are two options.  The first is to use the existing append, which means one will become swamped with reads as the cell value grows over time (this would also wreak havoc on memstore flushes as the cell value becomes megabytes in size and we're just adding another four bytes once a day).  The second is to use separate columns, which creates a substantial amount of overhead for each value added.  The utility of this functionality also extends to situations where people are capturing a large sequence of small values in monitoring applications.  (Sematext are basically trying to create this functionality already with their HBaseHUT work.)
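
To put rough numbers on the per-value overhead of the separate-columns option, here is a back-of-the-envelope sketch in Python. It assumes the classic serialized KeyValue framing (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte type). The 10-byte row key, 1-byte family and 1-byte qualifier are made-up sizes for illustration only, and compression and block encoding are ignored.

```python
# Fixed framing bytes per serialized KeyValue (assumed classic layout).
KEY_LENGTH_FIELD = 4
VALUE_LENGTH_FIELD = 4
ROW_LENGTH_FIELD = 2
FAMILY_LENGTH_FIELD = 1
TIMESTAMP_FIELD = 8
KEY_TYPE_FIELD = 1
FIXED_OVERHEAD = (KEY_LENGTH_FIELD + VALUE_LENGTH_FIELD + ROW_LENGTH_FIELD
                  + FAMILY_LENGTH_FIELD + TIMESTAMP_FIELD + KEY_TYPE_FIELD)


def keyvalue_size(row_len, family_len, qualifier_len, value_len):
    """Uncompressed size in bytes of one KeyValue under the assumed layout."""
    return FIXED_OVERHEAD + row_len + family_len + qualifier_len + value_len


# Hypothetical sizes: 10-byte term as the row key, 1-byte column family.
ROW_LEN, FAMILY_LEN, DOC_ID_LEN = 10, 1, 4


def multicolumn_size(num_doc_ids):
    """One column per document id: the id is the qualifier, the value is empty."""
    return num_doc_ids * keyvalue_size(ROW_LEN, FAMILY_LEN, DOC_ID_LEN, 0)


def single_cell_size(num_doc_ids):
    """All document ids packed into one cell value under a 1-byte qualifier."""
    return keyvalue_size(ROW_LEN, FAMILY_LEN, 1, DOC_ID_LEN * num_doc_ids)


if __name__ == "__main__":
    n = 1000
    print("multicolumn bytes:", multicolumn_size(n))
    print("single-cell bytes:", single_cell_size(n))
```

Under these assumptions each four-byte document id costs 35 bytes as its own column, against roughly four bytes when packed into one large cell.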

Yes, an additional KeyValue.Type is needed.  When this type is read, the read path fetches all the appended values (and the last full value) and then combines them on return.  As compactions are done, the complete merged values are created.
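
The merge-on-read idea described above can be sketched as a toy in-memory model in Python. Nothing here is HBase code: `ToyColumn`, `append_no_read` and the `APPEND_DELTA` marker are hypothetical names standing in for a single column's version history and the proposed new KeyValue.Type.

```python
from dataclasses import dataclass, field

PUT = "Put"
APPEND_DELTA = "AppendDelta"  # hypothetical stand-in for the proposed new type


@dataclass
class ToyCell:
    kind: str
    value: bytes


@dataclass
class ToyColumn:
    """Version history of one row/column, oldest entry first."""
    entries: list = field(default_factory=list)

    def put(self, value: bytes) -> None:
        """Write a full value."""
        self.entries.append(ToyCell(PUT, value))

    def append_no_read(self, value: bytes) -> None:
        """Record only the delta; the existing value is never read."""
        self.entries.append(ToyCell(APPEND_DELTA, value))

    def read(self) -> bytes:
        """Walk back to the last full value and combine it with later deltas."""
        deltas = []
        base = b""  # edge case: deltas with no full value underneath
        for cell in reversed(self.entries):
            if cell.kind == PUT:
                base = cell.value
                break
            deltas.append(cell.value)
        return base + b"".join(reversed(deltas))

    def compact(self) -> None:
        """Replace the history with one merged full value."""
        if self.entries:
            self.entries = [ToyCell(PUT, self.read())]


if __name__ == "__main__":
    col = ToyColumn()
    col.put(b"AB")
    col.append_no_read(b"C")
    col.append_no_read(b"D")
    print(col.read(), len(col.entries))
    col.compact()
    print(col.read(), len(col.entries))
```

In this model the write is cheap and the cost moves to the read, which grows with the number of uncompacted deltas. That is the same trade-off as the case where no full value exists yet.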

I'm swamped right now but am going to try to write up a short design doc in the next couple of weeks and get you guys to review my approach since this will have to touch a number of components.  I also need to make sure to manage edge cases like what happens if you do a no-read append and no existing value exists (probably ok--even though read back performance will be poor).  


                

        

[jira] [Commented] (HBASE-5993) Add a no-read Append

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285129#comment-13285129 ] 

Lars Hofhansl commented on HBASE-5993:
--------------------------------------

Wanna make a patch Jacques?
                

        

[jira] [Commented] (HBASE-5993) Add a no-read Append

Posted by "Jacques (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284074#comment-13284074 ] 

Jacques commented on HBASE-5993:
--------------------------------

Exactly. If you have five megs of values in a cell and then want to append another few bytes regularly, it would be best if HBase didn't have to read the existing value every time we wanted to add a few more bytes.  Using multiple columns to pseudo-accomplish this functionality creates a lot of data overhead.
                

        

[jira] [Commented] (HBASE-5993) Add a no-read Append

Posted by "Jieshan Bean (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285335#comment-13285335 ] 

Jieshan Bean commented on HBASE-5993:
-------------------------------------

I'm not against this new feature, but I don't think it's a good idea to use this append API frequently on a single KeyValue. If one KeyValue needs to be updated regularly, why not consider a better key schema?
e.g. 
   key-1,  value
   key-2,  append-value1
   key-3,  append-value2
   ..........
Then we can combine the results on the client side.
Back to this improvement: the only advantage is that combining the results on the server side is better than on the client side. But we need to change the read/scan logic. One seek may come together with several more seeks in order to get the existing value and the appended values...
Maybe my understanding is not correct:)
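
A minimal sketch of the key-schema alternative above, in Python, with a plain dict standing in for a table. The `#` separator, the zero-padded sequence number and the helper names are assumptions made for illustration; the padding only serves to keep lexicographic key order equal to append order.

```python
def chunk_key(base_key: str, seq: int) -> str:
    """Build the key for the seq-th appended chunk (hypothetical scheme)."""
    return f"{base_key}#{seq:08d}"


def combine_chunks(table: dict, base_key: str) -> bytes:
    """Client-side combine: prefix-scan the chunks and join them in key order."""
    prefix = base_key + "#"
    keys = sorted(k for k in table if k.startswith(prefix))
    return b"".join(table[k] for k in keys)


if __name__ == "__main__":
    table = {}
    table[chunk_key("term", 1)] = b"value"
    table[chunk_key("term", 2)] = b"+append1"
    table[chunk_key("term", 3)] = b"+append2"
    print(combine_chunks(table, "term"))
```

Every chunk here is still its own KeyValue, so the writes need no read but each appended value carries the full per-cell framing.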

                

        

[jira] [Commented] (HBASE-5993) Add a no-read Append

Posted by "Jieshan Bean (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285504#comment-13285504 ] 

Jieshan Bean commented on HBASE-5993:
-------------------------------------

One KeyValue will include 2 parts: the original value and the appended value. So if we append a new value, we just add it to the "additional" part (maybe we can implement this by introducing a new KeyValue.Type).  So we need to combine the values. (@Jacques: Please correct me if I misunderstood anything).
I'm wondering whether it is really necessary.
                

        

[jira] [Commented] (HBASE-5993) Add a no-read Append

Posted by "Jieshan Bean (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283150#comment-13283150 ] 

Jieshan Bean commented on HBASE-5993:
-------------------------------------

So combine the existing value and the appended value during reading? That was my understanding :)
                

        

[jira] [Updated] (HBASE-5993) Add a no-read Append

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5993:
-------------------------

    Priority: Critical  (was: Major)

Others have asked for this.  Making critical so it gets consideration.
                

        

[jira] [Commented] (HBASE-5993) Add a no-read Append

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13274454#comment-13274454 ] 

Lars Hofhansl commented on HBASE-5993:
--------------------------------------

I might be a bit dense, but how would this work?

Internally HBase always needs to read the existing value in order to append to it, as nothing is changed in place (i.e. HBase has to generate a new KV for the new value). 
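
The cost of that read-then-rewrite behaviour can be illustrated with a toy model in Python. This is not HBase code: a dict stands in for the store, and the function name is hypothetical. Each append reads the whole current value and writes a new, larger one, so the bytes read grow with every call.

```python
def read_modify_write_append(store: dict, key: str, delta: bytes) -> int:
    """Append by reading the current value and writing a new combined value.

    Returns the number of bytes that had to be read for this append.
    """
    current = store.get(key, b"")
    store[key] = current + delta  # a new value is created; nothing changes in place
    return len(current)


def total_bytes_read(num_appends: int, delta: bytes) -> int:
    """Sum the bytes read across a run of appends to a single cell."""
    store = {}
    return sum(read_modify_write_append(store, "term", delta)
               for _ in range(num_appends))


if __name__ == "__main__":
    print(total_bytes_read(1000, b"\x00\x00\x00\x01"))
```

With four-byte deltas the i-th append reads 4*(i-1) bytes, so the total read volume grows quadratically in the number of appends.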

                