Posted to dev@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2009/02/11 19:56:59 UTC

[jira] Created: (HBASE-1197) IPC of large cells should transfer in chunks not via naive full copy

IPC of large cells should transfer in chunks not via naive full copy
--------------------------------------------------------------------

                 Key: HBASE-1197
                 URL: https://issues.apache.org/jira/browse/HBASE-1197
             Project: Hadoop HBase
          Issue Type: Sub-task
            Reporter: Andrew Purtell


Several instances of OOME when trying to serve up large cells to clients have been observed. IPC should send large cell content in chunks instead of as one large naive copy. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Updated: (HBASE-1197) IPC of large cells should transfer in chunks not via naive full copy

Posted by Ryan Rawson <ry...@gmail.com>.
With HFile there should ideally be only one copy of the key/value in memory,
and whatever mechanism we use can chunk off the ByteBuffer without overly
duplicating data.  This might help somewhat with this problem.
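
A minimal sketch of "chunking off the ByteBuffer" without copying, using only
java.nio. The class name BufferChunks and the method views are hypothetical
and are not part of HFile or HBase; the sketch only assumes the value is
already held in a single ByteBuffer.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an HBase class.
final class BufferChunks {
    private BufferChunks() {}

    /**
     * Returns read-only views over consecutive regions of source.
     * Every view shares the source's backing storage, so no bytes are copied.
     */
    static List<ByteBuffer> views(ByteBuffer source, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<ByteBuffer> chunks = new ArrayList<>();
        // Independent position and limit, shared content; source is left untouched.
        ByteBuffer cursor = source.asReadOnlyBuffer();
        while (cursor.hasRemaining()) {
            int len = Math.min(chunkSize, cursor.remaining());
            ByteBuffer view = cursor.slice(); // begins at the cursor's current position
            view.limit(len);                  // expose only this chunk
            chunks.add(view);
            cursor.position(cursor.position() + len);
        }
        return chunks;
    }
}
```

Each view could then be handed to a channel or socket write in turn, so the
only full-size allocation remains the original buffer.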

On Wed, Feb 11, 2009 at 7:11 PM, Andrew Purtell (JIRA) <ji...@apache.org> wrote:

>
>     [
> https://issues.apache.org/jira/browse/HBASE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>
> Andrew Purtell updated HBASE-1197:
> ----------------------------------
>
>     Comment: was deleted
>
> > IPC of large cells should transfer in chunks not via naive full copy
> > --------------------------------------------------------------------
> >
> >                 Key: HBASE-1197
> >                 URL: https://issues.apache.org/jira/browse/HBASE-1197
> >             Project: Hadoop HBase
> >          Issue Type: Sub-task
> >            Reporter: Andrew Purtell
> >            Assignee: Andrew Purtell
> >             Fix For: 0.20.0
> >
> >
> > Several instances of OOME when trying to serve up large cells to clients
> have been observed. IPC should send large cell content in chunks instead of
> as one large naive copy.
>

Re: [jira] Commented: (HBASE-1197) IPC of large cells should transfer in chunks not via naive full copy

Posted by Ryan Rawson <ry...@gmail.com>.
I doubt we could chunk values straight into HFile - you'd have to have 1
file for 1 value.  If your value is that large (more than hundreds of megs),
maybe you shouldn't be storing that in HBase - store directly in HDFS and
use HBase to index the content and provide filename pointers.
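
A rough sketch of the pointer approach, under loud assumptions: a local
directory stands in for HDFS and an in-memory Map stands in for an HBase
table, so no real HBase or HDFS API appears here. PointerStore, inlineLimit
and the other names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Stand-ins only: blobDir plays the role of HDFS, the maps play the role of a table.
final class PointerStore {
    private final Path blobDir;
    private final int inlineLimit;
    private final Map<String, byte[]> inline = new HashMap<>();
    private final Map<String, Path> pointers = new HashMap<>();

    PointerStore(Path blobDir, int inlineLimit) {
        this.blobDir = blobDir;
        this.inlineLimit = inlineLimit;
    }

    /** Convenience factory that keeps blobs in a fresh temporary directory. */
    static PointerStore inTempDir(int inlineLimit) {
        try {
            return new PointerStore(Files.createTempDirectory("blobs"), inlineLimit);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Small values are stored inline; large ones go to a file and only the path is indexed. */
    void put(String rowKey, byte[] value) {
        if (value.length <= inlineLimit) {
            inline.put(rowKey, value);
            pointers.remove(rowKey);
            return;
        }
        try {
            Path file = Files.createTempFile(blobDir, "blob-", ".bin");
            Files.write(file, value);
            pointers.put(rowKey, file);
            inline.remove(rowKey);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isPointer(String rowKey) {
        return pointers.containsKey(rowKey);
    }

    /** Returns the value, following the pointer if there is one; null if the key is absent. */
    byte[] get(String rowKey) {
        byte[] small = inline.get(rowKey);
        if (small != null) {
            return small;
        }
        Path file = pointers.get(rowKey);
        if (file == null) {
            return null;
        }
        try {
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real deployment the reader would stream the file rather than call
readAllBytes, which is the point of keeping the large value out of the table.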

As it stands, right now a key/value has to live in memcache for some period
of time (seconds? minutes?), so storing an entire key/value has to be
feasible.  Not supporting chunking/streaming doesn't seem to be a major
deficiency.

I think of HBase as a way of making it possible to efficiently store smallish
values on HDFS.  I think we should support reasonably large values, but
right now there is a 2 GB value max size (int size for value).  With enough
RAM thrown at HBase it should be possible to support nearly all of that
size.

On Wed, Feb 11, 2009 at 9:51 PM, stack (JIRA) <ji...@apache.org> wrote:

>
>    [
> https://issues.apache.org/jira/browse/HBASE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672889#action_12672889]
>
> stack commented on HBASE-1197:
> ------------------------------
>
> It's almost as though we should chunk end-to-end; from hfile all the way on
> out to the client.
>
> Tough one is on the way in.  The big cell is going to have to sit in the
> memcache for some -- probably small if it's a big cell value -- period of
> time.  Can we chunk on the way in?  From client to hfile?
>
>
>
> > IPC of large cells should transfer in chunks not via naive full copy
> > --------------------------------------------------------------------
> >
> >                 Key: HBASE-1197
> >                 URL: https://issues.apache.org/jira/browse/HBASE-1197
> >             Project: Hadoop HBase
> >          Issue Type: Improvement
> >            Reporter: Andrew Purtell
> >             Fix For: 0.20.0
> >
> >
> > Several instances of OOME when trying to serve up large cells to clients
> have been observed. IPC should send large cell content in chunks instead of
> as one large naive copy.
>

[jira] Commented: (HBASE-1197) IPC of large cells should transfer in chunks not via naive full copy

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672889#action_12672889 ] 

stack commented on HBASE-1197:
------------------------------

It's almost as though we should chunk end-to-end; from hfile all the way on out to the client.

Tough one is on the way in.  The big cell is going to have to sit in the memcache for some -- probably small if it's a big cell value -- period of time.  Can we chunk on the way in?  From client to hfile?



> IPC of large cells should transfer in chunks not via naive full copy
> --------------------------------------------------------------------
>
>                 Key: HBASE-1197
>                 URL: https://issues.apache.org/jira/browse/HBASE-1197
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: Andrew Purtell
>             Fix For: 0.20.0
>
>
> Several instances of OOME when trying to serve up large cells to clients have been observed. IPC should send large cell content in chunks instead of as one large naive copy. 



[jira] Commented: (HBASE-1197) IPC of large cells should transfer in chunks not via naive full copy

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672922#action_12672922 ] 

Andrew Purtell commented on HBASE-1197:
---------------------------------------

I wonder whether storing Web content in HBase isn't a canonical use case. The Heritrix hbase-writer certainly makes it easy enough, and the example Bigtable schema on the HBase Architecture page implies a use case of storing views of Web data snapshotted over time. Podcasts and large software packages can exceed 100MB. They are certainly outliers, but under the current scheme they are copied multiple times on the heap, which is not only inefficient but also occasionally fatal via OOME.

> IPC of large cells should transfer in chunks not via naive full copy
> --------------------------------------------------------------------
>
>                 Key: HBASE-1197
>                 URL: https://issues.apache.org/jira/browse/HBASE-1197
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>             Fix For: 0.20.0
>
>
> Several instances of OOME when trying to serve up large cells to clients have been observed. IPC should send large cell content in chunks instead of as one large naive copy. 



[jira] Updated: (HBASE-1197) IPC of large cells should transfer in chunks not via naive full copy

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1197:
----------------------------------

    Issue Type: Sub-task  (was: Improvement)
        Parent: HBASE-1024

> IPC of large cells should transfer in chunks not via naive full copy
> --------------------------------------------------------------------
>
>                 Key: HBASE-1197
>                 URL: https://issues.apache.org/jira/browse/HBASE-1197
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>             Fix For: 0.20.0
>
>
> Several instances of OOME when trying to serve up large cells to clients have been observed. IPC should send large cell content in chunks instead of as one large naive copy. 



[jira] Updated: (HBASE-1197) IPC of large cells should transfer in chunks not via naive full copy

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1197:
----------------------------------

    Issue Type: Improvement  (was: Sub-task)
        Parent:     (was: HBASE-1196)

> IPC of large cells should transfer in chunks not via naive full copy
> --------------------------------------------------------------------
>
>                 Key: HBASE-1197
>                 URL: https://issues.apache.org/jira/browse/HBASE-1197
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>             Fix For: 0.20.0
>
>
> Several instances of OOME when trying to serve up large cells to clients have been observed. IPC should send large cell content in chunks instead of as one large naive copy. 



[jira] Assigned: (HBASE-1197) IPC of large cells should transfer in chunks not via naive full copy

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell reassigned HBASE-1197:
-------------------------------------

    Assignee:     (was: Andrew Purtell)

> IPC of large cells should transfer in chunks not via naive full copy
> --------------------------------------------------------------------
>
>                 Key: HBASE-1197
>                 URL: https://issues.apache.org/jira/browse/HBASE-1197
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: Andrew Purtell
>             Fix For: 0.20.0
>
>
> Several instances of OOME when trying to serve up large cells to clients have been observed. IPC should send large cell content in chunks instead of as one large naive copy. 



[jira] Updated: (HBASE-1197) IPC of large cells should transfer in chunks not via naive full copy

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1197:
----------------------------------

    Comment: was deleted

> IPC of large cells should transfer in chunks not via naive full copy
> --------------------------------------------------------------------
>
>                 Key: HBASE-1197
>                 URL: https://issues.apache.org/jira/browse/HBASE-1197
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>             Fix For: 0.20.0
>
>
> Several instances of OOME when trying to serve up large cells to clients have been observed. IPC should send large cell content in chunks instead of as one large naive copy. 



[jira] Commented: (HBASE-1197) IPC of large cells should transfer in chunks not via naive full copy

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672920#action_12672920 ] 

Andrew Purtell commented on HBASE-1197:
---------------------------------------

From: Ryan Rawson
To: hbase-dev@hadoop.apache.org

I doubt we could chunk values straight into HFile - you'd have to have 1 file for 1 value.  If your value is that large (more than hundreds of megs), maybe you shouldn't be storing that in HBase - store directly in HDFS and use HBase to index the content and provide filename pointers.

As it stands, right now a key/value has to live in memcache for some period of time (seconds? minutes?), so storing an entire key/value has to be feasible.  Not supporting chunking/streaming doesn't seem to be a major deficiency.

I think of HBase as a way of making it possible to efficiently store smallish values on HDFS.  I think we should support reasonably large values, but right now there is a 2 GB value max size (int size for value).  With enough RAM thrown at HBase it should be possible to support nearly all of that size.

> IPC of large cells should transfer in chunks not via naive full copy
> --------------------------------------------------------------------
>
>                 Key: HBASE-1197
>                 URL: https://issues.apache.org/jira/browse/HBASE-1197
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>             Fix For: 0.20.0
>
>
> Several instances of OOME when trying to serve up large cells to clients have been observed. IPC should send large cell content in chunks instead of as one large naive copy. 



[jira] Updated: (HBASE-1197) IPC of large cells should transfer in chunks not via naive full copy

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1197:
----------------------------------

    Fix Version/s: 0.20.0
         Assignee: Andrew Purtell

> IPC of large cells should transfer in chunks not via naive full copy
> --------------------------------------------------------------------
>
>                 Key: HBASE-1197
>                 URL: https://issues.apache.org/jira/browse/HBASE-1197
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>             Fix For: 0.20.0
>
>
> Several instances of OOME when trying to serve up large cells to clients have been observed. IPC should send large cell content in chunks instead of as one large naive copy. 



[jira] Updated: (HBASE-1197) IPC of large cells should transfer in chunks not via naive full copy

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1197:
----------------------------------

    Fix Version/s:     (was: 0.20.0)

+1 

Tabled.

> IPC of large cells should transfer in chunks not via naive full copy
> --------------------------------------------------------------------
>
>                 Key: HBASE-1197
>                 URL: https://issues.apache.org/jira/browse/HBASE-1197
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>
> Several instances of OOME when trying to serve up large cells to clients have been observed. IPC should send large cell content in chunks instead of as one large naive copy. 



[jira] Commented: (HBASE-1197) IPC of large cells should transfer in chunks not via naive full copy

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672722#action_12672722 ] 

Andrew Purtell commented on HBASE-1197:
---------------------------------------

My thinking on how to handle this relatively quickly is to special-case the writing of a large byte[] in HbaseObjectWritable, sending it as a sequence of RPCs that each carry one chunk of it.
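
A minimal sketch of the chunking idea, not the HbaseObjectWritable change
itself. It assumes a simple framing (total length, then length-prefixed
chunks) written to a plain java.io DataOutput, whereas the proposal above
would send each chunk as its own RPC. ChunkedBytes, write, read, encode and
decode are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, not an HBase class.
final class ChunkedBytes {
    private ChunkedBytes() {}

    /** Writes the total length, then the value as length-prefixed chunks of at most chunkSize bytes. */
    static void write(DataOutput out, byte[] value, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        out.writeInt(value.length);
        int off = 0;
        while (off < value.length) {
            int len = Math.min(chunkSize, value.length - off);
            out.writeInt(len);
            out.write(value, off, len); // writes straight from the source array, no intermediate copy
            off += len;
        }
    }

    /** Reads a value written by write, filling one destination array chunk by chunk. */
    static byte[] read(DataInput in) throws IOException {
        int total = in.readInt();
        if (total < 0) {
            throw new IOException("negative total length " + total);
        }
        byte[] value = new byte[total];
        int off = 0;
        while (off < total) {
            int len = in.readInt();
            if (len <= 0 || len > total - off) {
                throw new IOException("bad chunk length " + len);
            }
            in.readFully(value, off, len);
            off += len;
        }
        return value;
    }

    /** Demonstration only: frames into an in-memory buffer, which a real sender would avoid. */
    static byte[] encode(byte[] value, int chunkSize) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            write(out, value, chunkSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Demonstration only: decodes a buffer produced by encode. */
    static byte[] decode(byte[] framed) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed))) {
            return read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The receiver still ends up with one full-size array, but the sender never
needs a second full-size copy just to serialize the value.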

> IPC of large cells should transfer in chunks not via naive full copy
> --------------------------------------------------------------------
>
>                 Key: HBASE-1197
>                 URL: https://issues.apache.org/jira/browse/HBASE-1197
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>             Fix For: 0.20.0
>
>
> Several instances of OOME when trying to serve up large cells to clients have been observed. IPC should send large cell content in chunks instead of as one large naive copy. 



[jira] Updated: (HBASE-1197) IPC of large cells should transfer in chunks not via naive full copy

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1197:
----------------------------------

    Issue Type: Improvement  (was: Sub-task)
        Parent:     (was: HBASE-1024)

> IPC of large cells should transfer in chunks not via naive full copy
> --------------------------------------------------------------------
>
>                 Key: HBASE-1197
>                 URL: https://issues.apache.org/jira/browse/HBASE-1197
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: Andrew Purtell
>
> Several instances of OOME when trying to serve up large cells to clients have been observed. IPC should send large cell content in chunks instead of as one large naive copy. 



[jira] Commented: (HBASE-1197) IPC of large cells should transfer in chunks not via naive full copy

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701642#action_12701642 ] 

stack commented on HBASE-1197:
------------------------------

Shall we move this out of 0.20.0?

> IPC of large cells should transfer in chunks not via naive full copy
> --------------------------------------------------------------------
>
>                 Key: HBASE-1197
>                 URL: https://issues.apache.org/jira/browse/HBASE-1197
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>             Fix For: 0.20.0
>
>
> Several instances of OOME when trying to serve up large cells to clients have been observed. IPC should send large cell content in chunks instead of as one large naive copy. 
