Posted to dev@accumulo.apache.org by David Medinets <da...@gmail.com> on 2012/07/01 01:40:42 UTC

Re: [jira] [Created] (ACCUMULO-665) large values, complex iterator stacks, and RFile readers can consume a surprising amount of memory

How would you define complex iterator stack? Can you outline the elements?
On Jun 29, 2012 5:19 PM, "Eric Newton (JIRA)" <ji...@apache.org> wrote:

> Eric Newton created ACCUMULO-665:
> ------------------------------------
>
>              Summary: large values, complex iterator stacks, and RFile
> readers can consume a surprising amount of memory
>                  Key: ACCUMULO-665
>                  URL: https://issues.apache.org/jira/browse/ACCUMULO-665
>              Project: Accumulo
>           Issue Type: Bug
>           Components: tserver
>     Affects Versions: 1.5.0, 1.4.0
>          Environment: large cluster
>             Reporter: Eric Newton
>             Assignee: Eric Newton
>             Priority: Minor
>
>
> On a production cluster, with a complex iterator tree, a large value
> (~350M) was causing a 4G tserver to fail with out-of-memory.
>
> There were several factors contributing to the problem:
> # a bug: the query should not have been looking at the large value
> # complex iterator tree, causing many copies of the data to be held at the
> same time
> # RFile doubles the buffer it uses to load values, and continues to use
> that large buffer for future values
>
> This ticket is for the last point.  If we know we're not even going to
> look at the value, we can read past it without storing it in memory.  It is
> surprising that skipping past a large value would cause the server to run
> out of memory, especially since a ~350M value fits into a 4G tserver
> several times over, which is enough to return it to the caller.
>
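
As a rough illustration of the last point in the ticket, here is a minimal
sketch of the difference between loading a value into a reusable, doubling
buffer and reading past it. This is not the RFile code. The record layout
(a 4-byte length followed by the value bytes), the class name
ValueSkipSketch, and the 16-byte starting buffer are all assumptions made
up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: length-prefixed values, NOT the real RFile format.
class ValueSkipSketch {
    // Reusable buffer that only ever grows, as the ticket describes.
    private byte[] buffer = new byte[16];

    /** Loads the next value into the shared buffer, doubling it until the value fits. */
    int readValue(DataInputStream in) throws IOException {
        int len = in.readInt();
        int cap = buffer.length;
        while (cap < len) {
            cap = Math.multiplyExact(cap, 2); // throws instead of overflowing
        }
        if (cap != buffer.length) {
            buffer = new byte[cap]; // the larger buffer is kept for later values
        }
        in.readFully(buffer, 0, len);
        return len;
    }

    /** Reads past the next value without copying it anywhere. */
    static int skipValue(DataInputStream in) throws IOException {
        int len = in.readInt();
        int remaining = len;
        while (remaining > 0) {
            int skipped = in.skipBytes(remaining);
            if (skipped <= 0) {
                throw new EOFException("value truncated");
            }
            remaining -= skipped;
        }
        return len;
    }

    int capacity() {
        return buffer.length;
    }

    /** Two records: a 1000-byte value followed by a 3-byte value. */
    static byte[] twoRecords() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(1000);
        out.write(new byte[1000]);
        out.writeInt(3);
        out.write(new byte[] {1, 2, 3});
        out.flush();
        return bytes.toByteArray();
    }

    /** Returns {capacity after reading both values, capacity after skipping the large one}. */
    static int[] demo() {
        try {
            byte[] data = twoRecords();

            ValueSkipSketch readsAll = new ValueSkipSketch();
            DataInputStream a = new DataInputStream(new ByteArrayInputStream(data));
            readsAll.readValue(a);
            readsAll.readValue(a);

            ValueSkipSketch skipsBig = new ValueSkipSketch();
            DataInputStream b = new DataInputStream(new ByteArrayInputStream(data));
            skipValue(b);
            skipsBig.readValue(b);

            return new int[] {readsAll.capacity(), skipsBig.capacity()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] caps = demo();
        System.out.println("buffer after reading both values: " + caps[0]);
        System.out.println("buffer after skipping the large value: " + caps[1]);
    }
}
```

In this toy setup the reader that loads the 1000-byte value is left holding a
1024-byte buffer even while it serves the 3-byte value, whereas the reader
that skips the large value stays at 16 bytes. Scale the same pattern up to a
~350M value and several open readers and the memory adds up quickly.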

Re: [jira] [Created] (ACCUMULO-665) large values, complex iterator stacks, and RFile readers can consume a surprising amount of memory

Posted by William Slacum <wi...@accumulo.net>.
He's referring to something like the BooleanLogic iterator stack in the
Wikipedia example. It's a tree of user iterators that merge streams of
key-value pairs together, so you end up with many open readers at once, and
possibly many RFile blocks, spread across many HDFS blocks, held concurrently.
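
To make the "merging streams" part concrete, here is a small sketch of a
k-way merge over sorted key-value sources. It does not use Accumulo's
iterator interfaces. MergeSketch, Source, and the String keys and values are
hypothetical names picked for the example. The thing to notice is that every
open source stays positioned on its current entry for as long as it is in
the heap, so k sources means k keys and values resident at the same time.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch of a merging iterator; not Accumulo's API.
class MergeSketch {
    /** One open source plus the entry it is currently positioned on. */
    static final class Source {
        final Iterator<Map.Entry<String, String>> it;
        Map.Entry<String, String> top; // held in memory while the source is in the heap

        Source(Iterator<Map.Entry<String, String>> it) {
            this.it = it;
            advance();
        }

        void advance() {
            top = it.hasNext() ? it.next() : null;
        }
    }

    static Map.Entry<String, String> entry(String key, String value) {
        return new AbstractMap.SimpleImmutableEntry<>(key, value);
    }

    /** Merges already-sorted inputs into one sorted list of "key=value" strings. */
    static List<String> merge(List<List<Map.Entry<String, String>>> inputs) {
        PriorityQueue<Source> heap =
            new PriorityQueue<>(Comparator.comparing((Source s) -> s.top.getKey()));
        for (List<Map.Entry<String, String>> input : inputs) {
            Source s = new Source(input.iterator());
            if (s.top != null) {
                heap.add(s);
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Source s = heap.poll(); // source with the smallest current key
            out.add(s.top.getKey() + "=" + s.top.getValue());
            s.advance();
            if (s.top != null) {
                heap.add(s); // re-insert, now positioned on its next entry
            }
        }
        return out;
    }

    static List<String> demo() {
        return merge(Arrays.asList(
            Arrays.asList(entry("a", "1"), entry("d", "4")),
            Arrays.asList(entry("b", "2"), entry("e", "5")),
            Arrays.asList(entry("c", "3"))));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In a tree of such merges, each level can hold its own copy of the current
entry, which is one way a single large value ends up in memory several times.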
