Posted to dev@hbase.apache.org by Geoffrey Jacoby <gj...@apache.org> on 2020/12/02 21:11:15 UTC

Coprocs and Off-Heap Writes

I'm code-reviewing a Phoenix PR [1] right now, which adds Tags to a
mutation's Cells in a coproc. A question has come up regarding coprocs and
the optional off-heaping of the write path in HBase 2.x and up.

For what parts of the write path (and hence, which coproc hooks) is it safe
to change the underlying Cells of a batch mutation without leaking off-heap
memory?

The HBase book entry on off-heap writes [2] just discusses the ability to
make the MemStore off-heap, but HBASE-15179 and its design doc [3] say that
the entire write stack is off-heap.

This matters because, if the mutation's Cells can be assumed to be on-heap
in a RegionObserver coproc hook (one that runs before the MemStore commit),
then clearing the mutation's internal family map and replacing its Cells
with new, altered ones is safe. (Extra GC pressure aside, of course.) If
not, I presume the coproc would be leaking off-heap memory (unless there's
magic cleanup somewhere?).

If this is not a safe assumption, what would the recommended way be to
alter a Cell's Tags in a coproc, since Tags are explicitly not exposed to
the HBase client, Cells are immutable, and hence the only way to do so
would be to create new Cells in a coproc? My question's not how to create
the new Cells (that's been answered elsewhere) but how to dispose of the
old, original ones.

Also, if this is not a safe assumption, is there an accepted LP(Coproc) or
Public API that a coproc can check to see if it's in an "off-heap" mode or
not so that a leak can be avoided?

Thanks,

Geoffrey Jacoby

References:
[1] https://github.com/apache/phoenix/pull/978
[2] https://hbase.apache.org/book.html#regionserver.offheap.writepath
[3] https://docs.google.com/document/d/1fj5P8JeutQ-Uadb29ChDscMuMaJqaMNRI86C4k5S1rQ/edit

Re: Coprocs and Off-Heap Writes

Posted by ramkrishna vasudevan <ra...@gmail.com>.
Hi
I believe Anoop has answered this clearly. The short answer is that in your
CP it is better to copy/clone the cells so that no reference to the
originals is kept. I believe the index-related WAL codec in Phoenix was also
trying to do something similar, if I remember correctly. (I may be wrong,
though.)

Regards
Ram

Re: Coprocs and Off-Heap Writes

Posted by Anoop John <an...@gmail.com>.
Hi Geoffrey,

When the write path is off-heap backed (at the RPC layer itself), the write
payload is read into DBBs (direct ByteBuffers) that we get from a pool, and
the cells are created over those DBBs. If we add Tags in CPs, a new Cell
POJO is created, but it still refers to the old POJO for every part except
the Tags; see TagRewriteCell, for example. Only when we add the cells to
the MemStore is their data taken out of this RPC-side buffer. In the write
path, once the call completes and returns to the RPC layer, we release the
buffer there. So there should be no worry about a leak.
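
The wrapping idea above can be sketched in plain Java with no HBase
dependency. MiniCell, PlainCell and TagOverrideCell are hypothetical names
invented for this illustration; they are not the HBase Cell or
TagRewriteCell classes, and the real classes carry many more fields. The
sketch only shows that the new object supplies its own tags and delegates
everything else to the original, so nothing is copied and the original
stays reachable through the wrapper.

```java
import java.nio.charset.StandardCharsets;

class TagOverrideDemo {
    /** Hypothetical, minimal stand-in for a cell (not the HBase Cell interface). */
    interface MiniCell {
        byte[] row();
        byte[] value();
        byte[] tags();
    }

    /** A cell that holds all of its parts directly. */
    static final class PlainCell implements MiniCell {
        private final byte[] row;
        private final byte[] value;
        private final byte[] tags;

        PlainCell(byte[] row, byte[] value, byte[] tags) {
            this.row = row;
            this.value = value;
            this.tags = tags;
        }

        public byte[] row() { return row; }
        public byte[] value() { return value; }
        public byte[] tags() { return tags; }
    }

    /** Wraps an existing cell: only the tags are new, the rest is delegated. */
    static final class TagOverrideCell implements MiniCell {
        private final MiniCell original;
        private final byte[] tags;

        TagOverrideCell(MiniCell original, byte[] tags) {
            this.original = original;
            this.tags = tags;
        }

        public byte[] row() { return original.row(); }     // shared, not copied
        public byte[] value() { return original.value(); } // shared, not copied
        public byte[] tags() { return tags; }               // the only new part
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        MiniCell original = new PlainCell(utf8("row1"), utf8("v1"), new byte[0]);
        MiniCell tagged = new TagOverrideCell(original, utf8("my-tag"));

        System.out.println(tagged.row() == original.row()); // true: same array
        System.out.println(new String(tagged.tags(), StandardCharsets.UTF_8)); // my-tag
        System.out.println(original.tags().length); // 0: original is unchanged
    }
}
```

Because the wrapper still points at the original, it is only valid for as
long as the original's backing storage is.
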
The only thing to be careful about in CPs is keeping references to Cells.
In such cases, it's advised to clone the cell (or the parts of it you need)
and keep a reference to the clone instead. When pooled DBBs are used on the
RPC side, not doing this correctly can cause the Cell to be corrupted
later. (The buffer is released once the RPC call is over and is later
reused to read some other write payload.) Even when on-heap buffers are
used at the RPC layer, keeping such a reference without cloning can cause
issues, because it prevents the RPC payload read buffer (typically much
larger than a single cell) from being GCed. I know the Phoenix Jira's aim
is just to create Cells with added tags; I mention this only as a pointer.
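
To make the reuse hazard concrete, here is a minimal sketch using only
java.nio, with no HBase classes. BufferBackedValue and fill are
hypothetical names made up for this example, and the single direct
ByteBuffer stands in for a pooled RPC buffer. It shows a view that reads
lazily from the buffer changing its contents after the buffer is reused,
while a copy taken beforehand does not. In a real coprocessor the copying
step would presumably be done with helpers such as CellUtil.cloneRow or
CellUtil.cloneValue.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class PooledBufferDemo {
    /** Hypothetical view over a region of a shared buffer; holds no bytes itself. */
    static final class BufferBackedValue {
        private final ByteBuffer buf;
        private final int offset;
        private final int length;

        BufferBackedValue(ByteBuffer buf, int offset, int length) {
            this.buf = buf;
            this.offset = offset;
            this.length = length;
        }

        /** Copies the bytes out of the shared buffer into a private array. */
        byte[] copy() {
            byte[] out = new byte[length];
            ByteBuffer dup = buf.duplicate(); // independent position, same memory
            dup.position(offset);
            dup.get(out, 0, length);
            return out;
        }

        /** Reads whatever the shared buffer holds right now. */
        String read() {
            return new String(copy(), StandardCharsets.UTF_8);
        }
    }

    /** Simulates the RPC layer reading a new payload into the pooled buffer. */
    static void fill(ByteBuffer buf, String payload) {
        buf.clear();
        buf.put(payload.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        ByteBuffer pooled = ByteBuffer.allocateDirect(16);

        fill(pooled, "first-write");
        BufferBackedValue view = new BufferBackedValue(pooled, 0, 11);
        byte[] cloned = view.copy(); // taken while the call is still in progress

        // The call ends, the buffer returns to the pool and is reused.
        fill(pooled, "other-write");

        System.out.println(view.read()); // other-write (retained view is stale)
        System.out.println(new String(cloned, StandardCharsets.UTF_8)); // first-write
    }
}
```
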

Anoop
