Posted to dev@hbase.apache.org by Vrushali Channapattan <vr...@gmail.com> on 2015/12/22 08:07:33 UTC

hbase (coprocessors & cell tags) used in hadoop-yarn

A group of us in the Hadoop community are working on YARN's next-gen
timeline service component (https://issues.apache.org/jira/browse/YARN-2928).
For every application that runs on a Hadoop cluster, it will store all of
the application stats, workflow metadata and container metrics information
in HBase tables (some plain HBase tables and some Phoenix-based ones).

We would like to validate some of the implementation approaches we are
taking with HBase. It would be great to get some feedback on the code and
design from the HBase dev perspective.

Among other things, we are making use of cell tags in coprocessors for
summation, min and max operations on different versions of cells in a given
column, during reads as well as during flush and compaction operations.
Some relevant sub-JIRAs that deal with HBase coprocessors:
https://issues.apache.org/jira/browse/YARN-4062
https://issues.apache.org/jira/browse/YARN-3901
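
To make the aggregation idea above concrete, here is a minimal sketch in
plain Java. It is not the coprocessor code from the patches and uses no
HBase classes. It assumes each cell version carries a tag naming one
operation (SUM, MIN or MAX) and folds all versions of one column into a
single value. The names TagAggregationSketch, TaggedCell and AggOp are
hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of tag-driven aggregation over cell versions (hypothetical names). */
final class TagAggregationSketch {

    /** Hypothetical operations that a cell tag could encode. */
    enum AggOp { SUM, MIN, MAX }

    /** Simplified stand-in for one version of a cell: timestamp, numeric value, tag. */
    static final class TaggedCell {
        final long timestamp;
        final long value;
        final AggOp op;

        TaggedCell(long timestamp, long value, AggOp op) {
            this.timestamp = timestamp;
            this.value = value;
            this.op = op;
        }
    }

    /** Folds all versions of one column into a single value, using the op found in their tags. */
    static long aggregate(List<TaggedCell> versions) {
        if (versions.isEmpty()) {
            throw new IllegalArgumentException("no cell versions to aggregate");
        }
        AggOp op = versions.get(0).op;
        long result = versions.get(0).value;
        for (TaggedCell cell : versions.subList(1, versions.size())) {
            if (cell.op != op) {
                // Assumption of this sketch: all versions of a column share one operation.
                throw new IllegalArgumentException("mixed operations in one column");
            }
            switch (op) {
                case SUM: result += cell.value; break;
                case MIN: result = Math.min(result, cell.value); break;
                case MAX: result = Math.max(result, cell.value); break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<TaggedCell> versions = Arrays.asList(
            new TaggedCell(1L, 5L, AggOp.SUM),
            new TaggedCell(2L, 7L, AggOp.SUM),
            new TaggedCell(3L, 3L, AggOp.SUM));
        System.out.println(aggregate(versions)); // prints 15
    }
}
```

In a real coprocessor the same fold would presumably run inside a scanner
that wraps the region's cells, so that reads, flushes and compactions all
see the aggregated value.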

We have the schema documented, with example records, in the code as well as
in a PDF on the JIRA.

https://github.com/apache/hadoop/blob/feature-YARN-2928/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/FlowRunTable.java#L34

https://github.com/apache/hadoop/blob/feature-YARN-2928/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/entity/EntityTable.java#L40

Schema JIRA (its PDF attachment describes the schema):
https://issues.apache.org/jira/browse/YARN-3411

We would appreciate any feedback or comments that you have, and would be
glad to answer any questions in more depth.

thanks
Vrushali

Re: hbase (coprocessors & cell tags) used in hadoop-yarn

Posted by ramkrishna vasudevan <ra...@gmail.com>.
I saw the patches some time back but got sidetracked by other work. Are you
creating the new cells with Tags inside the coprocessors?
Do you see any need for Tags to be added directly from the client side as
part of Puts (for your use case)? Currently HBase does not support Tags on
the client side. Tags are, for now, server-side pieces of a Cell.

Regards
Ram

On Tue, Jan 5, 2016 at 1:34 AM, Vrushali Channapattan <vr...@gmail.com>
wrote:

> I see, thanks Anoop. We wanted to use cell tags for indicating the context
> of information in cells in that cells for aggregation purpose. It is
> referred to only in the coprocessor. We also use in the flush/compaction
> processing to decide which cells to discard/what info to keep.
>
> I will be on the lookout for Tag interface changes.
>
> On Thu, Dec 24, 2015 at 7:19 AM, Anoop John <an...@gmail.com> wrote:
>
> > I can see in the patches that you are trying to use Cell creation with Tags
> > and use of Tag APIs..  Only concern is Tag is Private audience marked. It
> > was created to support per cell ACL/ visibility etc.
> >
> > As part of off heaping effort, we are planning to make some changes to Tag
> > APIs.. (To make it interface impl itself).. This will happen in HBase
> > trunk..   So later when you move to newer version need to change it.
> >
> > -Anoop-
> >
> >
> > On Tue, Dec 22, 2015 at 12:37 PM, Vrushali Channapattan <
> > vrushali.c@gmail.com> wrote:
> >
> > > A group of us in the hadoop community are working on Yarn's next gen
> > > timeline service component
> > > https://issues.apache.org/jira/browse/YARN-2928
> > >
> > > that will be storing for application that runs on a hadoop cluster all of
> > > the application stats, workflow metadata and container metrics information
> > > in hbase tables (some plain hbase tables and some phoenix based ones).
> > >
> > > We have been thinking about validating some of the implementation
> > > approaches we are taking with HBase. It would be great to get some feedback
> > > on the code and design from the HBase dev perspective.
> > >
> > > Among other things, we are making use of cell tags in coprocessors for
> > > summation, min and max operations on different versions of cells in a given
> > > column during read as well flush and compaction operations.  Some relevant
> > > subjiras that deal with hbase coprocessors
> > > https://issues.apache.org/jira/browse/YARN-4062
> > > https://issues.apache.org/jira/browse/YARN-3901
> > >
> > > We have the schema documented with example records in the code as well as
> > > in pdf on the jira.
> > >
> > > https://github.com/apache/hadoop/blob/feature-YARN-2928/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/FlowRunTable.java#L34
> > >
> > > https://github.com/apache/hadoop/blob/feature-YARN-2928/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/entity/EntityTable.java#L40
> > >
> > > Schema jira (pdf attachment that describes the schema)
> > > https://issues.apache.org/jira/browse/YARN-3411
> > >
> > > Would appreciate any feedback/comments that you have and be glad to answer
> > > any questions to clarify in depth further.
> > >
> > > thanks
> > > Vrushali
> > >
> >
>

Re: hbase (coprocessors & cell tags) used in hadoop-yarn

Posted by Vrushali Channapattan <vr...@gmail.com>.
I see, thanks Anoop. We wanted to use cell tags to indicate the context of
the information in each cell for aggregation purposes. The tags are
referred to only in the coprocessor. We also use them in flush/compaction
processing to decide which cells to discard and what information to keep.
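
As a rough illustration of that flush/compaction decision, the sketch below
collapses cells in plain Java. The policy is an assumption made for this
example, not a description of the actual patches: cells whose tag marks the
application as finished are summed into one cell, and cells from
applications that are still running are kept as they are. CompactionSketch,
TaggedCell, appFinished and COLLAPSED_ID are hypothetical names, and no
HBase classes are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of a tag-driven compaction policy (hypothetical names and policy). */
final class CompactionSketch {

    /** Row identifier used for the single collapsed cell; an assumption of this sketch. */
    static final String COLLAPSED_ID = "collapsed";

    /** Simplified stand-in for a cell whose tag records the app id and whether the app finished. */
    static final class TaggedCell {
        final String appId;
        final long value;
        final boolean appFinished;

        TaggedCell(String appId, long value, boolean appFinished) {
            this.appId = appId;
            this.value = value;
            this.appFinished = appFinished;
        }
    }

    /**
     * Returns the cells to keep after compaction: cells of running apps unchanged,
     * followed by one cell holding the sum of all cells from finished apps.
     */
    static List<TaggedCell> compact(List<TaggedCell> cells) {
        List<TaggedCell> kept = new ArrayList<>();
        long finishedSum = 0L;
        boolean sawFinished = false;
        for (TaggedCell cell : cells) {
            if (cell.appFinished) {
                finishedSum += cell.value; // safe to fold: this value will not change again
                sawFinished = true;
            } else {
                kept.add(cell);            // still running: keep the cell as it is
            }
        }
        if (sawFinished) {
            kept.add(new TaggedCell(COLLAPSED_ID, finishedSum, true));
        }
        return kept;
    }

    public static void main(String[] args) {
        List<TaggedCell> out = compact(Arrays.asList(
            new TaggedCell("app_1", 10L, true),
            new TaggedCell("app_2", 20L, true),
            new TaggedCell("app_3", 5L, false)));
        for (TaggedCell cell : out) {
            System.out.println(cell.appId + " -> " + cell.value);
        }
    }
}
```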

I will be on the lookout for Tag interface changes.

On Thu, Dec 24, 2015 at 7:19 AM, Anoop John <an...@gmail.com> wrote:

> I can see in the patches that you are trying to use Cell creation with Tags
> and use of Tag APIs..  Only concern is Tag is Private audience marked. It
> was created to support per cell ACL/ visibility etc.
>
> As part of off heaping effort, we are planning to make some changes to Tag
> APIs.. (To make it interface impl itself).. This will happen in HBase
> trunk..   So later when you move to newer version need to change it.
>
> -Anoop-
>
>
> On Tue, Dec 22, 2015 at 12:37 PM, Vrushali Channapattan <
> vrushali.c@gmail.com> wrote:
>
> > A group of us in the hadoop community are working on Yarn's next gen
> > timeline service component
> > https://issues.apache.org/jira/browse/YARN-2928
> >
> > that will be storing for application that runs on a hadoop cluster all of
> > the application stats, workflow metadata and container metrics information
> > in hbase tables (some plain hbase tables and some phoenix based ones).
> >
> > We have been thinking about validating some of the implementation
> > approaches we are taking with HBase. It would be great to get some feedback
> > on the code and design from the HBase dev perspective.
> >
> > Among other things, we are making use of cell tags in coprocessors for
> > summation, min and max operations on different versions of cells in a given
> > column during read as well flush and compaction operations.  Some relevant
> > subjiras that deal with hbase coprocessors
> > https://issues.apache.org/jira/browse/YARN-4062
> > https://issues.apache.org/jira/browse/YARN-3901
> >
> > We have the schema documented with example records in the code as well as
> > in pdf on the jira.
> >
> > https://github.com/apache/hadoop/blob/feature-YARN-2928/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/FlowRunTable.java#L34
> >
> > https://github.com/apache/hadoop/blob/feature-YARN-2928/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/entity/EntityTable.java#L40
> >
> > Schema jira (pdf attachment that describes the schema)
> > https://issues.apache.org/jira/browse/YARN-3411
> >
> > Would appreciate any feedback/comments that you have and be glad to answer
> > any questions to clarify in depth further.
> >
> > thanks
> > Vrushali
> >
>

Re: hbase (coprocessors & cell tags) used in hadoop-yarn

Posted by Anoop John <an...@gmail.com>.
I can see in the patches that you are creating Cells with Tags and using
the Tag APIs. My only concern is that Tag is marked as Private audience. It
was created to support per-cell ACLs, visibility labels, etc.

As part of the off-heaping effort, we are planning to make some changes to
the Tag APIs (making Tag an interface with its own implementations). This
will happen in HBase trunk, so when you later move to a newer version you
will need to change your code.
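
One common way to limit the impact of such an API change is to keep all tag
handling behind a small interface owned by the calling project, so that only
one class has to be touched when the underlying Tag API moves. The sketch
below shows the shape of that idea in plain Java. CellTagCodec and
SimpleTagCodec are hypothetical names, and the byte layout (one type byte
followed by a UTF-8 payload) is invented for this example; it is not HBase's
tag format.

```java
import java.nio.charset.StandardCharsets;

/** Plain-Java sketch of isolating tag encoding behind a project-owned interface. */
final class TagAdapterSketch {

    /** The only tag-related surface the rest of the code would depend on (hypothetical). */
    interface CellTagCodec {
        byte[] encode(byte tagType, String payload);
        byte decodeType(byte[] rawTag);
        String decodePayload(byte[] rawTag);
    }

    /** One implementation; a version-specific variant could replace it without touching callers. */
    static final class SimpleTagCodec implements CellTagCodec {
        @Override
        public byte[] encode(byte tagType, String payload) {
            byte[] body = payload.getBytes(StandardCharsets.UTF_8);
            byte[] raw = new byte[body.length + 1];
            raw[0] = tagType;                               // first byte: tag type
            System.arraycopy(body, 0, raw, 1, body.length); // remaining bytes: payload
            return raw;
        }

        @Override
        public byte decodeType(byte[] rawTag) {
            return rawTag[0];
        }

        @Override
        public String decodePayload(byte[] rawTag) {
            return new String(rawTag, 1, rawTag.length - 1, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        CellTagCodec codec = new SimpleTagCodec();
        byte[] raw = codec.encode((byte) 70, "application_1");
        System.out.println(codec.decodeType(raw) + " " + codec.decodePayload(raw));
    }
}
```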

-Anoop-


On Tue, Dec 22, 2015 at 12:37 PM, Vrushali Channapattan <
vrushali.c@gmail.com> wrote:

> A group of us in the hadoop community are working on Yarn's next gen
> timeline service component https://issues.apache.org/jira/browse/YARN-2928
>
> that will be storing for application that runs on a hadoop cluster all of
> the application stats, workflow metadata and container metrics information
> in hbase tables (some plain hbase tables and some phoenix based ones).
>
> We have been thinking about validating some of the implementation
> approaches we are taking with HBase. It would be great to get some feedback
> on the code and design from the HBase dev perspective.
>
> Among other things, we are making use of cell tags in coprocessors for
> summation, min and max operations on different versions of cells in a given
> column during read as well flush and compaction operations.  Some relevant
> subjiras that deal with hbase coprocessors
> https://issues.apache.org/jira/browse/YARN-4062
> https://issues.apache.org/jira/browse/YARN-3901
>
> We have the schema documented with example records in the code as well as
> in pdf on the jira.
>
>
> https://github.com/apache/hadoop/blob/feature-YARN-2928/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/FlowRunTable.java#L34
>
>
> https://github.com/apache/hadoop/blob/feature-YARN-2928/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/entity/EntityTable.java#L40
>
> Schema jira (pdf attachment that describes the schema)
> https://issues.apache.org/jira/browse/YARN-3411
>
> Would appreciate any feedback/comments that you have and be glad to answer
> any questions to clarify in depth further.
>
> thanks
> Vrushali
>
