Posted to dev@trafodion.apache.org by Rohit Jain <ro...@esgyn.com> on 2015/07/27 10:45:00 UTC

Re: [jira] [Created] (TRAFODION-1419) Add support for multiple column families in a trafodion table

Anoop,

Just to clarify a few things:
- The same set of HBase options will apply to all column families.  That is, you cannot vary the HBase options between column families.  By extension, the same holds for the aligned format syntax, although in the future we may remove that limitation.
- Salting and divisioning will apply to all column families.
- An index on the table can only reference columns in the default column family, not in any of the other column families you might create.
- DML does not reference a column family.  So views and other objects on the table will be oblivious to the CF a column resides in, implying implicit joins between CFs to retrieve results (handled at the HBase level?).
- Predicate pushdown against multiple column families will treat this like a join among tables: the CF that yields the lowest cardinality is accessed first and then joined with the rest of the CFs in cardinality-based join order.  Or is this all handled at the HBase level?  If so, I wonder how that works, especially with no cardinality information (I am assuming that CFs are scanned only if and when they are needed).

Rohit





On 7/24/15, 1:02 PM, "Anoop Sharma (JIRA)" <ji...@apache.org> wrote:

>Anoop Sharma created TRAFODION-1419:
>---------------------------------------
>
>             Summary: Add support for multiple column families in a trafodion table
>                 Key: TRAFODION-1419
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-1419
>             Project: Apache Trafodion
>          Issue Type: New Feature
>            Reporter: Anoop Sharma
>
>
>This proposal is to add support for multiple column families in Trafodion tables. With this feature, the columns of a table can be stored in multiple column families. One use would be to store frequently used columns in one column family and infrequently used columns in a different one, which should improve performance when those columns are retrieved from HBase. There could be other uses as well.
>
>Syntax:
>create table <tablename> ( <colFam1>.<colName1> <datatype>, <colFam2>.<colName2> <datatype> ... )
>  attributes default column family <colFam>;
>alter table <tablename> add column <colFam>.<colName> <datatype>;
><colFam>  :  name of column family for that column
>
>Semantics:
>- <colFam> name follows identifier rules. If not double quoted, it is upper cased. If double quoted, case is maintained.
>
>- A user-specified column family name can be of arbitrary length. To optimize the space taken by the column family stored in each cell, a 2-byte encoding is generated. The mapping of user-specified column family to encoded column family is stored in metadata.
>
>- If no column family is specified for a column during create table, the family specified in the ‘attributes default column family’ clause is used. If no ‘attributes default column family’ clause is specified, the system default column family is used.
>
>- Column family specification is supported for regular and volatile tables.
>- All unique column families specified during create or alter are added to the table.
>- The maximum number of column families supported in one table is 32. However, HBase recommends not creating too many column families.
>- The alter statement can be used to assign specific HBase options to specific column families using the NAME clause. If no NAME clause is specified, the altered HBase options are applied to all column families.
>
>- invoke and showddl statements show the original user-specified column families and not the encoded column families.
>
>- Currently, multiple column families are not supported for columns of a user-created or implicitly created index. The default column family of the corresponding base table is used for all index columns.
>
>- Column family cannot be specified in a DML query.
>- Column family cannot be specified for columns of an aligned row format table, since all columns are stored as one cell.
>- Column names must be unique within each table. The same column name cannot be used in multiple column families.
>
>--
>This message was sent by Atlassian JIRA
>(v6.3.4#6332)
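
As a rough illustration of the semantics proposed above, the following Python sketch models how column family names might be normalized, defaulted, limited to 32 per table, and mapped to a 2-byte code. It is not Trafodion code: the class and function names, the SYS_DEFAULT_CF placeholder, and the two-digit encoding scheme are all assumptions made for illustration. The proposal does not say how the 2-byte encoding is actually generated or what the system default family is called, and applying the same identifier rule to column names is likewise assumed.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

MAX_COLUMN_FAMILIES = 32                   # limit stated in the proposal
SYSTEM_DEFAULT_FAMILY = "SYS_DEFAULT_CF"   # hypothetical placeholder name


def normalize_identifier(name: str) -> str:
    """Keep case if the name is double quoted, otherwise upper-case it."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name[1:-1]
    return name.upper()


@dataclass
class TableSketch:
    """Toy model of one table's column-family bookkeeping (hypothetical)."""
    default_family: str = SYSTEM_DEFAULT_FAMILY
    encoded: Dict[str, bytes] = field(default_factory=dict)  # family -> 2-byte code
    columns: Dict[str, str] = field(default_factory=dict)    # column -> family

    def __post_init__(self) -> None:
        self.default_family = normalize_identifier(self.default_family)

    def _register_family(self, family: str) -> None:
        if family in self.encoded:
            return
        if len(self.encoded) >= MAX_COLUMN_FAMILIES:
            raise ValueError("at most 32 column families per table")
        # Assumed encoding: a sequence number rendered as two ASCII digits.
        self.encoded[family] = f"{len(self.encoded) + 1:02d}".encode("ascii")

    def add_column(self, column: str, family: Optional[str] = None) -> None:
        col = normalize_identifier(column)
        if col in self.columns:
            # Column names are unique per table, regardless of family.
            raise ValueError(f"column {col} already exists in this table")
        fam = self.default_family if family is None else normalize_identifier(family)
        self._register_family(fam)
        self.columns[col] = fam


t = TableSketch(default_family="cf_dflt")
t.add_column("a", family="cf1")
t.add_column("b", family='"MixedCase"')
t.add_column("c")                 # falls back to the table's default family
print(t.columns)
print(t.encoded)
```

In this sketch a family is registered only when a column first uses it; whether a default family named in the attributes clause is created eagerly is not stated in the proposal.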


Re: [jira] [Created] (TRAFODION-1419) Add support for multiple column families in a trafodion table

Posted by Dave Birdsall <da...@esgyn.com>.
Hi Eric,

You can get notifications on the JIRA by adding yourself to it. Log into
JIRA, open the JIRA of interest, and click on the "Start watching this
issue" link on the right.

Dave

On Mon, Jul 27, 2015 at 9:17 AM, Eric Owhadi <er...@esgyn.com> wrote:

> Do I receive JIRA notifications automatically when comments are added?

RE: [jira] [Created] (TRAFODION-1419) Add support for multiple column families in a trafodion table

Posted by Eric Owhadi <er...@esgyn.com>.
Do I receive JIRA notifications automatically when comments are added?

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com]
Sent: Monday, July 27, 2015 10:18 AM
To: dev@trafodion.incubator.apache.org
Subject: Re: [jira] [Created] (TRAFODION-1419) Add support for multiple
column families in a trafodion table

Hi,

Thanks, Eric, for the suggestion. Yes, please put all discussion of the JIRA
in the JIRA itself. Though it still may be useful to send an e-mail to this
list calling attention to the JIRA.

Dave


Re: [jira] [Created] (TRAFODION-1419) Add support for multiple column families in a trafodion table

Posted by Dave Birdsall <da...@esgyn.com>.
Hi,

Thanks, Eric, for the suggestion. Yes, please put all discussion of the
JIRA in the JIRA itself. Though it still may be useful to send an e-mail to
this list calling attention to the JIRA.

Dave

On Mon, Jul 27, 2015 at 7:00 AM, Qifan Chen <qi...@esgyn.com> wrote:

> I added my comments to the JIRA TRAFODION-1419
> <https://issues.apache.org/jira/browse/TRAFODION-1419>

Re: [jira] [Created] (TRAFODION-1419) Add support for multiple column families in a trafodion table

Posted by Qifan Chen <qi...@esgyn.com>.
I added my comments to the JIRA TRAFODION-1419
<https://issues.apache.org/jira/browse/TRAFODION-1419>


On Mon, Jul 27, 2015 at 6:13 AM, Eric Owhadi <er...@esgyn.com> wrote:

> Should we adopt the following discipline: as soon as a JIRA is opened on a
> subject, all discussion should happen in JIRA comments, and the creator of
> the JIRA would start the comments with a copy/paste of any discussion that
> happened on the dev list prior to the JIRA creation?
> I find it very useful on the HBase project to be able to go back to JIRAs
> and see all discussions on the topic in one location. I would not even
> try browsing the archive of the dev list to find these...
> If so, Rohit, would you mind using the "comment" feature of JIRA so that
> your insightful comments are persisted on the JIRA? I could copy/paste them
> for you, but that would lose the author...
> Eric



-- 
Regards, --Qifan

RE: [jira] [Created] (TRAFODION-1419) Add support for multiple column families in a trafodion table

Posted by Eric Owhadi <er...@esgyn.com>.
Should we adopt the following discipline: as soon as a JIRA is opened on a
subject, all discussion should happen in JIRA comments, and the creator of
the JIRA would start the comments with a copy/paste of any discussion that
happened on the dev list prior to the JIRA creation?
I find it very useful on the HBase project to be able to go back to JIRAs
and see all discussions on the topic in one location. I would not even
try browsing the archive of the dev list to find these...
If so, Rohit, would you mind using the "comment" feature of JIRA so that
your insightful comments are persisted on the JIRA? I could copy/paste them
for you, but that would lose the author...
Eric



-----Original Message-----
From: Rohit Jain [mailto:rohit.jain@esgyn.com]
Sent: Monday, July 27, 2015 3:45 AM
To: dev@trafodion.incubator.apache.org;
issues@trafodion.incubator.apache.org
Subject: Re: [jira] [Created] (TRAFODION-1419) Add support for multiple
column families in a trafodion table

Anoop,

Just to clarify a few things:
- The same set of HBase options will apply to both column families.  That
is, you cannot vary the HBase options between column families.  By extension
this goes to aligned format syntax as well, although in the future we may
remove that limitation.
- Salting and Divisioning will apply to all column families
- An index on the table can only reference columns in the default column
family and not in any of the other column families you might create
- DML does not reference a column family.  So views and other objects on the
table will be oblivious of the CF a column resides in, implying implicit
joins between CFs to retrieve results (handled at the HBase level?).
- Predicate push down against multiple column families will treat this like
a join amongst tables and access the CF resulting in the lowest cardinality
being accessed first and then joined with the rest of the CFs in cardinality
based join order.  Or is this all handled at the HBase level - if so, I
wonder how that works, especially with no cardinality information (I am
assuming that CFs are scanned only if and when they are needed).

Rohit





On 7/24/15, 1:02 PM, "Anoop Sharma (JIRA)" <ji...@apache.org> wrote:

>Anoop Sharma created TRAFODION-1419:
>---------------------------------------
>
>             Summary: Add support for multiple column families in a
> trafodion table
>                 Key: TRAFODION-1419
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-1419
>             Project: Apache Trafodion
>          Issue Type: New Feature
>            Reporter: Anoop Sharma
>
>
>This proposal is to add support for multiple column families in trafodion
>tables. With this feature, one can store columns into multiple column
>families. One use for this would be to store frequently used columns in one
>column family and infrequently used columns to be stored in a different
>column family. That will have performance improvement when those columns
>are retrieved from hbase. There could be other uses as well.
>
>Syntax:
>create table <tablename> ( <colFam1>.<colName1> <datatype>,
><colFam2>.<colName2> <datatype> ….)
>  attributes default column family <colFam>;
>alter table <tablename> add column <colFam>.<colName> <datatype>;
><colFam>  :  name of column family for that column
>
>Semantics:
>- <colFam> name follows identifier rules. If not double quoted, then it
>will be upper cased. If double quoted, then case will be maintained.
>
>- User specified column family can be of arbitrary length. To optimize
>space for column family stored in a cell, a 2 byte encoding is generated.
>Mapping of user specified column family to encoded column family is stored
>in metadata.
>
>- If no column family is specified for a column during create table, then
>the family specified in ‘attributes default column family’ clause is used.
>If no ‘attributes default column family’ clause is specified, then system
>default col family is used.
>
>- column family specification is supported for regular and volatile tables.
>- all unique column families specified during create or alter are added to
>the table
>- maximum number of column families supported in one table is 32. But it is
>hbase recommendation to not create too many column families.
>- alter statement can be used to assign specific hbase options to specific
>column families
>using the NAME clause. If no name clause is specified, then alter hbase
>options are applied to all col families.
>
>- invoke and showddl statements will show the original user specified
>column families and not the encoded column families
>
>- Currently, multiple column families are not supported for columns of a
>user created or an implicitly created index.
>The default column family of the corresponding base table is used for all
>index columns.
>
>- column family cannot be specified in a DML query
>- column family cannot be specified for columns of an aligned row format
>table since all columns are stored as one cell
>- Column names must be unique for each table. The same column name cannot
>be used as part of multiple column families.
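
The semantics listed in the proposal can be sketched as a small model. This is
only an illustration in Python, not Trafodion code: the class TableDef, the
function normalize_identifier, the placeholder name SYSTEM_DEFAULT_CF, and the
two-character encoding scheme are all assumptions made for the example. The
proposal says only that a 2 byte encoding is generated and stored in metadata;
it does not say how.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MAX_COLUMN_FAMILIES = 32               # limit stated in the proposal
SYSTEM_DEFAULT_CF = "SYSTEM_DEFAULT"   # hypothetical placeholder name


def normalize_identifier(name: str) -> str:
    """Upper-case unquoted names; keep the case of double-quoted names."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name[1:-1]
    return name.upper()


@dataclass
class TableDef:
    """Toy model of a table definition with column families."""
    default_cf: Optional[str] = None   # 'attributes default column family'
    columns: Dict[str, str] = field(default_factory=dict)   # column -> family
    families: List[str] = field(default_factory=list)       # in creation order

    def __post_init__(self) -> None:
        if self.default_cf is not None:
            self.default_cf = normalize_identifier(self.default_cf)

    def add_column(self, col_name: str, family: Optional[str] = None) -> None:
        col = normalize_identifier(col_name)
        # Column names must be unique per table, across all families.
        if col in self.columns:
            raise ValueError(f"column {col} already exists in this table")
        # Fall back to the table default, then to the system default.
        if family is not None:
            cf = normalize_identifier(family)
        else:
            cf = self.default_cf or SYSTEM_DEFAULT_CF
        # Every distinct family named in create or alter is added to the table.
        if cf not in self.families:
            if len(self.families) >= MAX_COLUMN_FAMILIES:
                raise ValueError("at most 32 column families per table")
            self.families.append(cf)
        self.columns[col] = cf

    def encoded_family(self, family: str) -> str:
        """Hypothetical 2-character code: the family's index, zero padded."""
        return f"{self.families.index(family):02d}"


# Example: one default family plus a case-preserving quoted family.
table = TableDef(default_cf="cf_hot")
table.add_column("id")                      # goes to CF_HOT
table.add_column("notes", family='"cold"')  # goes to cold (case kept)
```

Checking for a duplicate column name before touching the family list means a
rejected column never leaves a stray family behind.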