Posted to user@phoenix.apache.org by Domen Kren <dk...@gmail.com> on 2018/08/31 14:43:19 UTC

TTL on a single column family in table

Hello,

We have a situation where we would like to set a TTL on a single column family in a table. After getting errors while trying to do that through a Phoenix command, I found this issue, https://issues.apache.org/jira/browse/PHOENIX-1409, where it said "TTL - James Taylor and I discussed offline and we decided that for now we will only be supporting for all column families to have the same TTL as the empty column family. This means we error out if a column family is specified while setting TTL property - both at CREATE TABLE and ALTER TABLE time. Also changes were made to make sure that any new column family added gets the same TTL as the empty CF."

If I understand correctly, this was a design decision and not a technical one. So my question is: if I change this configuration through the HBase API or console, could potential problems arise in Phoenix?

Thank you and best regards,
Domen Kren



Re: TTL on a single column family in table

Posted by Domen Kren <dk...@gmail.com>.
Hey,

Let me describe our situation. We have a table with 3 column families, let us say a, b and c. They are segregated by our access patterns and data usage. Column family a uses about 10%, b around 25% and c around 65% of the space in every row. CF a has no TTL, so its data will not be deleted; CF b is up for debate (it can probably be deleted after 6 months); and CF c is not needed after 4 weeks (1 month). By deleting the unneeded CFs in every row we can free up to 80+% of our space in the table. The table has no additional indexes.
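
To make the space estimate above concrete, here is a minimal Python sketch. It is only an illustration: the per-family shares are the approximate figures quoted above, the TTL values (4 weeks for c, 26 weeks standing in for "6 months" for b) are assumptions, and the names `SHARE`, `TTL_SECONDS` and `freed_fraction` are hypothetical. It also ignores when HBase physically reclaims the space.

```python
# Approximate share of each row's bytes held by each column family,
# using the figures quoted in the message (a: 10%, b: 25%, c: 65%).
SHARE = {"a": 0.10, "b": 0.25, "c": 0.65}

WEEK = 7 * 24 * 60 * 60  # seconds in one week

# Assumed TTLs in seconds; None means the family never expires.
TTL_SECONDS = {"a": None, "b": 26 * WEEK, "c": 4 * WEEK}


def freed_fraction(row_age_seconds):
    """Fraction of a row's original size whose family TTL has already passed."""
    return sum(
        share
        for cf, share in SHARE.items()
        if TTL_SECONDS[cf] is not None and row_age_seconds > TTL_SECONDS[cf]
    )


# Print the expired share for rows that are 1, 5 and 30 weeks old.
for weeks in (1, 5, 30):
    print(weeks, round(freed_fraction(weeks * WEEK), 2))
```

Under these assumptions, a row older than 4 weeks has lost the share held by c, and a row older than 26 weeks has lost the shares of both b and c.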

I have read about the empty/default CF, as it was part of the debate in the issue I linked, but in our case that should not be a problem, as we define CF a first and there is no TTL on that CF. We also did some preliminary tests and have had no problems. TTL on CF b and CF c behaves the same as if we deleted these columns manually.

There are some other approaches we tried, like manually deleting all cells that are overdue, and using multiple tables with their own TTLs, but they bring some overhead (a lot, in the case of multiple tables), and we would like to use the HBase TTL tool, which is perfect for our case.

Of course I understand that this usage violates the Phoenix library design and would bring overhead in checking updates and newly added patterns (checking for conflicts), so either way there is additional work, and we have not decided on the final solution. But from our understanding this usage of TTL should mirror manual deletion and be a lot more efficient.

If you have any additional concerns, hints or even ideas for other approaches we would really appreciate them.

Again, thank you and best regards,
Domen Kren

PS: The table is projected to be pretty large, 100+ TB in a few years (without the unneeded columns), so we are really focused on size and cleanup optimization.

On 2018/09/04 22:30:12, Chinmay Kulkarni <ch...@gmail.com> wrote: 
> Hi Domen,
> 
> After PHOENIX-1409, we don't allow specifying a TTL for a specific column
> family and all column families share the same TTL value. If you were to
> alter this using HBase APIs, this could lead to many inconsistencies at the
> Phoenix level where we assume all CFs to have the same TTL value. For
> example, if you were to alter the TTL value of the empty/default column
> family for a table, then a select count(*) query on the table would reflect
> a different value depending on whether the TTL for that column family has
> expired or not. Whereas, if you were to alter the TTL for any other column
> family, this would not affect the result of the select count(*) since we
> use the dummy value written to the empty/default column family to
> efficiently calculate count(*). There may also be other code paths that
> could give inconsistent results after this change.
> 
> In fact, PHOENIX-3955 aims to propagate the TTL, REPLICATION_SCOPE and
> KEEP_DELETED_CELLS properties to all column families of a table as well as
> its indexes, in order to keep data in sync between the base table and its
> indexes. What is the reason you wish to manually alter the TTL of a single
> column family?

Re: TTL on a single column family in table

Posted by Chinmay Kulkarni <ch...@gmail.com>.
Hi Domen,

After PHOENIX-1409, we don't allow specifying a TTL for a specific column
family and all column families share the same TTL value. If you were to
alter this using HBase APIs, this could lead to many inconsistencies at the
Phoenix level where we assume all CFs to have the same TTL value. For
example, if you were to alter the TTL value of the empty/default column
family for a table, then a select count(*) query on the table would reflect
a different value depending on whether the TTL for that column family has
expired or not. Whereas, if you were to alter the TTL for any other column
family, this would not affect the result of the select count(*) since we
use the dummy value written to the empty/default column family to
efficiently calculate count(*). There may also be other code paths that
could give inconsistent results after this change.
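
As a rough illustration of the count(*) behaviour described above, here is a toy Python model. It is not Phoenix or HBase code: rows are plain dicts, the names `EMPTY_CF`, `live_families` and `count_star` are hypothetical, and the only rule it encodes is the one stated in this message, namely that count(*) relies on the dummy value in the empty/default column family.

```python
# Toy model: a row maps column family -> write timestamp (seconds) of its cells.
EMPTY_CF = "a"  # assumption: the empty/default column family is "a"


def live_families(row, ttl, now):
    """Families of a row whose cells have not yet passed that family's TTL."""
    return {
        cf
        for cf, written in row.items()
        if ttl.get(cf) is None or now - written <= ttl[cf]
    }


def count_star(rows, ttl, now):
    """Count rows using only the marker cell in the empty column family."""
    return sum(1 for row in rows if EMPTY_CF in live_families(row, ttl, now))


rows = [{"a": 0, "b": 0, "c": 0}, {"a": 50, "b": 50, "c": 50}]

# TTL only on a non-empty family: every row still has its marker cell.
print(count_star(rows, {"c": 10}, now=100))

# TTL on the empty family: the older row loses its marker cell and drops
# out of the count, although its b and c cells are still live.
print(count_star(rows, {"a": 60}, now=100))
print(live_families(rows[0], {"a": 60}, now=100))
```

In this model a TTL on b or c never changes the count, while a TTL on the empty family makes the count depend on row age, which matches the two cases described above.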

In fact, PHOENIX-3955 aims to propagate the TTL, REPLICATION_SCOPE and
KEEP_DELETED_CELLS properties to all column families of a table as well as
its indexes, in order to keep data in sync between the base table and its
indexes. What is the reason you wish to manually alter the TTL of a single
column family?

On Tue, Sep 4, 2018 at 3:29 PM Thomas D'Silva <td...@salesforce.com>
wrote:

> If you set different TTLs for column families you can run into issues
> with SELECT count(*) queries not working correctly (depending on which
> column family is used to store the EMPTY_COLUMN_VALUE).

-- 
Chinmay Kulkarni
M.S. Computer Science,
University of Illinois at Urbana-Champaign.
B. Tech Computer Engineering,
College of Engineering, Pune.

Re: TTL on a single column family in table

Posted by Thomas D'Silva <td...@salesforce.com>.
If you set different TTLs for column families you can run into issues with
SELECT count(*) queries not working correctly (depending on which column
family is used to store the EMPTY_COLUMN_VALUE).

On Tue, Sep 4, 2018 at 10:56 AM, Sergey Soldatov <se...@gmail.com>
wrote:

> What is the use case for setting a TTL on only a single column family? I
> would say that making TTL table-wide is mostly a technical decision, because
> in relational databases we operate on rows, and supporting TTL for only some
> columns sounds a bit strange.
>
> Thanks,
> Sergey

Re: TTL on a single column family in table

Posted by Sergey Soldatov <se...@gmail.com>.
What is the use case for setting a TTL on only a single column family? I
would say that making TTL table-wide is mostly a technical decision, because
in relational databases we operate on rows, and supporting TTL for only some
columns sounds a bit strange.

Thanks,
Sergey

On Fri, Aug 31, 2018 at 7:43 AM Domen Kren <dk...@gmail.com> wrote:

> Hello,
>
> We have a situation where we would like to set a TTL on a single column
> family in a table. After getting errors while trying to do that through a
> Phoenix command, I found this issue,
> https://issues.apache.org/jira/browse/PHOENIX-1409, where it said "TTL -
> James Taylor and I discussed offline and we decided that for now we will
> only be supporting for all column families to have the same TTL as the
> empty column family. This means we error out if a column family is
> specified while setting TTL property - both at CREATE TABLE and ALTER TABLE
> time. Also changes were made to make sure that any new column family added
> gets the same TTL as the empty CF."
>
> If I understand correctly, this was a design decision and not a technical
> one. So my question is: if I change this configuration through the HBase
> API or console, could potential problems arise in Phoenix?
>
> Thank you and best regards,
> Domen Kren