Posted to dev@phoenix.apache.org by "Chinmay Kulkarni (JIRA)" <ji...@apache.org> on 2018/05/11 09:56:00 UTC

[jira] [Commented] (PHOENIX-3955) Ensure KEEP_DELETED_CELLS, REPLICATION_SCOPE, and TTL properties stay in sync between the physical data table and index tables

    [ https://issues.apache.org/jira/browse/PHOENIX-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471710#comment-16471710 ] 

Chinmay Kulkarni commented on PHOENIX-3955:
-------------------------------------------

Hey [~jamestaylor], [~samarthjain], [~tdsilva],
Here are some thoughts on how to achieve this, along with some questions I have:
Let's take a simple example. Say I create the base data table with the following query:

{code:sql}
CREATE TABLE IF NOT EXISTS z_base_table (
    id INTEGER NOT NULL PRIMARY KEY,
    CF1.host VARCHAR(10),
    flag BOOLEAN)
TTL=120000, CF1.KEEP_DELETED_CELLS='true', REPLICATION_SCOPE='1';
{code}

We have the following paths to consider:

1. Create Index code path:
* *Case 1: We create the data table with specific column families and there is no default CF*:
In this case, the global index table's default CF and the CFs corresponding to all local indexes should keep the default values for REPLICATION_SCOPE and KEEP_DELETED_CELLS, as they do now, BUT they should inherit the TTL property from the data table's non-local-index CFs. It should be sufficient to check the TTL of any one of those CFs, since they are all required to have the same TTL.

* *Case 2: The data table has a default CF*:
In this case, the global index table's default CF and the CFs corresponding to all local indexes should inherit REPLICATION_SCOPE, KEEP_DELETED_CELLS and TTL from the data table's default CF.
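To make the two cases concrete, here is a minimal Java sketch of how the properties for an index CF could be computed. It assumes CF properties are modeled as plain string maps rather than HBase column descriptors; the class, the method and the _defaults_ map are hypothetical illustrations, not existing Phoenix code.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: CF properties are plain string maps, not HBase column descriptors.
final class IndexCfPropertyInheritance {
    static final String TTL = "TTL";
    static final String KEEP_DELETED_CELLS = "KEEP_DELETED_CELLS";
    static final String REPLICATION_SCOPE = "REPLICATION_SCOPE";

    /**
     * Computes the properties for a global index table's default CF or a local index CF.
     *
     * @param dataCfProps   properties of each data table CF, excluding local index ("L#") CFs
     * @param defaultCfName the data table's default CF name, or null if it has none
     * @param defaults      the values index CFs get today when nothing is inherited
     */
    static Map<String, String> inheritedIndexCfProps(Map<String, Map<String, String>> dataCfProps,
                                                     String defaultCfName,
                                                     Map<String, String> defaults) {
        Map<String, String> result = new HashMap<>(defaults);
        Map<String, String> defaultCf = defaultCfName == null ? null : dataCfProps.get(defaultCfName);
        if (defaultCf != null) {
            // Case 2: inherit all three properties from the data table's default CF.
            for (String key : new String[] {TTL, KEEP_DELETED_CELLS, REPLICATION_SCOPE}) {
                if (defaultCf.containsKey(key)) {
                    result.put(key, defaultCf.get(key));
                }
            }
        } else if (!dataCfProps.isEmpty()) {
            // Case 1: keep the defaults for KEEP_DELETED_CELLS and REPLICATION_SCOPE, but take
            // TTL from any data CF, since all data CFs are required to share the same TTL.
            Map<String, String> anyCf = dataCfProps.values().iterator().next();
            if (anyCf.containsKey(TTL)) {
                result.put(TTL, anyCf.get(TTL));
            }
        }
        return result;
    }
}
{code}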

* *Question 1*: If we create an index with its own properties, say something like:
{code:sql}
CREATE INDEX diff_properties_z_index ON z_base_table(host) TTL=5000,KEEP_DELETED_CELLS='true';
{code}
This overrides the data table's properties, leaving the index table's properties out of sync with the data table's. Since this JIRA sets the expectation that these properties always stay in sync between the index tables and the data table, should we disallow this from now on? At the very least, we may want to log a warning that the index table and data table properties will be out of sync after executing this statement.

* *Question 1.1*: Given the above situation, if we later alter the data table, should we also blindly alter the properties of the index tables (since we want them to be in sync), or only alter an index table's properties if they currently match the data table's properties?
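The two options in Question 1.1 can be sketched in Java as follows. This is a hypothetical illustration with properties modeled as string maps. In BLIND mode every altered property is copied to the index; in ONLY_IF_IN_SYNC mode a property is copied only if the index's current value equals the data table's value before the ALTER, so an explicit override such as TTL=5000 in the example above is preserved.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of the two options in Question 1.1; properties are plain string maps.
final class IndexPropertySyncPolicy {
    enum Mode { BLIND, ONLY_IF_IN_SYNC }

    /**
     * @param dataBefore data table property values before the ALTER
     * @param altered    properties set by the ALTER on the data table
     * @param index      current index CF property values
     * @return the index CF property values after propagation
     */
    static Map<String, String> propagate(Map<String, String> dataBefore,
                                         Map<String, String> altered,
                                         Map<String, String> index,
                                         Mode mode) {
        Map<String, String> result = new HashMap<>(index);
        for (Map.Entry<String, String> change : altered.entrySet()) {
            String key = change.getKey();
            // The index is "in sync" for this key if it still matches the pre-ALTER data table value.
            boolean inSync = Objects.equals(index.get(key), dataBefore.get(key));
            if (mode == Mode.BLIND || inSync) {
                result.put(key, change.getValue());
            }
        }
        return result;
    }
}
{code}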

* The changes for the create index code path should be achievable in _CQSI.generateTableDescriptor_, before we apply any properties specified for the index tables themselves.

2. ALTER TABLE SET <TTL/REPLICATION_SCOPE/KEEP_DELETED_CELLS> code path:
* Here we can keep track of the properties that are applied to _QueryConstants.ALL_FAMILY_PROPERTIES_KEY_ (i.e. to all column families) rather than to specific CFs. If we are changing TTL, REPLICATION_SCOPE or KEEP_DELETED_CELLS for all families, we will alter the properties of the index table CFs as well.
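A rough Java sketch of this filtering step, assuming the parsed ALTER properties are grouped in a map keyed by CF name. _ALL_FAMILIES_ is a hypothetical stand-in for _QueryConstants.ALL_FAMILY_PROPERTIES_KEY_ (its value here is arbitrary), and CF-specific settings such as _CF1.KEEP_DELETED_CELLS_ are deliberately not propagated.

{code:java}
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch. ALL_FAMILIES stands in for QueryConstants.ALL_FAMILY_PROPERTIES_KEY;
// its value here is arbitrary.
final class AlterPropagationFilter {
    static final String ALL_FAMILIES = "<all families>";
    static final Set<String> SYNCED_PROPERTIES =
            new HashSet<>(Arrays.asList("TTL", "REPLICATION_SCOPE", "KEEP_DELETED_CELLS"));

    /** Returns the synced properties set for all families, which should also be applied to index CFs. */
    static Map<String, String> propsToPropagateToIndexes(Map<String, Map<String, String>> propsByFamily) {
        Map<String, String> result = new HashMap<>();
        Map<String, String> allFamilyProps = propsByFamily.get(ALL_FAMILIES);
        if (allFamilyProps == null) {
            return result; // only CF-specific properties were set
        }
        for (Map.Entry<String, String> e : allFamilyProps.entrySet()) {
            if (SYNCED_PROPERTIES.contains(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
{code}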

* *Case 1: Global Index Tables:*
We can have _CQSI.separateAndValidateProperties_ return a _Map<table name/desc, Pair<orig table desc, new table desc>>_, then later collect all the table descriptors and call _sendHBaseMetaData_() with this list of changes (which will now include global index table changes as well).
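A hypothetical Java sketch of that map and of collecting the descriptors to send. _TableDesc_ and _DescPair_ are simplified stand-ins for the HBase table descriptor and Phoenix's _Pair_, with a single property map per table instead of one per CF; only descriptors whose properties actually change are collected.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: TableDesc and DescPair stand in for HBase table descriptors and
// Phoenix's Pair; they are not the real classes.
final class GlobalIndexAlterSketch {
    static final class TableDesc {
        final String name;
        final Map<String, String> props;

        TableDesc(String name, Map<String, String> props) {
            this.name = name;
            this.props = new HashMap<>(props); // defensive copy
        }

        // Returns a new descriptor with the changes applied; this one is left untouched.
        TableDesc with(Map<String, String> changes) {
            TableDesc copy = new TableDesc(name, props);
            copy.props.putAll(changes);
            return copy;
        }
    }

    static final class DescPair {
        final TableDesc orig;
        final TableDesc updated;

        DescPair(TableDesc orig, TableDesc updated) {
            this.orig = orig;
            this.updated = updated;
        }
    }

    // Builds table name -> (original, new) descriptors for the data table and its global indexes.
    static Map<String, DescPair> separateChanges(TableDesc data, List<TableDesc> globalIndexes,
                                                 Map<String, String> allFamilyChanges) {
        Map<String, DescPair> changes = new LinkedHashMap<>();
        changes.put(data.name, new DescPair(data, data.with(allFamilyChanges)));
        for (TableDesc index : globalIndexes) {
            changes.put(index.name, new DescPair(index, index.with(allFamilyChanges)));
        }
        return changes;
    }

    // Collects the new descriptors whose properties actually changed, in insertion order.
    static List<TableDesc> descriptorsToSend(Map<String, DescPair> changes) {
        List<TableDesc> toSend = new ArrayList<>();
        for (DescPair pair : changes.values()) {
            if (!pair.orig.props.equals(pair.updated.props)) {
                toSend.add(pair.updated);
            }
        }
        return toSend;
    }
}
{code}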

* *Case 2: Local Indexes:*
Can we simply change the column descriptor of the local index CF in the data table? I'm not sure this makes sense, so please shed some light on this case.
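If that approach is viable, it could look roughly like the following hypothetical Java sketch, which models the data table as a map from CF name to properties and applies the synced changes only to CFs carrying the local index prefix "L#". The names are illustrative, and the input is copied rather than modified in place.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: the data table is modeled as CF name -> properties.
final class LocalIndexCfUpdate {
    static final String LOCAL_INDEX_CF_PREFIX = "L#";

    // Returns a copy of the data table's CFs with the changes applied to local index CFs only.
    static Map<String, Map<String, String>> applyToLocalIndexCfs(
            Map<String, Map<String, String>> dataTableCfs, Map<String, String> changes) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> cf : dataTableCfs.entrySet()) {
            Map<String, String> props = new LinkedHashMap<>(cf.getValue());
            if (cf.getKey().startsWith(LOCAL_INDEX_CF_PREFIX)) {
                props.putAll(changes);
            }
            result.put(cf.getKey(), props);
        }
        return result;
    }
}
{code}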

* *Question 2*: If I create a local index on a CF-specific column like:
{code:sql}
CREATE LOCAL INDEX cf_specific_z_index ON z_base_table(host);
{code}
then shouldn't the local index use a CF named "L#CF1" instead of the default "L#0"? In sqlline, when I run _select * from cf_specific_z_index;_, I see the column as _CF1:Host_, but when I run _desc 'z_base_table'_ in the HBase shell, the CF name is "L#0".

* *Question 3:* How do we handle the case of multiple local indexes created on the same table? If I run the following:
{code:sql}
CREATE LOCAL INDEX local_z_index1 ON z_base_table(host) TTL=9999,KEEP_DELETED_CELLS='true';
CREATE LOCAL INDEX local_z_index2 ON z_base_table(flag) TTL=8888,KEEP_DELETED_CELLS='false';
{code}
The resulting HBase metadata only reflects the last statement, since both local indexes map to the 'L#0' column family. Please let me know if this is handled in the Phoenix layer and I'm missing something.
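If this is not handled in Phoenix, one option would be to at least detect it. Below is a hypothetical Java sketch (names are illustrative) that flags properties given conflicting values by different local indexes sharing the same CF; for the two statements above, assuming both map to 'L#0', it reports both TTL and KEEP_DELETED_CELLS.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: detect local indexes that set conflicting properties on a shared CF.
final class LocalIndexConflictCheck {
    static final class LocalIndexSpec {
        final String indexName;
        final String cfName;
        final Map<String, String> props;

        LocalIndexSpec(String indexName, String cfName, Map<String, String> props) {
            this.indexName = indexName;
            this.cfName = cfName;
            this.props = props;
        }
    }

    // Returns one message per property that a later index sets to a different value on the same CF.
    static List<String> conflicts(List<LocalIndexSpec> specs) {
        Map<String, Map<String, String>> valueByCf = new HashMap<>(); // cf -> property -> value
        Map<String, Map<String, String>> ownerByCf = new HashMap<>(); // cf -> property -> index name
        List<String> found = new ArrayList<>();
        for (LocalIndexSpec spec : specs) {
            Map<String, String> values = valueByCf.computeIfAbsent(spec.cfName, k -> new HashMap<>());
            Map<String, String> owners = ownerByCf.computeIfAbsent(spec.cfName, k -> new HashMap<>());
            for (Map.Entry<String, String> e : spec.props.entrySet()) {
                String previous = values.get(e.getKey());
                if (previous != null && !previous.equals(e.getValue())) {
                    found.add(spec.cfName + "." + e.getKey() + ": " + owners.get(e.getKey()) + "="
                            + previous + " vs " + spec.indexName + "=" + e.getValue());
                }
                // Last write wins, mirroring what ends up in the HBase metadata.
                values.put(e.getKey(), e.getValue());
                owners.put(e.getKey(), spec.indexName);
            }
        }
        return found;
    }
}
{code}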

> Ensure KEEP_DELETED_CELLS, REPLICATION_SCOPE, and TTL properties stay in sync between the physical data table and index tables
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3955
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3955
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Samarth Jain
>            Assignee: Chinmay Kulkarni
>            Priority: Major
>
> We need to make sure that indexes inherit the REPLICATION_SCOPE, KEEP_DELETED_CELLS and TTL properties from the base table. Otherwise we can run into situations where data was removed from the data table but not from the index, or vice versa. We also need to make sure that any ALTER TABLE SET TTL or ALTER TABLE SET KEEP_DELETED_CELLS statements propagate these properties to the indexes too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)