Posted to issues@phoenix.apache.org by "Rushabh Shah (Jira)" <ji...@apache.org> on 2020/11/23 18:31:00 UTC

[jira] [Comment Edited] (PHOENIX-6213) Extend Cell Tags to Delete object.

    [ https://issues.apache.org/jira/browse/PHOENIX-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237581#comment-17237581 ] 

Rushabh Shah edited comment on PHOENIX-6213 at 11/23/20, 6:30 PM:
------------------------------------------------------------------

In the PR (https://github.com/apache/phoenix/pull/978) I have used many methods from the PrivateCellUtil class, which is annotated IA.PRIVATE. I understand that we don't want downstream projects to use classes annotated as private, but PrivateCellUtil has many powerful APIs that I am using for this work. [~gjacoby] pointed out other classes, such as RawCell and RawCellBuilder, which are IA.LIMITEDPRIVATE, but they would need some additional processing for my use case.

For example: 

1. PrivateCellUtil#createCell(Cell cell, List<Tag> tags) accepts an existing Cell and a list of tags and creates a new cell.

But RawCellBuilder has no method that accepts an existing cell. I need to explicitly convert my input cell by extracting all of its fields, passing them to the builder methods (setRow, setFamily, etc.) and then calling the build method.
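
A minimal sketch of the field-by-field copy described above. To stay self-contained it uses simplified stand-in classes: SimpleCell, SimpleCellBuilder and withTags are hypothetical names for illustration, not HBase types, and tags are modelled as plain strings. The real RawCellBuilder carries more fields (for example the cell type) than this sketch does.

```java
import java.util.ArrayList;
import java.util.List;

public class CellCopySketch {
    // Hypothetical stand-in for a cell; NOT the HBase Cell interface.
    public static final class SimpleCell {
        public final byte[] row, family, qualifier, value;
        public final long timestamp;
        public final List<String> tags;

        SimpleCell(byte[] row, byte[] family, byte[] qualifier,
                   long timestamp, byte[] value, List<String> tags) {
            this.row = row; this.family = family; this.qualifier = qualifier;
            this.timestamp = timestamp; this.value = value; this.tags = tags;
        }
    }

    // Hypothetical stand-in builder: only per-field setters,
    // no method that accepts an existing cell.
    public static final class SimpleCellBuilder {
        private byte[] row, family, qualifier, value;
        private long timestamp;
        private List<String> tags = new ArrayList<>();

        public SimpleCellBuilder setRow(byte[] v) { row = v; return this; }
        public SimpleCellBuilder setFamily(byte[] v) { family = v; return this; }
        public SimpleCellBuilder setQualifier(byte[] v) { qualifier = v; return this; }
        public SimpleCellBuilder setTimestamp(long v) { timestamp = v; return this; }
        public SimpleCellBuilder setValue(byte[] v) { value = v; return this; }
        public SimpleCellBuilder setTags(List<String> v) { tags = new ArrayList<>(v); return this; }

        public SimpleCell build() {
            return new SimpleCell(row, family, qualifier, timestamp, value, tags);
        }
    }

    // The boilerplate every caller has to write: copy each field of the
    // input cell into the builder, then attach the new tag list.
    public static SimpleCell withTags(SimpleCell cell, List<String> tags) {
        return new SimpleCellBuilder()
            .setRow(cell.row)
            .setFamily(cell.family)
            .setQualifier(cell.qualifier)
            .setTimestamp(cell.timestamp)
            .setValue(cell.value)
            .setTags(tags)
            .build();
    }

    public static void main(String[] args) {
        SimpleCell original = new SimpleCell("r1".getBytes(), "cf".getBytes(),
            "q".getBytes(), 42L, "v".getBytes(), new ArrayList<>());
        List<String> tags = new ArrayList<>();
        tags.add("source=customer-delete");
        SimpleCell tagged = withTags(original, tags);
        System.out.println(new String(tagged.row) + " " + tagged.tags);
    }
}
```

A single createCell(cell, tags)-style helper would replace the whole withTags body, which is the convenience this comment is asking to keep.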

 

2. PrivateCellUtil#getTags(Cell cell) returns a list of the existing tags, to which I want to add a new tag.

But RawCell#getTags() returns an Iterator<Tag>, which I have to iterate over and, depending on whether the tags are byte-buffer backed or array backed, convert to a List, since RawCellBuilder#setTags accepts a List of Tags. PrivateCellUtil#getTags already does this conversion.
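
The iterator-to-list step could look roughly like the sketch below. It is written against plain java.util types so that it runs without HBase: the type parameter T stands in for Tag, and copyAndAppend is a hypothetical helper name. It does not cover the byte-buffer-backed versus array-backed distinction mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class TagListSketch {
    // Drains the iterator into a new mutable list and appends one extra
    // element. T is a stand-in for HBase's Tag; no HBase classes are used.
    public static <T> List<T> copyAndAppend(Iterator<T> existing, T extra) {
        List<T> result = new ArrayList<>();
        while (existing.hasNext()) {
            result.add(existing.next());
        }
        result.add(extra);
        return result;
    }

    public static void main(String[] args) {
        // Strings stand in for tags purely for illustration.
        Iterator<String> existingTags = Arrays.asList("ttl", "acl").iterator();
        List<String> merged = copyAndAppend(existingTags, "source=customer-delete");
        System.out.println(merged);
    }
}
```

The resulting list is in the shape that a setTags(List)-style builder method expects.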

 

All of these conversion utility methods would need to be duplicated in the Phoenix project.

[~apurtell] [~gjacoby] To avoid this duplication, do you think it makes sense to mark PrivateCellUtil as InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.COPROC)? Please advise.

 



> Extend Cell Tags to Delete object.
> ----------------------------------
>
>                 Key: PHOENIX-6213
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6213
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Rushabh Shah
>            Assignee: Rushabh Shah
>            Priority: Major
>
> We want to track the source of mutations (especially Deletes) made via Phoenix. We have multiple use cases that perform deletes, namely: a customer deleting data, internal processes like GDPR compliance, and Phoenix TTL MR jobs. For every mutation we want to track the source of the operation that initiated the delete.
> At my day job, we have a custom Backup/Restore tool.
> For example: during a GDPR compliance cleanup (let's say at time t0), we mistakenly deleted some customer data, and it is possible that the customer also deleted some data from their side (at time t1). To recover the mistakenly deleted data, we restore from the backup at time (t0 - 1). By doing this, we also recover the data that the customer intentionally deleted.
> We need a way for the Restore tool to selectively recover data.
> To explain via an example:
> Let's say there are 2 different systems (call them accidental-delete and customer-delete) deleting data from the same table at almost the same time. As the names suggest, customer-delete performs intentional deletes and accidental-delete performs deletes done by mistake. We have a restore tool which will restore all the data between a start time and an end time (start-ts and end-ts). We want to restore the data deleted by the accidental-delete system but not the data deleted by the customer-delete system. By adding a cell tag to delete markers, we can avoid restoring data deleted by the customer-delete system.
> In my proposal, I want to add cell tags to the tombstone delete markers so that we have those tags in the backups. In case we have to restore data, we can restore specific rows depending on the tag present in the cell.
> We want to leverage the Cell Tag feature for Delete mutations to store this metadata. Currently the Delete object doesn't support the Tag feature.
> We also want a solution that is easily extensible to other mutations like Put.
> Some of the use cases I can think of where we can use tags for Put mutations are:
> 1. Identifying whether the put came from the primary cluster or the replicated cluster, so that we can make the backup tool smarter and not back up the same put twice in the source and replicated clusters.
> 2. We have a multi-tenancy concept in Phoenix. We want to track whether the upsert (a put operation in HBase) came from a Global or a Tenant connection.


