Posted to dev@hive.apache.org by Sankar Hariappan <sh...@hortonworks.com> on 2018/01/10 06:55:29 UTC

Branch for "Per Table Write ID" implementation

Hi all,

The "Hive Replication" feature is being extended to support ACID tables (HIVE-18320<https://issues.apache.org/jira/browse/HIVE-18320>).
"Per Table Write ID" is an important requirement for replicating ACID tables, especially for the "Analytics workload off-loading for scalability" use case. Details are available in the design document attached to the JIRA.

The Per Table Write ID implementation involves several changes:

  1.  Add metadata tables to allocate and manage write IDs, and map each write ID to its global transaction.
  2.  Handle snapshot isolation for ACID/MM table reads by using ValidWriteIDList instead of ValidTxnList.
  3.  Modify the ORC/Hive row readers to use ValidWriteIDList instead of ValidTxnList when selecting valid delta/base directories.
  4.  Update ValidCompactorTxnList to use table write IDs.
  5.  Support upgrades from existing Hive versions by migrating ACID/MM tables from global transaction IDs to write IDs.
  6.  Update the unit test (UT) scripts to use ValidWriteIDList instead of ValidTxnList for snapshot isolation tests.
  7.  Rename methods and variables in several classes to use WriteId instead of TxnId.
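To make items 1 to 3 more concrete, here is a minimal Java sketch of the idea. It is not the code in HIVE-18192: the class names WriteIdAllocator and WriteIdSnapshot, the in-memory maps standing in for the metastore tables, and the simplified delta check are all hypothetical. Each table gets its own write ID sequence, each allocated write ID is recorded against the global transaction that requested it, and a reader's snapshot of a table is a high watermark plus the set of write IDs at or below it that are still open or aborted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not Hive's metastore schema or ValidWriteIDList API.
class WriteIdAllocator {
    // Last write ID handed out per table (stands in for a metastore table).
    private final Map<String, Long> lastWriteId = new HashMap<>();
    // Per table: write ID -> global transaction ID that allocated it (item 1).
    private final Map<String, Map<Long, Long>> writeIdToTxnId = new HashMap<>();

    // Allocates the next write ID for the table; IDs start at 1 and are independent per table.
    long allocate(String table, long txnId) {
        long writeId = lastWriteId.merge(table, 1L, Long::sum);
        writeIdToTxnId.computeIfAbsent(table, t -> new HashMap<>()).put(writeId, txnId);
        return writeId;
    }

    // Returns the global transaction ID mapped to the write ID, or null if unknown.
    Long txnIdFor(String table, long writeId) {
        Map<Long, Long> ids = writeIdToTxnId.get(table);
        return ids == null ? null : ids.get(writeId);
    }
}

// A reader's snapshot of one table, expressed in write IDs instead of transaction IDs (items 2 and 3).
class WriteIdSnapshot {
    private final long highWatermark;            // highest write ID allocated when the snapshot was taken
    private final TreeSet<Long> invalidWriteIds; // open or aborted write IDs at or below the watermark

    WriteIdSnapshot(long highWatermark, Set<Long> invalidWriteIds) {
        this.highWatermark = highWatermark;
        this.invalidWriteIds = new TreeSet<>(invalidWriteIds);
    }

    // A write is visible if it is at or below the watermark and not open/aborted.
    boolean isWriteIdValid(long writeId) {
        return writeId <= highWatermark && !invalidWriteIds.contains(writeId);
    }

    // Simplified check for a delta directory covering write IDs [minWriteId, maxWriteId]:
    // readable only if no write ID in that range is invalid or above the watermark.
    boolean isDeltaFullyValid(long minWriteId, long maxWriteId) {
        if (maxWriteId > highWatermark) {
            return false;
        }
        Long firstInvalid = invalidWriteIds.ceiling(minWriteId);
        return firstInvalid == null || firstInvalid > maxWriteId;
    }
}
```

For example, with a watermark of 5 and write ID 3 aborted, a delta covering write IDs 1 to 2 is readable in this sketch while one covering 3 to 4 is not. The real ORC/Hive readers may treat partially valid ranges differently; the all-or-nothing check here is only a simplification.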

As part of HIVE-18192<https://issues.apache.org/jira/browse/HIVE-18192>, I have implemented the first 3 changes in the list, which make ACID reads and writes work with write IDs. However, the feature will be incomplete without the rest of the changes.

Hence, I would like to create a branch (branch-per-table-writeid) from master to commit this feature as multiple patches. The branch is expected to be short-lived, lasting 2 to 3 weeks.

I would appreciate feedback from the community.

Best regards
Sankar


Re: Branch for "Per Table Write ID" implementation

Posted by Gopal Vijayaraghavan <go...@apache.org>.
+1

Cheers,
Gopal


On 1/9/18, 10:55 PM, "Sankar Hariappan" <sh...@hortonworks.com> wrote:


Re: Branch for "Per Table Write ID" implementation

Posted by Sankar Hariappan <sh...@hortonworks.com>.
Thanks Thejas, Eugene and Gopal for the feedback!
I will go ahead and create the branch.

Best regards
Sankar

On 11/01/18, 11:03 AM, "Eugene Koifman" <ek...@hortonworks.com> wrote:

>+1
>
>
>On 1/10/18, 8:18 PM, "Thejas Nair" <th...@gmail.com> wrote:
>
>    +1
>    Makes sense to split the changes into multiple smaller patches that are
>    easier to review, and creating this branch would help with that.

Re: Branch for "Per Table Write ID" implementation

Posted by Eugene Koifman <ek...@hortonworks.com>.
+1


On 1/10/18, 8:18 PM, "Thejas Nair" <th...@gmail.com> wrote:

    +1
    Makes sense to split the changes into multiple smaller patches that are
    easier to review, and creating this branch would help with that.


Re: Branch for "Per Table Write ID" implementation

Posted by Thejas Nair <th...@gmail.com>.
+1
Makes sense to split the changes into multiple smaller patches that are
easier to review, and creating this branch would help with that.

On Tue, Jan 9, 2018 at 10:55 PM, Sankar Hariappan <
shariappan@hortonworks.com> wrote:
