Posted to dev@hive.apache.org by 秦凯捷 <da...@gmail.com> on 2017/12/12 09:31:55 UTC

Adding Hive Metastore functions to add and alter partitions for multiple tables

Hi dev,

I'm wondering whether the Hive community has ever considered supporting
adding and altering multiple partitions across multiple tables.

I'm using Hive Metastore to manage the metadata for Presto queries. Our
business requires us to publish partitions of data for multiple tables at
the same time, in one atomic transaction, to keep the data consistent.
Currently Hive Metastore only supports adding and altering multiple
partitions for one table.

I drafted AddPartitionsForTables and AlterPartitionsForTables functions to
achieve this, based on the existing AddPartition and AlterPartition logic,
and we are testing them on our system.
Has the community considered this functionality? I would like to contribute
it if there is interest.
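
A minimal sketch of the all-or-nothing behaviour requested above, assuming an
in-memory dictionary in place of the metastore. The class and method names
below are hypothetical illustrations in Python; they are not Hive Metastore
APIs and are not the drafted patch.

```python
import copy


class InMemoryMetastore:
    """Toy stand-in for a metastore; NOT the Hive Metastore API."""

    def __init__(self):
        # table name -> set of partition names, e.g. {"orders": {"dt=2017-12-12"}}
        self.tables = {}

    def add_partitions_for_tables(self, requested):
        """Add partitions to several tables, all or nothing.

        `requested` maps a table name to a list of partition names.
        """
        # Stage changes on a copy so a failure leaves self.tables untouched.
        staged = copy.deepcopy(self.tables)
        for table, partitions in requested.items():
            if table not in staged:
                raise KeyError(f"unknown table: {table}")
            for partition in partitions:
                if partition in staged[table]:
                    raise ValueError(f"{table}/{partition} already exists")
                staged[table].add(partition)
        # Every change was accepted: publish all of them in one step.
        self.tables = staged
```

With this shape, a request that fails on the second table leaves the first
table unchanged as well, which is the consistency property a reader such as
Presto would rely on.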

Thank you!
-Kaijie



Re: Adding Hive Metastore functions to add and alter partitions for multiple tables

Posted by Andrew Sherman <as...@cloudera.com>.
Thanks Kaijie.

One concern is that the new functions effectively expand the size of the
transactions and the work that must be undone if they fail. So the question
is whether the benefit is large enough to justify adding complexity.

If no-one else has comments then you should probably think about having
people look at the code.

-Andrew

On Wed, Dec 13, 2017 at 6:54 PM, 秦凯捷 <da...@gmail.com> wrote:

> Hi Andrew,
>
> Thanks for your response. Answers to your questions:
>
> - Functionality:
> Support adding and altering multiple partitions for multiple tables in a
> single SQL statement or API request, as one transaction.
>
> - What happens if a failure occurs part way through the operations?
> For both altering and adding partitions, all the objectstore changes for
> partitions are made in one transaction, so the transaction is rolled back
> on failure.
> Adding partitions may also require creating directories on the filesystem
> for the newly added partitions. Those directories are deleted on failure,
> just as AddPartitions does now.
>
> - What is the impact on the system if an operation takes a long time?
> Altering partitions for multiple tables is not much different from the
> current altering of partitions for one table. Both take a long time if
> someone tries to alter too many partitions or too many tables, and the
> transaction timeout will abort the operation.
> We are running performance tests on our system to see how long various
> scenarios take, but this should not be a blocker.
>
> Thanks,
> Kaijie
>
> On Thu, Dec 14, 2017 at 3:38 AM, Andrew Sherman <as...@cloudera.com>
> wrote:
>
> > Hi Kaijie,
> >
> > I think this is an area that others in the Hive community are
> > interested in, so please do go ahead and describe your functionality.
> > I think it is important to describe:
> > - what happens if a failure occurs part way through the operations
> > - what impact there will be on the system if an operation takes a long
> >   time
> >
> > Thanks
> >
> > -Andrew

Re: Adding Hive Metastore functions to add and alter partitions for multiple tables

Posted by 秦凯捷 <da...@gmail.com>.
Thank you all for the help. I'm preparing the patch for review.

秦凯捷



On Tue, Dec 19, 2017 at 12:49 AM, Eugene Koifman <ek...@hortonworks.com>
wrote:

> +1 to Alex’ comment
>
> On 12/14/17, 3:27 PM, "Alexander Kolbasov" <ak...@cloudera.com> wrote:
>
>     Kaijie,
>
>     Can you describe in more detail why you need this functionality?
>     What problem does it actually solve?
>
>     I do not think that HMS should do more "atomic" compound operations
>     than it does now; IMO it should do fewer. This is especially true
>     when operations mix metadata operations with filesystem operations,
>     which cannot always be reverted correctly. Such things make the
>     semantics of HMS calls more and more complex and difficult to
>     maintain. The existing bulk APIs are not a good example to follow.
>
>
>     - Alex

Re: Adding Hive Metastore functions to add and alter partitions for multiple tables

Posted by Eugene Koifman <ek...@hortonworks.com>.
+1 to Alex’ comment

Re: Adding Hive Metastore functions to add and alter partitions for multiple tables

Posted by Alexander Kolbasov <ak...@cloudera.com>.
Kaijie,

Can you describe in more detail why you need this functionality?
What problem does it actually solve?

I do not think that HMS should do more "atomic" compound operations than it
does now; IMO it should do fewer. This is especially true when operations
mix metadata operations with filesystem operations, which cannot always be
reverted correctly. Such things make the semantics of HMS calls more and
more complex and difficult to maintain. The existing bulk APIs are not a
good example to follow.
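
One way a filesystem step can fail to revert is sketched below, assuming that
something else has written into a partition directory between its creation
and the cleanup. The function name is hypothetical and the code is plain
Python on a local filesystem, not HMS code.

```python
import os


def best_effort_cleanup(created_dirs):
    """Try to remove directories created by a failed call.

    Returns the paths that could not be removed, so the caller can see
    that the revert was only partial.
    """
    leftovers = []
    for path in reversed(created_dirs):
        try:
            # os.rmdir only removes empty directories; it raises OSError
            # if, for example, a writer has already put a file there.
            os.rmdir(path)
        except OSError:
            leftovers.append(path)
    return leftovers
```

A metadata transaction can be rolled back cleanly, but a non-empty list
returned here means the filesystem and the metadata no longer agree.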


- Alex

Re: Adding Hive Metastore functions to add and alter partitions for multiple tables

Posted by 秦凯捷 <da...@gmail.com>.
Hi Andrew,

Thanks for your response. Answers to your questions:

- Functionality:
Support adding and altering multiple partitions for multiple tables in a
single SQL statement or API request, as one transaction.

- What happens if a failure occurs part way through the operations?
For both altering and adding partitions, all the objectstore changes for
partitions are made in one transaction, so the transaction is rolled back
on failure.
Adding partitions may also require creating directories on the filesystem
for the newly added partitions. Those directories are deleted on failure,
just as AddPartitions does now.
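
The failure handling described above can be sketched roughly as follows,
assuming a dictionary in place of the objectstore and a local directory tree
in place of the warehouse filesystem. The function name and arguments are
hypothetical; this is Python for illustration, not the actual AddPartitions
code.

```python
import os
import shutil


def add_partitions_with_cleanup(metadata, root, requested):
    """Add partitions for several tables; undo everything on failure.

    metadata:  dict of table name -> set of partition names
    root:      base directory under which partition directories live
    requested: dict of table name -> list of partition names to add
    """
    # Stage metadata changes on a copy (stands in for the open transaction).
    staged = {table: set(parts) for table, parts in metadata.items()}
    created_dirs = []
    try:
        for table, partitions in requested.items():
            for partition in partitions:
                if partition in staged[table]:
                    raise ValueError(f"{table}/{partition} already exists")
                path = os.path.join(root, table, partition)
                if not os.path.isdir(path):
                    # Note: this also creates a missing table directory,
                    # which the cleanup below does not remove.
                    os.makedirs(path)
                    created_dirs.append(path)
                staged[table].add(partition)
    except Exception:
        # "Rollback": drop the staged copy and delete only the partition
        # directories that this call created.
        for path in reversed(created_dirs):
            shutil.rmtree(path, ignore_errors=True)
        raise
    # "Commit": publish all staged metadata changes together.
    metadata.clear()
    metadata.update(staged)
```

The metadata side is discarded simply by never publishing the staged copy,
while the filesystem side needs an explicit compensating delete.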

- What is the impact on the system if an operation takes a long time?
Altering partitions for multiple tables is not much different from the
current altering of partitions for one table. Both take a long time if
someone tries to alter too many partitions or too many tables, and the
transaction timeout will abort the operation.
We are running performance tests on our system to see how long various
scenarios take, but this should not be a blocker.

Thanks,
Kaijie

Re: Adding Hive Metastore functions to add and alter partitions for multiple tables

Posted by Andrew Sherman <as...@cloudera.com>.
Hi Kaijie,

I think this is an area that others in the Hive community are interested in,
so please do go ahead and describe your functionality.
I think it is important to describe:
- what happens if a failure occurs part way through the operations
- what impact there will be on the system if an operation takes a long time

Thanks

-Andrew
