Posted to dev@hive.apache.org by Chinmay Kulkarni <ch...@apache.org> on 2021/07/27 01:17:37 UTC

Supporting Hive batch drop partitions from downstream projects

Hey Hive community,

I am working with a variant of Trino[1]'s Hive connector to interact with
data in S3. We have a standalone HMS running and want to support batch
dropping a large number of partitions via Trino. Trino currently has an
unregister_partition procedure, but it unfortunately drops partitions one
by one, which is proving to be a performance bottleneck for us.

To support batch dropping of partitions from Trino[2], it seems we will
need to call the ThriftHiveMetastore.Iface#drop_partitions_req API from
Trino's Thrift client code.

This, however, requires passing in a byte[] partition expression and (as
per my understanding) writing an implementation of the
PartitionExpressionProxy interface that contains the logic to convert this
byte[] into a filter String that Hive can understand.
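
For reference, the contract I am describing is roughly the following. This
is a deliberately hypothetical stand-in, not the real
org.apache.hadoop.hive.metastore.PartitionExpressionProxy interface, which
has additional methods and whose exact signatures differ between Hive
releases:

import org.apache.hadoop.hive.metastore.api.MetaException;

// Hypothetical sketch of the server-side contract described above; NOT a
// drop-in PartitionExpressionProxy implementation.
public class ExpressionProxySketch {

  // The metastore hands the implementation the same bytes the client put in
  // DropPartitionsExpr.expr and expects back a partition filter string it
  // can evaluate, e.g. "ds >= '2021-07-01' and ds < '2021-08-01'".
  public String convertExprToFilter(byte[] serializedExpr) throws MetaException {
    // A real implementation would deserialize the client's expression
    // encoding and render it as a filter string; that logic is exactly the
    // piece I would like to avoid having to ship as a custom jar.
    throw new MetaException("expression decoding not implemented in this sketch");
  }
}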

My question is: is there a way to do this without making changes to Hive
code or requiring a jar containing my bespoke implementation of the above
to be on the classpath? Are there any examples of standalone HMS clients
integrating batch dropping of partitions? Any help would be appreciated.

[1] https://github.com/trinodb/trino
[2] https://github.com/trinodb/trino/issues/7249