Posted to user@hive.apache.org by Julien Phalip <jp...@gmail.com> on 2022/12/20 22:28:24 UTC

Tez hook for "INSERT INTO TABLE PARTITION(...)" query

Hi,

I'm writing a custom Storage Handler and need to run some custom code at
the end of an INSERT query.

I can easily do that by providing a custom OutputCommitter class and
overriding the commitJob() method. However, that only works for the "mr"
execution engine, as the "commitJob()" method is never called when using
Tez.

With Tez, I managed to get it to work partially by providing a custom
HiveMetaHook class and overriding the commitInsertTable() method. However,
that method only gets called at the end of an "INSERT INTO TABLE" query. It
never gets called at the end of an "INSERT INTO TABLE PARTITION (...)" query.

After doing a bit of troubleshooting, it looks like Tez uses the "DDLTask"
class (which later calls the commitInsertTable() method) only for an "INSERT
INTO TABLE" query. When inserting into a specific partition, the "DDLTask"
class doesn't seem to be used at all.

Is there a way for me to override some type of Tez hook to run custom code
at the end of an "INSERT INTO TABLE PARTITION (...)" query? Maybe by somehow
hooking into the TezTask or TezWork classes?

Any tips would be very welcome.

Thanks!

Julien

Re: Tez hook for "INSERT INTO TABLE PARTITION(...)" query

Posted by Rajesh Balamohan <rb...@apache.org>.
If the code needs to run at the end of creating the partition, check whether
"HMS::MetaStoreEventListener::onAddPartition" can be of help. This may
require a custom listener to be added on the HMS side.
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java#L122
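
To illustrate the listener idea, here is a minimal, self-contained sketch of
the pattern. It is only a model: the AddPartitionEvent and
MetaStoreEventListener types below are simplified, hypothetical stand-ins so
the example runs without Hive on the classpath, and
CommitOnAddPartitionListener is an invented name. A real listener would
instead extend Hive's org.apache.hadoop.hive.metastore.MetaStoreEventListener
and would presumably be registered on the metastore through the
hive.metastore.event.listeners property. Note that this event is tied to a
partition being created, so an insert into an already existing partition may
not trigger it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PartitionListenerSketch {

    // Hypothetical stand-in for Hive's AddPartitionEvent; it carries only
    // the fields this sketch needs.
    static final class AddPartitionEvent {
        final String tableName;
        final List<String> partitionValues;
        final boolean status; // true when the metastore operation succeeded

        AddPartitionEvent(String tableName, List<String> partitionValues, boolean status) {
            this.tableName = tableName;
            this.partitionValues = partitionValues;
            this.status = status;
        }
    }

    // Hypothetical stand-in for Hive's MetaStoreEventListener base class:
    // callbacks are no-ops unless a subclass overrides them.
    static abstract class MetaStoreEventListener {
        void onAddPartition(AddPartitionEvent event) {
            // no-op by default
        }
    }

    // Custom listener: runs the "end of insert" logic once per added partition.
    static final class CommitOnAddPartitionListener extends MetaStoreEventListener {
        // Records what was handled; a real listener would notify the
        // external storage system here instead.
        final List<String> committed = new ArrayList<>();

        @Override
        void onAddPartition(AddPartitionEvent event) {
            if (!event.status) {
                return; // skip events for operations that failed
            }
            committed.add(event.tableName + "/" + String.join("/", event.partitionValues));
        }
    }

    public static void main(String[] args) {
        CommitOnAddPartitionListener listener = new CommitOnAddPartitionListener();
        // Simulate the metastore firing one successful and one failed event.
        listener.onAddPartition(new AddPartitionEvent("sales", Arrays.asList("2022-12-20"), true));
        listener.onAddPartition(new AddPartitionEvent("sales", Arrays.asList("2022-12-21"), false));
        System.out.println(listener.committed);
    }
}
```

One consequence of this design is that the custom code runs inside the
metastore process rather than in the query engine, so it would behave the
same way for "mr" and Tez.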


~Rajesh.B
