Posted to user@hive.apache.org by Julien Phalip <jp...@gmail.com> on 2022/12/20 22:28:24 UTC
Tez hook for "INSERT INTO TABLE PARTITION(...)" query
Hi,
I'm writing a custom Storage Handler and need to run some custom code
at the end of an INSERT query.
I can easily do that by providing a custom OutputCommitter class and
overriding the commitJob() method. However, that only works for the "mr"
execution engine, as the "commitJob()" method is never called when using
Tez.
With Tez, I managed to get it to work partially by providing a custom
HiveMetaHook class and overriding the commitInsertTable() method. However,
that method only gets called at the end of an "INSERT INTO TABLE" query. It
never gets called at the end of an "INSERT INTO TABLE PARTITION (...)" query.
After doing a bit of troubleshooting, it looks like Tez uses the "DDLTask"
class (which later calls the commitInsertTable() method) only for an "INSERT
INTO TABLE" query. When inserting into a specific partition, the "DDLTask"
class doesn't seem to be used at all.
Is there a way for me to override some type of Tez hook to run custom code
at the end of an "INSERT INTO TABLE PARTITION (...)" query? Maybe by somehow
hooking into the TezTask or TezWork classes?
Any tips would be very welcome.
Thanks!
Julien
Re: Tez hook for "INSERT INTO TABLE PARTITION(...)" query
Posted by Rajesh Balamohan <rb...@apache.org>.
If the custom code should run once the partition has been created, check
whether "HMS::MetaStoreEventListener::onAddPartition" can help. This may
require adding a custom listener on the HMS side.
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java#L122
~Rajesh.B
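
To illustrate the listener idea suggested above, here is a minimal sketch of the callback pattern. It does not depend on Hive: AddPartitionEvent and EventListener below are simplified local stand-ins (assumptions made for illustration, not the real Hive classes), and StorageHandlerListener is a hypothetical name. It only shows the shape of the logic: react to a partition-added event, and skip events that report a failed operation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionListenerSketch {

    /** Local stand-in for the event handed to the callback (hypothetical, simplified). */
    static final class AddPartitionEvent {
        final String table;
        final List<String> partitionValues;
        final boolean succeeded;

        AddPartitionEvent(String table, List<String> partitionValues, boolean succeeded) {
            this.table = table;
            this.partitionValues = partitionValues;
            this.succeeded = succeeded;
        }
    }

    /** Local stand-in for a listener base class whose callbacks default to no-ops. */
    static abstract class EventListener {
        void onAddPartition(AddPartitionEvent event) { }
    }

    /** Hypothetical custom listener: records each partition that was added successfully. */
    static final class StorageHandlerListener extends EventListener {
        final List<String> committed = new ArrayList<>();

        @Override
        void onAddPartition(AddPartitionEvent event) {
            if (!event.succeeded) {
                return; // ignore events for operations that failed
            }
            // The storage handler's custom end-of-insert code would go here.
            committed.add(event.table + "/" + String.join("/", event.partitionValues));
        }
    }

    public static void main(String[] args) {
        StorageHandlerListener listener = new StorageHandlerListener();
        listener.onAddPartition(
            new AddPartitionEvent("sales", Arrays.asList("ds=2022-12-20"), true));
        listener.onAddPartition(
            new AddPartitionEvent("sales", Arrays.asList("ds=2022-12-21"), false));
        System.out.println(listener.committed);
    }
}
```

In a real deployment the listener would instead extend Hive's MetaStoreEventListener and be registered on the metastore through the hive.metastore.event.listeners property. Note that, as the reply says, this covers the moment a partition is created; an insert into a partition that already exists may not trigger the callback, so that case would need to be verified separately.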