Posted to user@spark.apache.org by "Bode, Meikel, NMA-CFD" <Me...@Bertelsmann.de> on 2021/11/10 07:23:10 UTC

HiveThrift2 ACID Transactions?

Hi all,

We want to apply INSERT, UPDATE, and DELETE operations on tables based on Parquet or ORC files served by Thrift2.
It is currently unclear whether we can enable them, and where.

At the moment, UPDATE and DELETE operations are blocked when executed.

Is anyone out there using ACID transactions in combination with Thrift2?

Best,
Meikel

Re: HiveThrift2 ACID Transactions?

Posted by Steve Loughran <st...@cloudera.com.INVALID>.
Without commenting on any other part of this, note that it was in some Hive
commit operations where a race condition in rename surfaced:
https://issues.apache.org/jira/browse/HADOOP-16721

If you get odd errors about parent dirs not existing during renames,
that'll be it. Upgrading to the Hadoop 3.3.1 binaries fixes it.


On Thu, 11 Nov 2021 at 12:01, Bode, Meikel, NMA-CFD <
Meikel.Bode@bertelsmann.de> wrote:

> Hi all,
>
>
>
> I now have some more input related to the issues I face at the moment:
>
>
>
> When I try to UPDATE an external table via JDBC connection to HiveThrift2
> server I get the following exception:
>
>
>
> java.lang.UnsupportedOperationException: UPDATE TABLE is not supported
> temporarily.
>
>
>
> When doing a DELETE I see:
>
>
>
> org.apache.spark.sql.AnalysisException: DELETE is only supported with v2
> tables.
>
>
>
> INSERT is working as expected.
>
>
>
> We are using Spark 3.1.2 with Hadoop 3.2.0 and an external Hive 3.0.0
> metastore on K8s.
>
> The warehouse dir is located on AWS S3, attached using the s3a protocol.
>
>
>
> I learned so far that we need to use an ACID-compatible file format
> for external tables, such as ORC or Delta.
>
> In addition, we need to set some ACID-related properties, either as
> the first commands after session creation or via the appropriate
> configuration files:
>
>
>
> SET hive.support.concurrency=true;
>
> SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
>
> SET hive.enforce.sorting=true;
>
> SET hive.enforce.bucketing=true;
>
> SET hive.exec.dynamic.partition.mode=nonstrict;
>
> SET hive.compactor.initiator.on=true;
>
> SET hive.compactor.worker.threads=1;
>
>
>
> Now, when I try to create the following table:
>
>
>
> create external table acidtab (id string, val string)
>
>             stored as ORC location '/data/acidtab.orc'
>
>             tblproperties ('transactional'='true');
>
>
>
> I see the following exception:
>
>
>
> org.apache.spark.sql.AnalysisException:
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:The
> table must be stored using an ACID compliant format (such as ORC):
> default.acidtab)
>
>
>
> Even though I create the table in ORC format, the exception still
> suggests using ORC, as it is required for ACID compliance.
>
>
>
> Another point is that external tables are not deleted via the DROP
> TABLE command. They are only removed from the metastore but remain
> physically in their S3 bucket.
>
>
>
> I tried with:
>
>
>
> SET `hive.metastore.thrift.delete-files-on-drop`=true;
>
>
>
> And also by setting:
>
>
>
> TBLPROPERTIES ('external.table.purge'='true')
>
>
>
>
>
> Any help on these issues would be much appreciated!
>
>
>
> Many thanks,
>
> Meikel Bode
>
>
>
> *From:* Bode, Meikel, NMA-CFD <Me...@Bertelsmann.de>
> *Sent:* Mittwoch, 10. November 2021 08:23
> *To:* user <us...@spark.apache.org>; dev <de...@spark.apache.org>
> *Subject:* HiveThrift2 ACID Transactions?
>
>
>
> Hi all,
>
>
>
> We want to apply INSERT, UPDATE, and DELETE operations on tables
> based on Parquet or ORC files served by Thrift2.
>
> It is currently unclear whether we can enable them, and where.
>
>
>
> At the moment, UPDATE and DELETE operations are blocked when
> executed.
>
>
>
> Is anyone out there using ACID transactions in combination with Thrift2?
>
>
>
> Best,
>
> Meikel
>

RE: HiveThrift2 ACID Transactions?

Posted by "Bode, Meikel, NMA-CFD" <Me...@Bertelsmann.de>.
Hi all,

I now have some more input related to the issues I face at the moment:

When I try to UPDATE an external table via JDBC connection to HiveThrift2 server I get the following exception:

java.lang.UnsupportedOperationException: UPDATE TABLE is not supported temporarily.

When doing a DELETE I see:

org.apache.spark.sql.AnalysisException: DELETE is only supported with v2 tables.

INSERT is working as expected.

We are using Spark 3.1.2 with Hadoop 3.2.0 and an external Hive 3.0.0 metastore on K8s.
The warehouse dir is located on AWS S3, attached using the s3a protocol.

I learned so far that we need to use an ACID-compatible file format for external tables, such as ORC or Delta.
In addition, we need to set some ACID-related properties, either as the first commands after session creation or via the appropriate configuration files:

SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.enforce.sorting=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1;
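
Alternatively, instead of issuing these per session, the same settings could go into hive-site.xml (a sketch; only the first two properties are shown, the remaining SET options above map to <property> entries the same way):

```xml
<!-- hive-site.xml fragment (sketch): same ACID-related settings as the SET commands above -->
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
```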

Now, when I try to create the following table:

create external table acidtab (id string, val string)
            stored as ORC location '/data/acidtab.orc'
            tblproperties ('transactional'='true');

I see the following exception:

org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:The table must be stored using an ACID compliant format (such as ORC): default.acidtab)

Even though I create the table in ORC format, the exception still suggests using ORC, as it is required for ACID compliance.
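
One variant I still want to try, based on my reading that Hive 3 restricts full ACID tables to managed (non-external) tables (a sketch; the table name is just an example):

```sql
-- Managed (non-external) transactional table: stored as ORC with the
-- 'transactional' property, letting Hive manage the data location itself.
CREATE TABLE acidtab_managed (id STRING, val STRING)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
```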

Another point is that external tables are not deleted via the DROP TABLE command. They are only removed from the metastore but remain physically in their S3 bucket.

I tried with:

SET `hive.metastore.thrift.delete-files-on-drop`=true;

And also by setting:

TBLPROPERTIES ('external.table.purge'='true')
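
On the existing table that would look like this (a sketch):

```sql
-- Mark the external table so that DROP TABLE also purges the files on S3
ALTER TABLE acidtab SET TBLPROPERTIES ('external.table.purge'='true');
-- then:
DROP TABLE acidtab;
```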


Any help on these issues would be much appreciated!

Many thanks,
Meikel Bode

From: Bode, Meikel, NMA-CFD <Me...@Bertelsmann.de>
Sent: Mittwoch, 10. November 2021 08:23
To: user <us...@spark.apache.org>; dev <de...@spark.apache.org>
Subject: HiveThrift2 ACID Transactions?

Hi all,

We want to apply INSERT, UPDATE, and DELETE operations on tables based on Parquet or ORC files served by Thrift2.
It is currently unclear whether we can enable them, and where.

At the moment, UPDATE and DELETE operations are blocked when executed.

Is anyone out there using ACID transactions in combination with Thrift2?

Best,
Meikel
