Posted to user@spark.apache.org by Gaurav Agarwal <ga...@gmail.com> on 2023/06/15 09:06:30 UTC
Fwd: iceberg queries
Hi Team,
Sample MERGE query:
df.createOrReplaceTempView("source")
spark.sql("""
  MERGE INTO iceberg_hive_cat.iceberg_poc_db.iceberg_tab AS target
  USING source
  ON target.col1 = source.col1  -- col1 is my bucket column
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")
The source dataset is a temporary view that contains 1.5 million records
and may grow to 20 million rows in future. It is keyed by an id that has
16 buckets, and the target Iceberg table also has 16 buckets. The merge
should only update rows whose id matches and insert rows whose id does not.
The table has 1700 columns.
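To put those numbers in perspective, a quick back-of-envelope calculation of rows per bucket and total column values per merge. It assumes rows spread evenly across the 16 buckets, which hash bucketing only approximates, and it says nothing about actual memory use.

```python
# Back-of-envelope sizing for the merge described above.
# Assumption: rows are spread evenly over the buckets (hashing only approximates this).
def merge_sizing(rows: int, buckets: int = 16, columns: int = 1700) -> dict:
    return {
        "rows_per_bucket": rows // buckets,  # rows landing in each target bucket
        "cells": rows * columns,             # column values carried by the source
    }


for rows in (1_500_000, 20_000_000):
    print(rows, merge_sizing(rows))
# 1500000 {'rows_per_bucket': 93750, 'cells': 2550000000}
# 20000000 {'rows_per_bucket': 1250000, 'cells': 34000000000}
```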
The Spark dataset uses the default partitioning. Do we also need to bucket
the Spark dataset on the bucket column?
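For context on what bucketing the source on the bucket column would mean, here is a minimal pure-Python sketch of Iceberg's bucket transform as the Iceberg table spec describes it: a 32-bit Murmur3 (x86) hash of the value's 8-byte little-endian encoding, masked to non-negative, modulo N. It assumes the bucket column is an int or long (other types hash different byte encodings). The helper `group_by_bucket` is hypothetical and only illustrates the mapping; in Spark such clustering would happen through a shuffle, not row by row in Python.

```python
# Sketch of Iceberg's bucket transform for int/long values (per the Iceberg spec):
#   bucket(N, v) = (murmur3_x86_32(v) & Integer.MAX_VALUE) % N
# where v is hashed as an 8-byte little-endian long.
import struct
from collections import defaultdict

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Signed 32-bit MurmurHash3 (x86_32) of data."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h1 = seed & MASK32
    nblocks = len(data) // 4
    for i in range(nblocks):  # body: 4-byte little-endian blocks
        k1 = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k1 = _rotl32((k1 * c1) & MASK32, 15)
        h1 ^= (k1 * c2) & MASK32
        h1 = (_rotl32(h1, 13) * 5 + 0xE6546B64) & MASK32
    tail = data[4 * nblocks:]  # remaining 1-3 bytes
    k1 = 0
    if len(tail) >= 3:
        k1 ^= tail[2] << 16
    if len(tail) >= 2:
        k1 ^= tail[1] << 8
    if len(tail) >= 1:
        k1 ^= tail[0]
        k1 = _rotl32((k1 * c1) & MASK32, 15)
        h1 ^= (k1 * c2) & MASK32
    h1 ^= len(data)  # finalization mix (fmix32)
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK32
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK32
    h1 ^= h1 >> 16
    return h1 - (1 << 32) if h1 & 0x80000000 else h1


def iceberg_bucket(value: int, num_buckets: int) -> int:
    """Bucket id of an int/long value; both types hash as an 8-byte long."""
    h = murmur3_x86_32(struct.pack("<q", value))
    return (h & 0x7FFFFFFF) % num_buckets


def group_by_bucket(ids, num_buckets: int = 16) -> dict:
    """Hypothetical helper: cluster source ids by the target bucket they map to."""
    groups = defaultdict(list)
    for i in ids:
        groups[iceberg_bucket(i, num_buckets)].append(i)
    return dict(groups)


print(murmur3_x86_32(struct.pack("<q", 34)))  # 2017239379 (spec test vector)
print(iceberg_bucket(34, 16))                 # 3
```

Under this mapping, bucketing the source would amount to placing all ids that hash to the same bucket in the same Spark partition before the write, so each task writes to as few target buckets as possible.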
The merge currently fails with an OutOfMemoryError (OOME).
Let me know if you need any further details.
Regards
Gaurav