Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/07/01 03:04:09 UTC

[GitHub] [iceberg] jingli430 opened a new issue, #5174: MERGE INTO TABLE is not supported temporarily on Spark3.2.0, Scala2.12, Iceberg0.13.1

jingli430 opened a new issue, #5174:
URL: https://github.com/apache/iceberg/issues/5174

   Hi, I've seen the same issue reported many times here, and I know I should check three things first.
   **[1] Are the Spark, Iceberg, and Scala versions correct?**
   Yes, I am using Spark 3.2.0, Scala 2.12, and Iceberg 0.13.1, and other statements such as INSERT and CREATE run fine.
   **[2] Did I include spark.sql.extensions?**
   Yes, below is my spark-sql config:
   `spark-sql --master yarn --deploy-mode client --name sparksql_iceberg_session \
   --driver-cores 1 --driver-memory 1g --executor-cores 2 --executor-memory 16g --num-executors 2 \
   --conf spark.memory.fraction=0.8 --conf spark.storage.memoryFraction=0.3 \
   --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.1 \
   --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
   --conf spark.sql.catalog.hive_prod=org.apache.iceberg.spark.SparkCatalog \
   --conf spark.sql.catalog.hive_prod.type=hive \
   --conf spark.sql.catalog.hive_prod.uri=thrift://emr-header-1.cluster:9083 \
   --conf spark.sql.catalog.hive_prod.warehouse=oss://xp-warehouse-iceberg/ \
   --conf spark.sql.catalog.hive_prod.access.key.id=xxx \
   --conf spark.sql.catalog.hive_prod.access.key.secret=yyy \
   --conf spark.sql.catalog.hive_prod.oss.endpoint=oss-cn-shanghai-internal.aliyuncs.com \
   --conf spark.sql.catalog.session_prod=org.apache.iceberg.spark.SparkSessionCatalog \
   --conf spark.sql.catalog.session_prod.type=hive --conf spark.sql.defaultCatalog=hive_prod`
   **[3] Is the target an Iceberg table?**
   Yes, but I still see this error when running the MERGE INTO command below in my spark-sql shell.
   
   `CREATE TABLE xiceberg_dev.ib_mergeinto_target_upsert (
     order_id int,
     order_ts timestamp,
     order_date date,
     price int,
     order_status boolean,
     PRIMARY KEY (order_id) NOT ENFORCED
   )
   WITH (
     'format-version'= '2',
     'write.upsert.enable'='true',
     'write.distribution-mode'='hash',
     'write.metadata.delete-after-commit.enabled'='true',
     'write.metadata.previous-versions-max'='9'
   );
   
   MERGE INTO hive_prod.xiceberg_dev.ib_mergeinto_target_upsert as target
   USING hive_prod.xiceberg_dev.ib_mergeinto_source_insert_pt as source
   ON target.order_id = source.order_id
   WHEN MATCHED THEN UPDATE SET *
   WHEN NOT MATCHED THEN INSERT *
   ;`
   
   Could someone help me here?
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] naseerscorpio commented on issue #5174: MERGE INTO TABLE is not supported temporarily on Spark3.2.0, Scala2.12, Iceberg0.13.1

Posted by GitBox <gi...@apache.org>.
naseerscorpio commented on issue #5174:
URL: https://github.com/apache/iceberg/issues/5174#issuecomment-1185582508

   This was caused by a misspelled property in my configuration: `sparl.sql.extensions` instead of `spark.sql.extensions`. Correcting it fixed the issue, and the merge works perfectly.
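
   A misspelling like `sparl.sql.extensions` is easy to miss because the key is simply stored and never read. The following is a minimal sketch of a pre-flight check, not part of Spark or Iceberg: `ConfKeyCheck` and `suspiciousKeys` are hypothetical names, and the check assumes every intended setting starts with the `spark.` prefix.

   ```scala
   // Hypothetical helper (not a Spark or Iceberg API): reports configuration
   // keys that do not start with the "spark." prefix.
   object ConfKeyCheck {
     def suspiciousKeys(settings: Seq[(String, String)]): Seq[String] =
       settings.collect { case (key, _) if !key.startsWith("spark.") => key }

     def main(args: Array[String]): Unit = {
       val settings = Seq(
         "spark.sql.catalog.spark_catalog" -> "org.apache.iceberg.spark.SparkCatalog",
         "sparl.sql.extensions" -> "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
       )
       // Fail fast instead of silently running without the Iceberg extensions.
       val bad = suspiciousKeys(settings)
       if (bad.nonEmpty) {
         println("Unexpected configuration keys: " + bad.mkString(", "))
       }
     }
   }
   ```

   In an application, the same pairs could be passed through the check first and only then applied to the `SparkConf` with `conf.set(key, value)`.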




[GitHub] [iceberg] naseerscorpio commented on issue #5174: MERGE INTO TABLE is not supported temporarily on Spark3.2.0, Scala2.12, Iceberg0.13.1

Posted by GitBox <gi...@apache.org>.
naseerscorpio commented on issue #5174:
URL: https://github.com/apache/iceberg/issues/5174#issuecomment-1183275972

   Hi, I'm also facing this issue when using Iceberg with the AWS Glue integration. Here are my classpath and the code to reproduce the issue. Strangely, the same queries work when run from `spark-shell`.
   
   **Classpath**
   
   ```scala
   val sparkVersion   = "3.2.1"
   val icebergVersion = "0.13.2"
   val awsSDKVersion = "2.17.131"
   
   libraryDependencies := Seq(
     "org.apache.spark" %% "spark-core" % sparkVersion,
     "org.apache.spark" %% "spark-sql" % sparkVersion,
     "org.apache.iceberg"     % "iceberg-spark-runtime-3.2_2.12" % icebergVersion,
     "software.amazon.awssdk" % "url-connection-client"          % awsSDKVersion,
     "software.amazon.awssdk" % "bundle"          % awsSDKVersion,
     "org.apache.hadoop" % "hadoop-common" % "3.3.1",
     "org.apache.hadoop"      % "hadoop-aws"  % "3.3.1",
     "com.typesafe" % "config" % "1.3.3",
     "com.github.scopt" %% "scopt" % "3.7.0"
   )
   ```
   
   ```scala
   import org.apache.spark.SparkConf
   import org.apache.spark.sql.SparkSession

   object IcebergApp extends App {
       val conf = new SparkConf()
         conf.setMaster("local[*]")
         conf.setAppName("iceberg-demo")
         conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         conf.set("spark.kryoserializer.buffer.max", "256m")
         conf.set("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkCatalog")
         conf.set("sparl.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
         conf.set("spark.sql.catalog.spark_catalog.warehouse", "s3://some-bucket/test_iceberg/")
         conf.set("spark.sql.catalog.spark_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
         conf.set("spark.sql.catalog.spark_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
         conf.set("spark.sql.iceberg.handle-timestamp-without-timezone", "true")
       
         val sparkSession = SparkSession.builder().config(conf).getOrCreate()
       
         import org.apache.spark.sql.DataFrame
       
         def readInputDF() = {
           import org.apache.spark.sql.types._
           val sparkSchema = StructType(
             Array(
               StructField("id",LongType,true),
               StructField("dep",StringType,true),
               StructField("created_ts",TimestampType,true)
             ))
           val input_df =  sparkSession.read
             .schema(sparkSchema)
             .option("inferSchema", "false")
             .option("mode", "PERMISSIVE")
             .format("json")
             .load("employees_part1.json")
       
           input_df.sortWithinPartitions("created_ts")
             .createOrReplaceTempView("input_df")
         }
       
       
         def doMerge() = {
           sparkSession.sql("MERGE INTO test_iceberg.employees t USING (SELECT * FROM input_df) s ON t.id = s.id WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *")
         }
       
       
         private def createTableIfNotExists(): DataFrame = {
           sparkSession.sql(s"""
                               |CREATE OR REPLACE TABLE test_iceberg.employees (
                               |  id bigint,
                               |dep string,
                               |created_ts timestamp
                               |)
                               |USING ICEBERG
                               |PARTITIONED BY (days(created_ts))
                               |LOCATION "s3://some-bucket/test_iceberg/employees/"
                               |TBLPROPERTIES (
                               |  'write.distribution-mode'='hash',
                               |  'write.metadata.delete-after-commit.enabled'='true',
                               |  'write.metadata.previous-versions-max'='9'
                               |)
                               |""".stripMargin)
         }
       
         createTableIfNotExists()
         readInputDF()
         doMerge()
   }
   ```
   
   **Error**
   ```
   Exception in thread "main" java.lang.UnsupportedOperationException: MERGE INTO TABLE is not supported temporarily.
   ```
   
   Could you advise if I am missing something here?




[GitHub] [iceberg] KarlManong commented on issue #5174: MERGE INTO TABLE is not supported temporarily on Spark3.2.0, Scala2.12, Iceberg0.13.1

Posted by GitBox <gi...@apache.org>.
KarlManong commented on issue #5174:
URL: https://github.com/apache/iceberg/issues/5174#issuecomment-1179659723

   see https://github.com/apache/iceberg/issues/2737
   
   I think one of your tables is not in Iceberg format.
   
   You can execute `show create table`.
   
   `org.apache.iceberg.spark.extensions.TestMerge` may help.
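
   The `show create table` check can also be done in code. This is only a sketch under assumptions: `TableFormatCheck` and `declaresIceberg` are hypothetical names, and it assumes the DDL text of an Iceberg table contains a `USING iceberg` clause. Whether `SHOW CREATE TABLE` is supported for a given table, and what it prints, may depend on the Spark version and catalog.

   ```scala
   // Hypothetical helper (not an Iceberg API): inspects DDL text, such as the
   // output of SHOW CREATE TABLE, for a `USING iceberg` clause.
   object TableFormatCheck {
     // Case-insensitive match, tolerant of extra whitespace between the words.
     private val UsingIceberg = "(?i)\\bUSING\\s+iceberg\\b".r

     def declaresIceberg(ddl: String): Boolean =
       UsingIceberg.findFirstIn(ddl).isDefined
   }
   ```

   With a `SparkSession` named `spark`, the DDL string could be obtained with `spark.sql("SHOW CREATE TABLE test_iceberg.employees").head().getString(0)` and passed to `declaresIceberg` for both the target and the source table.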




[GitHub] [iceberg] github-actions[bot] closed issue #5174: MERGE INTO TABLE is not supported temporarily on Spark3.2.0, Scala2.12, Iceberg0.13.1

Posted by github-actions.
github-actions[bot] closed issue #5174: MERGE INTO TABLE is not supported temporarily on Spark3.2.0, Scala2.12, Iceberg0.13.1
URL: https://github.com/apache/iceberg/issues/5174




[GitHub] [iceberg] github-actions[bot] commented on issue #5174: MERGE INTO TABLE is not supported temporarily on Spark3.2.0, Scala2.12, Iceberg0.13.1

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #5174:
URL: https://github.com/apache/iceberg/issues/5174#issuecomment-1379643412

   This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.




[GitHub] [iceberg] github-actions[bot] commented on issue #5174: MERGE INTO TABLE is not supported temporarily on Spark3.2.0, Scala2.12, Iceberg0.13.1

Posted by github-actions.
github-actions[bot] commented on issue #5174:
URL: https://github.com/apache/iceberg/issues/5174#issuecomment-1404386608

   This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

