Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/01/28 09:28:54 UTC

[GitHub] [iceberg] wjxiz1992 opened a new issue #2175: MERGE INTO statement cannot run with Spark SQL

wjxiz1992 opened a new issue #2175:
URL: https://github.com/apache/iceberg/issues/2175


   I'm trying to test the MERGE INTO statement in Spark SQL with the Iceberg extension:
   
   ```
   $SPARK_HOME/bin/spark-shell --master $MASTER \
   --driver-memory ${DRIVE_MEMORY}G \
   --executor-memory ${EXECUTOR_MEMORY}G \
   --executor-cores $EXECUTOR_CORES \
   --num-executors $NUM_EXECUTOR \
   --conf spark.sql.catalogImplementation=hive \
   --conf spark.task.cpus=1 \
   --conf spark.locality.wait=0 \
   --conf spark.yarn.maxAppAttempts=1 \
   --conf spark.sql.shuffle.partitions=24 \
   --conf spark.sql.files.maxPartitionBytes=128m \
   --conf spark.sql.warehouse.dir=$OUT \
   --conf spark.task.resource.gpu.amount=0.08 \
   --conf spark.executor.resource.gpu.amount=1 \
   --packages org.apache.iceberg:iceberg-spark3-runtime:0.11.0 \
   --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
   --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
   --conf spark.sql.catalog.spark_catalog.type=hive \
   --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
   --conf spark.sql.catalog.local.type=hadoop \
   --conf spark.sql.catalog.local.warehouse=$PWD/warehouse
   
   ```
   
   
   Here's my test:
   
   ```
   scala> spark.sql("""
        | merge into local.db.table1 t
        | using ( select * from local.db.table1_update) s
        | on t.data = s.data
        | when matched then set t.id = t.id + s.id
        | when not matched then insert *
        | """)
   org.apache.spark.sql.catalyst.parser.ParseException:
   mismatched input 'set' expecting {'DELETE', 'UPDATE'}(line 5, pos 18)
   
   == SQL ==
   
   merge into local.db.table1 t
   using ( select * from local.db.table1_update) s
   on t.data = s.data
   when matched then set t.id = t.id + s.id
   ------------------^^^
   when not matched then insert *
   
     at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:266)
     at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:133)
     at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
     at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:81)
     at org.apache.spark.sql.catalyst.parser.extensions.IcebergSparkSqlExtensionsParser.parsePlan(IcebergSparkSqlExtensionsParser.scala:100)
     at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:605)
     at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
     at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:605)
     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
     at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602)
     ... 53 elided
   
   ```
   
   
   For the environment:
   
   ```
   scala> spark.sparkContext.listJars
   res0: Seq[String] = Vector(spark://10.19.183.124:38945/jars/org.apache.iceberg_iceberg-spark3-runtime-0.11.0.jar)
   ```
   
   I'm not 100% sure whether the extension has been loaded correctly, because this looks like a parser issue: the parser doesn't recognize "set".
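   
   A quick way to partially check this from the same session (a minimal sketch, not conclusive, since it only confirms the configs were passed through rather than that the extension parser is actually active) is to read the configuration back:
   
   ```
   // Read back the configs passed on the command line above.
   println(spark.conf.get("spark.sql.extensions"))
   // expected: org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
   println(spark.conf.get("spark.sql.catalog.local"))
   // expected: org.apache.iceberg.spark.SparkCatalog
   ```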
   




[GitHub] [iceberg] HeartSaVioR commented on issue #2175: MERGE INTO statement cannot run with Spark SQL

HeartSaVioR commented on issue #2175:
URL: https://github.com/apache/iceberg/issues/2175#issuecomment-768929479


   The Iceberg parser extension doesn't add `MERGE INTO`; Spark's own parser already has it.
   
   According to the Spark parser rule, I guess you missed `UPDATE`, so it should be `UPDATE SET t.id = t.id + s.id`.
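   
   For illustration, a corrected version of the statement above (just a sketch of your original query with the missing keyword added) would be:
   
   ```
   // Same query as in the report, with UPDATE added before SET
   spark.sql("""
     merge into local.db.table1 t
     using ( select * from local.db.table1_update) s
     on t.data = s.data
     when matched then update set t.id = t.id + s.id
     when not matched then insert *
   """)
   ```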




[GitHub] [iceberg] rdblue closed issue #2175: MERGE INTO statement cannot run with Spark SQL

rdblue closed issue #2175:
URL: https://github.com/apache/iceberg/issues/2175


   




[GitHub] [iceberg] HeartSaVioR edited a comment on issue #2175: MERGE INTO statement cannot run with Spark SQL

HeartSaVioR edited a comment on issue #2175:
URL: https://github.com/apache/iceberg/issues/2175#issuecomment-768929479


   The Iceberg parser extension doesn't add `MERGE INTO`; Spark's own parser already has it.
   
   According to the Spark parser rule, I guess you missed `update` (the error message also points this out), so the right clause would be `then update set t.id = t.id + s.id`.




[GitHub] [iceberg] wjxiz1992 commented on issue #2175: MERGE INTO statement cannot run with Spark SQL

wjxiz1992 commented on issue #2175:
URL: https://github.com/apache/iceberg/issues/2175#issuecomment-768933994


   Thanks a lot for the very quick response!
   Yes, I missed the "update" keyword here; the query works after I added it.
   Should we update the [doc here](https://iceberg.apache.org/getting-started/#writing), since it's under the "Spark" section?




[GitHub] [iceberg] wjxiz1992 commented on issue #2175: MERGE INTO statement cannot run with Spark SQL

wjxiz1992 commented on issue #2175:
URL: https://github.com/apache/iceberg/issues/2175#issuecomment-768950228


   @HeartSaVioR I'd be glad to submit a PR for it, but I couldn't find the corresponding file that contains the related content.
   The only file I can find is a MkDocs-generated HTML page: https://github.com/apache/iceberg/blob/asf-site/getting-started/index.html#L464 but I don't think I should change that directly. The [getting-started.md](https://github.com/apache/iceberg/blob/064f4f565b892f6bae462ada82033e243d2f2e49/site/docs/getting-started.md#writing) in the master branch doesn't contain "merge into".
   Could you point me to the right place?
   Thanks!




[GitHub] [iceberg] dilipbiswal commented on issue #2175: MERGE INTO statement cannot run with Spark SQL

dilipbiswal commented on issue #2175:
URL: https://github.com/apache/iceberg/issues/2175#issuecomment-768930127


   Hello,
   
   Here is the grammar for the matched clause and matched action:
   
   ```
   matchedClause
       : WHEN MATCHED (AND matchedCond=booleanExpression)? THEN matchedAction
       ;
   
   matchedAction
       : DELETE
       | UPDATE SET ASTERISK
       | UPDATE SET assignmentList
       ;
   ```
   
   You are missing the "UPDATE" keyword?
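   
   For illustration, here is a sketch of a complete statement that exercises the other matched actions from the grammar (a conditional DELETE and UPDATE SET *), reusing the table names from your query; the AND condition and the assumption that both tables share the same columns are purely illustrative:
   
   ```
   // Sketch only: conditional DELETE plus UPDATE SET *. Both tables are
   // assumed to have identical columns, and the s.id IS NULL condition is
   // just an example.
   spark.sql("""
     merge into local.db.table1 t
     using (select * from local.db.table1_update) s
     on t.data = s.data
     when matched and s.id is null then delete
     when matched then update set *
     when not matched then insert *
   """)
   ```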
   
   Regards
   -- Dilip




[GitHub] [iceberg] wjxiz1992 edited a comment on issue #2175: MERGE INTO statement cannot run with Spark SQL

wjxiz1992 edited a comment on issue #2175:
URL: https://github.com/apache/iceberg/issues/2175#issuecomment-768950228


   @HeartSaVioR I'd be glad to submit a PR for it, but I couldn't find the corresponding file that contains the related content.
   The only file I can find is a MkDocs-generated HTML page: https://github.com/apache/iceberg/blob/asf-site/getting-started/index.html#L464 but I don't think I should change that directly. The [getting-started.md](https://github.com/apache/iceberg/blob/064f4f565b892f6bae462ada82033e243d2f2e49/site/docs/getting-started.md#writing) in the master branch doesn't contain "merge into".
   Could you point me to the right place?
   Thanks!
   
   
   ------ update ------:
   I found where it is: https://github.com/apache/iceberg/pull/2160.




[GitHub] [iceberg] rdblue commented on issue #2175: MERGE INTO statement cannot run with Spark SQL

rdblue commented on issue #2175:
URL: https://github.com/apache/iceberg/issues/2175#issuecomment-769240770


   Fixed in the docs. Thanks, everyone, for finding the problem!




[GitHub] [iceberg] HeartSaVioR commented on issue #2175: MERGE INTO statement cannot run with Spark SQL

HeartSaVioR commented on issue #2175:
URL: https://github.com/apache/iceberg/issues/2175#issuecomment-768939074


   Ah yes, I didn't realize you were trying to follow the doc. Would you mind submitting a PR for the fix? I'll take it up if you'd rather not. Thanks in advance!

