Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/07/26 14:47:31 UTC

[GitHub] [iceberg] zhangpanbigData opened a new issue #2871: using insert overwrite clause in spark throw an exception

zhangpanbigData opened a new issue #2871:
URL: https://github.com/apache/iceberg/issues/2871


   The first step:
   scala> spark.sql("create table hive_catalog.icebergdb.logs (uuid string not null,level string not null,ts timestamp not null,message string) using iceberg partitioned by (level,hours(ts))")
   res10: org.apache.spark.sql.DataFrame = []
   
   scala> spark.sql("desc hive_catalog.icebergdb.logs").show(false)
   +--------------+---------+-------+
   |col_name      |data_type|comment|
   +--------------+---------+-------+
   |uuid          |string   |       |
   |level         |string   |       |
   |ts            |timestamp|       |
   |message       |string   |       |
   |              |         |       |
   |# Partitioning|         |       |
   |Part 0        |level    |       |
   |Part 1        |hours(ts)|       |
   +--------------+---------+-------+
   
   The second step:
   insert several rows into the logs table:
   scala> spark.sql("insert into hive_catalog.icebergdb.logs values('e41764f7-638b-4605-9642-f2b607d77c45','middle',timestamp '2021-07-23 17:24:03','the first row')")
   res14: org.apache.spark.sql.DataFrame = []
   
   scala> spark.sql("select * from hive_catalog.icebergdb.logs").show(false)
   +------------------------------------+------+-------------------+--------------+
   |uuid                                |level |ts                 |message       |
   +------------------------------------+------+-------------------+--------------+
   |e41764f7-638b-4605-9642-f2b607d77c45|middle|2021-07-23 17:24:03|the first row |
   |db6a2c2a-93a8-443c-9e41-36cb214648f4|senior|2021-07-23 17:24:03|the third row |
   |b742ee76-65d3-4d74-ba74-2c25d3cd4795|low   |2021-07-25 12:43:20|the second row|
   |db6a2c2a-93a8-443c-9e41-36cb214648f4|high  |2021-07-23 00:23:56|the fourth row|
   +------------------------------------+------+-------------------+--------------+
   
   The third step:
   turn on dynamic partition overwrite mode and use an insert overwrite statement to update the table:
   scala> spark.sql("set spark.sql.sources.partitionOverwriteMode").show(truncate=false)
   +----------------------------------------+-------+
   |key                                     |value  |
   +----------------------------------------+-------+
   |spark.sql.sources.partitionOverwriteMode|dynamic|
   +----------------------------------------+-------+
   
   scala> spark.sql("insert overwrite table hive_catalog.icebergdb.logs select uuid,first(level),first(ts),first(message) from hive_catalog.icebergdb.logs where cast(ts as date)='2021-07-23' group by uuid")
   org.apache.spark.sql.AnalysisException: Cannot write incompatible data to table 'hive_catalog.icebergdb.logs':
   - Cannot write nullable values to non-null column 'level'
   - Cannot write nullable values to non-null column 'ts';
     at org.apache.spark.sql.catalyst.analysis.TableOutputResolver$.resolveOutputColumns(TableOutputResolver.scala:72)
     at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation$$anonfun$apply$30.applyOrElse(Analyzer.scala:2891)
     at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation$$anonfun$apply$30.applyOrElse(Analyzer.scala:2862)
     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:108)
     at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:75)
     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:108)
     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:212)
     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106)
     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104)
     at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)
     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:73)
     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:72)
     at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:29)
     at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation$.apply(Analyzer.scala:2862)
     at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation$.apply(Analyzer.scala:2861)
     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:149)
     at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
     at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
     at scala.collection.immutable.List.foldLeft(List.scala:89)
     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:146)
     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:138)
     at scala.collection.immutable.List.foreach(List.scala:392)
     at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:138)
     at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:171)
     at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:165)
     at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:130)
     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:116)
     at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
     at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:116)
     at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:149)
     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:219)
     at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:148)
     at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:68)
     at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
     at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:138)
     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:767)
     at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:138)
     at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:68)
     at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:66)
     at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:58)
     at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:767)
     at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
     at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:610)
     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:767)
     at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:605)
     ... 47 elided
   
   Then it throws an exception. Please tell me where I went wrong, or is this a bug?
   The above steps are based on the official documentation. The Hive version is 3.1.2, the Spark version is 3.0.3, and the Iceberg version is built from the master branch.
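   
   One workaround that may get past this nullability check without altering the table is to wrap the nullable first() aggregates so the projected columns are no longer declared nullable. This is only a sketch, under the assumption that Spark treats coalesce(expr, non-null literal) as non-nullable; the fallback values are arbitrary placeholders:
   
   // untested sketch: make the aggregate outputs non-nullable before writing
   spark.sql("""
     INSERT OVERWRITE TABLE hive_catalog.icebergdb.logs
     SELECT uuid,
            coalesce(first(level), 'unknown')                    AS level,
            coalesce(first(ts), timestamp '1970-01-01 00:00:00') AS ts,
            first(message)                                       AS message
     FROM hive_catalog.icebergdb.logs
     WHERE cast(ts AS date) = '2021-07-23'
     GROUP BY uuid
   """)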
   



[GitHub] [iceberg] zhangpanbigData commented on issue #2871: using insert overwrite clause in spark throw an exception

Posted by GitBox <gi...@apache.org>.
zhangpanbigData commented on issue #2871:
URL: https://github.com/apache/iceberg/issues/2871#issuecomment-887370140


   > I see, I think it's a Spark compatibility issue. first() is a 'nullable' function (it may return null values), but the table column is marked non-nullable. You can try to edit the table schema to have that column nullable.
   
   And static overwrite mode is also not working:
   scala> spark.sql("set spark.sql.sources.partitionOverwriteMode").show(truncate=false)
   +----------------------------------------+------+
   |key                                     |value |
   +----------------------------------------+------+
   |spark.sql.sources.partitionOverwriteMode|STATIC|
   +----------------------------------------+------+
   
   scala> spark.sql("select * from hive_catalog.icebergdb.logs").show(false)
   +------------------------------------+------+-------------------+---------------+
   |uuid                                |level |ts                 |message        |
   +------------------------------------+------+-------------------+---------------+
   |db6a2c2a-93a8-443c-9e41-36cb214648f4|sensor|2021-07-24 18:43:29|the fifth mess |
   |b742ee76-65d3-4d74-ba74-2c25d3cd4795|low   |2021-07-25 12:43:20|the third mess |
   |db6a2c2a-93a8-443c-9e41-36cb214648f4|high  |2021-07-23 00:23:56|the fourth mess|
   |db6a2c2a-93a8-443c-9e41-36cb214648f4|sensor|2021-07-23 17:24:03|the second mess|
   |e41764f7-638b-4605-9642-f2b607d77c45|middle|2021-07-23 17:24:03|the first mess |
   +------------------------------------+------+-------------------+---------------+
   
   scala> spark.sql("select uuid,first(level),first(ts),first(message) from hive_catalog.icebergdb.logs where level='sensor' group by uuid").show(false)
   +------------------------------------+------------+-------------------+--------------+
   |uuid                                |first(level)|first(ts)          |first(message)|
   +------------------------------------+------------+-------------------+--------------+
   |db6a2c2a-93a8-443c-9e41-36cb214648f4|sensor      |2021-07-24 18:43:29|the fifth mess|
   +------------------------------------+------------+-------------------+--------------+
   
   
   scala> spark.sql("insert overwrite hive_catalog.icebergdb.logs partition (level='sensor') select uuid,first(level),first(ts),first(message) from hive_catalog.icebergdb.logs where level='sensor' group by uuid").show(false)
   org.apache.spark.sql.AnalysisException: Cannot write to 'hive_catalog.icebergdb.logs', too many data columns:
   Table columns: 'uuid', 'level', 'ts', 'message'
   Data columns: 'uuid', 'level', 'first(level)', 'first(ts)', 'first(message)';
     at org.apache.spark.sql.catalyst.analysis.TableOutputResolver$.resolveOutputColumns(TableOutputResolver.scala:38)
     at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation$$anonfun$apply$30.applyOrElse(Analyzer.scala:2879)
     at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation$$anonfun$apply$30.applyOrElse(Analyzer.scala:2862)
     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:108)
     at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:75)
     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:108)
     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:212)
     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106)
     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104)
     at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)
     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:73)
     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:72)
     at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:29)
     at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation$.apply(Analyzer.scala:2862)
     at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation$.apply(Analyzer.scala:2861)
     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:149)
     at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
     at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
     at scala.collection.immutable.List.foldLeft(List.scala:89)
     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:146)
     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:138)
     at scala.collection.immutable.List.foreach(List.scala:392)
     at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:138)
     at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:171)
     at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:165)
     at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:130)
     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:116)
     at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
     at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:116)
     at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:149)
     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:219)
     at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:148)
     at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:68)
     at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
     at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:138)
     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:767)
     at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:138)
     at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:68)
     at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:66)
     at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:58)
     at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:767)
     at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
     at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:610)
     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:767)
     at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:605)
     ... 47 elided
   
   It throws an exception directly...
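   
   The error itself points at the fix: with a static PARTITION (level='sensor') clause, Spark supplies the level value from the clause, so the SELECT list should not project the level column again. A sketch of the corrected statement (untested here; the nullability constraint on the remaining NOT NULL columns from the earlier error would still apply):
   
   spark.sql("""
     INSERT OVERWRITE hive_catalog.icebergdb.logs PARTITION (level = 'sensor')
     SELECT uuid, first(ts), first(message)
     FROM hive_catalog.icebergdb.logs
     WHERE level = 'sensor'
     GROUP BY uuid
   """)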



[GitHub] [iceberg] szehon-ho commented on issue #2871: using insert overwrite clause in spark throw an exception

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on issue #2871:
URL: https://github.com/apache/iceberg/issues/2871#issuecomment-887055541


   From a first glance at the error message, is first() here returning null?



[GitHub] [iceberg] zhangpanbigData commented on issue #2871: using insert overwrite clause in spark throw an exception

Posted by GitBox <gi...@apache.org>.
zhangpanbigData commented on issue #2871:
URL: https://github.com/apache/iceberg/issues/2871#issuecomment-887356836


   > I see, I think it's a Spark compatibility issue. first() is a 'nullable' function (it may return null values), but the table column is marked non-nullable. You can try to edit the table schema to have that column nullable.
   
   Following your suggestion, it works, but dynamic partition overwrite seems to have no effect: the duplicates are not removed. Here are my steps:
   scala> spark.sql("create table hive_catalog.icebergdb.logs (uuid string,level string,ts timestamp,message string) using iceberg partitioned by (level,hours(ts))").show(false)
   ++
   ||
   ++
   ++
   
   
   scala> spark.sql("desc hive_catalog.icebergdb.logs").show(false)
   +--------------+---------+-------+
   |col_name      |data_type|comment|
   +--------------+---------+-------+
   |uuid          |string   |       |
   |level         |string   |       |
   |ts            |timestamp|       |
   |message       |string   |       |
   |              |         |       |
   |# Partitioning|         |       |
   |Part 0        |level    |       |
   |Part 1        |hours(ts)|       |
   +--------------+---------+-------+
   
   scala> spark.sql("insert into hive_catalog.icebergdb.logs values('e41764f7-638b-4605-9642-f2b607d77c45','middle',timestamp '2021-07-23 17:24:03','the first mess')").show(false)
   ++
   ||
   ++
   ++
   
   
   scala> spark.sql("insert into hive_catalog.icebergdb.logs values('db6a2c2a-93a8-443c-9e41-36cb214648f4','sensor',timestamp '2021-07-23 17:24:03','the second mess')").show(false)
   ++                                                                              
   ||
   ++
   ++
   
   
   scala> spark.sql("insert into hive_catalog.icebergdb.logs values('b742ee76-65d3-4d74-ba74-2c25d3cd4795','low',timestamp '2021-07-25 12:43:20','the third mess')").show(false)
   ++
   ||
   ++
   ++
   
   
   scala> spark.sql("insert into hive_catalog.icebergdb.logs values('db6a2c2a-93a8-443c-9e41-36cb214648f4','high',timestamp '2021-07-23 00:23:56','the fourth mess')").show(false)
   ++
   ||
   ++
   ++
   
   scala> spark.sql("select * from hive_catalog.icebergdb.logs").show(false)
   +------------------------------------+------+-------------------+---------------+
   |uuid                                |level |ts                 |message        |
   +------------------------------------+------+-------------------+---------------+
   |b742ee76-65d3-4d74-ba74-2c25d3cd4795|low   |2021-07-25 12:43:20|the third mess |
   |db6a2c2a-93a8-443c-9e41-36cb214648f4|sensor|2021-07-23 17:24:03|the second mess|
   |db6a2c2a-93a8-443c-9e41-36cb214648f4|high  |2021-07-23 00:23:56|the fourth mess|
   |e41764f7-638b-4605-9642-f2b607d77c45|middle|2021-07-23 17:24:03|the first mess |
   +------------------------------------+------+-------------------+---------------+
   
   
   scala> spark.sql("select uuid,first(level),first(ts),first(message) from hive_catalog.icebergdb.logs where cast(ts as date)='2021-07-23' group by uuid").show(false)
   +------------------------------------+------------+-------------------+---------------+
   |uuid                                |first(level)|first(ts)          |first(message) |
   +------------------------------------+------------+-------------------+---------------+
   |db6a2c2a-93a8-443c-9e41-36cb214648f4|sensor      |2021-07-23 17:24:03|the second mess|
   |e41764f7-638b-4605-9642-f2b607d77c45|middle      |2021-07-23 17:24:03|the first mess |
   +------------------------------------+------------+-------------------+---------------+
   
   
   scala> spark.sql("set spark.sql.sources.partitionOverwriteMode").show(truncate=false)
   +----------------------------------------+-------+
   |key                                     |value  |
   +----------------------------------------+-------+
   |spark.sql.sources.partitionOverwriteMode|dynamic|
   +----------------------------------------+-------+
   
   scala> spark.sql("insert overwrite table hive_catalog.icebergdb.logs select uuid,first(level),first(ts),first(message) from hive_catalog.icebergdb.logs where cast(ts as date)='2021-07-23' group by uuid")
   res17: org.apache.spark.sql.DataFrame = []                                      
   
   scala> spark.sql("select * from hive_catalog.icebergdb.logs").show(false)
   +------------------------------------+------+-------------------+---------------+
   |uuid                                |level |ts                 |message        |
   +------------------------------------+------+-------------------+---------------+
   |b742ee76-65d3-4d74-ba74-2c25d3cd4795|low   |2021-07-25 12:43:20|the third mess |
   |db6a2c2a-93a8-443c-9e41-36cb214648f4|high  |2021-07-23 00:23:56|the fourth mess|
   |db6a2c2a-93a8-443c-9e41-36cb214648f4|sensor|2021-07-23 17:24:03|the second mess|
   |e41764f7-638b-4605-9642-f2b607d77c45|middle|2021-07-23 17:24:03|the first mess |
   +------------------------------------+------+-------------------+---------------+
   
   It seems that the table rows are unchanged; does the insert overwrite clause have no effect?
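   
   A likely explanation (based on how Iceberg documents dynamic overwrite, not verified against this session): in dynamic mode, INSERT OVERWRITE only replaces the partitions that appear in the query output. The table is partitioned by (level, hours(ts)), so the two deduplicated rows land back in exactly the partitions they already occupied, and the (high, 2021-07-23 00h) partition is not in the output, so it is left untouched; the visible rows therefore do not change. One way to confirm the overwrite actually committed is Iceberg's snapshots metadata table, assuming it is reachable under the same table name:
   
   // the last commit should show up with operation = 'overwrite'
   spark.sql("SELECT committed_at, operation, summary FROM hive_catalog.icebergdb.logs.snapshots ORDER BY committed_at").show(false)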
   



[GitHub] [iceberg] szehon-ho edited a comment on issue #2871: using insert overwrite clause in spark throw an exception

Posted by GitBox <gi...@apache.org>.
szehon-ho edited a comment on issue #2871:
URL: https://github.com/apache/iceberg/issues/2871#issuecomment-887312703


   I see, I think it's a Spark compatibility issue.  first() is a 'nullable' function (it may return null values), but the table column is marked non-nullable.  You can try to edit the table schema to have that column nullable.
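   
   A sketch of that schema change, assuming Iceberg's Spark 3 DDL support for dropping the NOT NULL constraint and using the column names from the table above:
   
   spark.sql("ALTER TABLE hive_catalog.icebergdb.logs ALTER COLUMN level DROP NOT NULL")
   spark.sql("ALTER TABLE hive_catalog.icebergdb.logs ALTER COLUMN ts DROP NOT NULL")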
   
   



[GitHub] [iceberg] szehon-ho commented on issue #2871: using insert overwrite clause in spark throw an exception

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on issue #2871:
URL: https://github.com/apache/iceberg/issues/2871#issuecomment-887312703


   I see, I think it's a Spark compatibility issue.  first() is a 'nullable' function (it may return null values), but the table column is marked non-nullable. 
   
   



[GitHub] [iceberg] zhangpanbigData commented on issue #2871: using insert overwrite clause in spark throw an exception

Posted by GitBox <gi...@apache.org>.
zhangpanbigData commented on issue #2871:
URL: https://github.com/apache/iceberg/issues/2871#issuecomment-887133825


   > From first glance of the error message, first() here returns null?
   
   No, the first() function works fine; below is the result:
   scala> spark.sql("select uuid,first(level),first(ts),first(message) from hive_catalog.icebergdb.logs where cast(ts as date)='2021-07-23' group by uuid").show(false)
   +------------------------------------+------------+-------------------+--------------+
   |uuid                                |first(level)|first(ts)          |first(message)|
   +------------------------------------+------------+-------------------+--------------+
   |db6a2c2a-93a8-443c-9e41-36cb214648f4|senior      |2021-07-23 17:24:03|the third row |
   |e41764f7-638b-4605-9642-f2b607d77c45|middle      |2021-07-23 17:24:03|the first row |
   +------------------------------------+------------+-------------------+--------------+
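   
   The values returned are indeed not null; what the write check rejects is that the first() expressions are declared nullable in the plan's schema, regardless of the actual data. A sketch of how to see that declared nullability (the trailing comment reflects what I would expect, not output from this session):
   
   import org.apache.spark.sql.functions.first
   spark.sql("select * from hive_catalog.icebergdb.logs")
     .groupBy("uuid")
     .agg(first("level").as("level"), first("ts").as("ts"), first("message").as("message"))
     .printSchema()
   // expected: level, ts, and message are all reported as nullable = true,
   // which is what the table's non-null columns reject on write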

