You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/07/02 22:07:11 UTC

[GitHub] [iceberg] kbendick commented on a change in pull request #2779: Spark : Add duplicate file check in add_files

kbendick commented on a change in pull request #2779:
URL: https://github.com/apache/iceberg/pull/2779#discussion_r663265183



##########
File path: spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java
##########
@@ -355,9 +366,11 @@ public boolean isDefinedAt(Expression attr) {
    * @param targetTable an Iceberg table where to import the data
    * @param stagingDir a staging directory to store temporary manifest files
    * @param partitionFilter only import partitions whose values match those in the map, can be partially defined
+   * @param checkDuplicateFiles if true, throw exception if import results in a duplicate data file
    */
   public static void importSparkTable(SparkSession spark, TableIdentifier sourceTableIdent, Table targetTable,
-                                      String stagingDir, Map<String, String> partitionFilter) {
+                                      String stagingDir, Map<String, String> partitionFilter,
+                                      boolean checkDuplicateFiles) {

Review comment:
       Instead of changing all of the places where this code is called, would it make more sense to add an additional constructor that calls into this one and nets `checkDuplicateFiles` to a default value so as to not have to change the other code?
   
   Would a default value here be reasonable or no?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org