Posted to dev@iceberg.apache.org by GitBox <gi...@apache.org> on 2018/12/05 18:26:35 UTC

[GitHub] rdblue commented on a change in pull request #6: Support customizing the location where data is written in Spark

URL: https://github.com/apache/incubator-iceberg/pull/6#discussion_r239182889
 
 

 ##########
 File path: spark/src/main/java/com/netflix/iceberg/spark/source/IcebergSource.java
 ##########
 @@ -89,7 +92,11 @@ public DataSourceReader createReader(DataSourceOptions options) {
           .toUpperCase(Locale.ENGLISH));
     }
 
-    return Optional.of(new Writer(table, lazyConf(), format));
+    String dataLocation = options.get(TableProperties.WRITE_NEW_DATA_LOCATION)
+        .orElse(table.properties().getOrDefault(
+            TableProperties.WRITE_NEW_DATA_LOCATION,
+            new Path(new Path(table.location()), "data").toString()));
+    return Optional.of(new Writer(table, lazyConf(), format, dataLocation));
 
 Review comment:
   What I find strange is passing the write location into the writer when we're already passing the table into the writer. Why isn't that logic handled entirely in the writer? The normal case is for the write location to come from table config. I'm not even sure that we should allow overriding the write location in Spark's write properties. What is the use case there?
   
   In general, I like your reasoning that not passing options as a map makes testing clearer, but doing it here just shifts the concern to a different test. The test case is that setting "write.folder-storage.path" in Spark options changes the location of output files. A test that passes in the location can validate that the location is respected, but what we actually want to test is that the write location defaults to a path under the table's location, or is set by the table property, or (maybe) is set by Spark options.
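
   To make the three cases concrete, here is a rough sketch of the resolution order a test would need to cover. It is an illustration only: `DataLocationResolver` and `resolve` are hypothetical names, plain maps stand in for the table properties and Spark options, and string concatenation stands in for the Hadoop `Path` handling in the diff. Letting Spark options take precedence over the table property mirrors the diff above and is not a settled decision.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of Iceberg: sketches the precedence under discussion.
final class DataLocationResolver {
  // Property key quoted in the review comment.
  static final String WRITE_NEW_DATA_LOCATION = "write.folder-storage.path";

  private DataLocationResolver() {
  }

  // Resolution order (an assumption that mirrors the diff):
  //   1. Spark write option, 2. table property, 3. "<table location>/data".
  static String resolve(String tableLocation,
                        Map<String, String> tableProperties,
                        Map<String, String> writeOptions) {
    String defaultLocation = stripTrailingSlash(tableLocation) + "/data";
    return Optional.ofNullable(writeOptions.get(WRITE_NEW_DATA_LOCATION))
        .orElseGet(() -> tableProperties.getOrDefault(WRITE_NEW_DATA_LOCATION, defaultLocation));
  }

  // Avoids a double slash when the table location already ends with "/".
  private static String stripTrailingSlash(String location) {
    return location.endsWith("/") ? location.substring(0, location.length() - 1) : location;
  }

  public static void main(String[] args) {
    Map<String, String> empty = new HashMap<>();

    Map<String, String> tableProps = new HashMap<>();
    tableProps.put(WRITE_NEW_DATA_LOCATION, "s3://bucket/custom-from-table");

    Map<String, String> sparkOptions = new HashMap<>();
    sparkOptions.put(WRITE_NEW_DATA_LOCATION, "s3://bucket/custom-from-spark");

    // Case 1: nothing set, so the default under the table location applies.
    System.out.println(resolve("s3://bucket/tbl", empty, empty));
    // Case 2: only the table property is set.
    System.out.println(resolve("s3://bucket/tbl", tableProps, empty));
    // Case 3: the Spark option is set and overrides the table property.
    System.out.println(resolve("s3://bucket/tbl", tableProps, sparkOptions));
  }
}
```

   If this logic lived in the writer (or wherever the table is already available), a single test could cover all three cases instead of only checking that a passed-in location is respected.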

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services