Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/07/29 07:43:28 UTC

[GitHub] [iceberg] rdblue commented on a change in pull request #1261: Spark: [DOC] guide about structured streaming sink for Iceberg

rdblue commented on a change in pull request #1261:
URL: https://github.com/apache/iceberg/pull/1261#discussion_r461904586



##########
File path: site/docs/spark.md
##########
@@ -520,6 +520,28 @@ data.writeTo("prod.db.table")
     .createOrReplace()
 ```
 
+### Writing from streaming query (Structured Streaming)
+
+To write values from streaming query to Iceberg table, use `writeStream`:
+
+```scala
+data.writeStream
+    .format("iceberg")
+    .outputMode("append")
+    .option("path", pathToTable)
+    .option("checkpointLocation", checkpointPath)
+    .start()

Review comment:
       This looks specific to Spark 2.4. Should we have a Spark 3.0 example and a separate Spark 2.4 example, like the other sections?
   
   An alternative is to create a new page for Spark Streaming and add the docs there. Then we could have a table like the one at the top of the Spark page that explains what is supported in different versions.

##########
File path: site/docs/spark.md
##########
@@ -520,6 +520,28 @@ data.writeTo("prod.db.table")
     .createOrReplace()
 ```
 
+### Writing from streaming query (Structured Streaming)
+
+To write values from streaming query to Iceberg table, use `writeStream`:
+
+```scala
+data.writeStream
+    .format("iceberg")
+    .outputMode("append")
+    .option("path", pathToTable)
+    .option("checkpointLocation", checkpointPath)
+    .start()
+```
+
+`append` and `complete` modes are supported. The table should be created in prior to start the streaming query.
+ 
+!!! Note
+    To avoid metadata growing too huge, there're several guides you may want to follow: 

Review comment:
       I think this is worth a section, not just a note.
   
   > Streaming queries can create new table versions quickly, which creates lots of table metadata to track those versions. Maintaining metadata by tuning the rate of commits, expiring old snapshots, and automatically cleaning up metadata files is highly recommended.
   
   Then you could give an overview of those options and links to further docs, like the table property docs for delete-after-commit.
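
As a rough illustration of what such an overview might cover, the sketch below collects the two Iceberg table properties that control metadata file cleanup and computes a cutoff timestamp for snapshot expiration. The object name, the retention count of 100, and the helper function are assumptions made for this example; none of them come from the pull request.

```scala
import java.util.concurrent.TimeUnit

// Sketch only: the object name, the retention numbers, and the helper are
// illustrative assumptions, not values taken from the pull request.
object StreamingMetadataMaintenance {

  // Iceberg table properties that remove old metadata files after each commit.
  val cleanupProperties: Map[String, String] = Map(
    "write.metadata.delete-after-commit.enabled" -> "true",
    "write.metadata.previous-versions-max" -> "100" // assumed retention count
  )

  // Cutoff (epoch milliseconds) before which snapshots would be expired.
  def expirationCutoffMillis(nowMillis: Long, retainDays: Int): Long =
    nowMillis - TimeUnit.DAYS.toMillis(retainDays.toLong)
}
```

Given an `org.apache.iceberg.Table` handle (called `table` here as an assumption), the properties could be applied with `table.updateProperties().set(key, value).commit()`, and old snapshots expired with `table.expireSnapshots().expireOlderThan(cutoff).commit()`. The rate of commits can be tuned on the streaming side by adding a trigger such as `Trigger.ProcessingTime(1, TimeUnit.MINUTES)` to the `writeStream` call.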



