Posted to user@spark.apache.org by Koert Kuipers <ko...@tresata.com> on 2018/10/27 19:28:29 UTC

structured streaming bookkeeping formats

I was reading this blog post from last year about the Structured Streaming run-once trigger:
https://databricks.com/blog/2017/05/22/running-streaming-jobs-day-10x-cost-savings.html

It's a nice idea to replace a batch job with structured streaming, because it
does the bookkeeping (what's new, failure recovery, etc.) for you.
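
For context, the kind of job I have in mind is roughly the sketch below. It
is minimal and the paths, schema, and app name are all made up:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.sql.types.{LongType, StringType, StructType}

object RunOnceJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("run-once-example").getOrCreate()

    // file sources need an explicit schema; this one is hypothetical
    val schema = new StructType()
      .add("id", LongType)
      .add("payload", StringType)

    // spark tracks which files under /data/incoming were already
    // processed via the checkpoint, not via my own bookkeeping
    val input = spark.readStream
      .schema(schema)
      .json("/data/incoming")

    // Trigger.Once() processes all available data and then stops,
    // so this behaves like a batch job
    val query = input.writeStream
      .trigger(Trigger.Once())
      .format("parquet")
      .option("checkpointLocation", "/data/checkpoints/myjob")
      .option("path", "/data/output")
      .start()

    query.awaitTermination()
  }
}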

But that's also the part that scares me a bit: when it's all done for me and
it breaks anyway, I'm not sure I know how to recover, and I'm unsure how to
upgrade.
So... are the formats that Spark Structured Streaming uses for this
bookkeeping easily readable (say, JSON) and stable across versions? Does the
checkpoint consist of files I can go look at, understand, and edit/manipulate
myself if needed? Are there any references for the format used?
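
For example, if worst comes to worst, I'd hope to be able to dump and inspect
the offset log by hand, something like the sketch below. The path is made up,
and the format assumption (a version marker line followed by JSON lines) is
just my reading of what HDFSMetadataLog writes, so please correct me if that
is wrong:

import scala.io.Source

object PeekAtCheckpoint {
  def main(args: Array[String]): Unit = {
    // hypothetical path: offsets/<batchId> inside the checkpoint dir
    val offsetFile = "/data/checkpoints/myjob/offsets/42"

    // my assumption: the first line is a version marker (e.g. "v1")
    // and the remaining lines are JSON (batch metadata, then one
    // line of offsets per source)
    val source = Source.fromFile(offsetFile)
    try source.getLines().foreach(println)
    finally source.close()
  }
}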

Thank you!

Best,
Koert