Posted to dev@spark.apache.org by Neil Ramaswamy <ne...@ramaswamy.org> on 2024/03/25 17:03:36 UTC

Improved Structured Streaming Documentation Proof-of-Concept

Hi all,

I recently started an effort to improve the Structured Streaming
documentation. I thought that the current documentation, while very
comprehensive, could be improved in terms of organization, clarity, and
presence of examples.

You can view the repo here
<https://github.com/neilramaswamy/structured-streaming>, and you can see a
preview of the site here <https://structured-streaming.vercel.app/>. It's
almost at full parity with the programming guide, and it also has
additional content, like a guide on unit testing and an in-depth
explanation of watermarks. I think it's at a point where we can bring this
to completion if it's something that the community wants.

I'd love to hear feedback from everyone: is this something that we would
want to move forward with? As it borrows certain parts from the programming
guide, it has an Apache License, so I'd be more than happy if it is adopted
by an official Spark repo.

Best,
Neil

Re: Improved Structured Streaming Documentation Proof-of-Concept

Posted by Neil Ramaswamy <ne...@databricks.com.INVALID>.
I'm glad you think it's generally a good idea!

I will mention, though, that with these better docs I've almost finished,
I'm hoping that Structured Streaming no longer stays a specialist topic
that requires "trench warfare." With good pedagogy, I think that it's very
approachable. The Knowledge Sharing Hub could be useful for end-to-end,
real-world use cases, but I think that operator semantics, stream
configurations, etc. have a better home in the official documentation.

Thanks for your engagement, Mich. Looking forward to hearing others'
opinions.

Neil

On Mon, Mar 25, 2024 at 2:50 PM Mich Talebzadeh <mi...@gmail.com>
wrote:

> Hi,
>
> Your intended work on improving the Structured Streaming documentation is
> great! Clear and well-organized instructions are important for everyone
> using Spark, beginners and experts alike.
> Having said that, Spark Structured Streaming, much like other specialist
> topics within Spark (say, k8s) or otherwise, cannot be mastered by
> documentation alone. These topics require a considerable amount of practice
> and trench warfare, so to speak, to master them. Suffice it to say that I
> agree with the proposal to add examples. However, it is an area that many
> try to master but fail (judging by typical issues brought up in the user
> group and elsewhere). Perhaps a section such as the proposed "Knowledge
> Sharing Hub" may become more relevant. Moreover, the examples have to
> reflect real-life scenarios; otherwise they will be of limited use.
>
> HTH

Re: Improved Structured Streaming Documentation Proof-of-Concept

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi,

Your intended work on improving the Structured Streaming documentation is
great! Clear and well-organized instructions are important for everyone
using Spark, beginners and experts alike.
Having said that, Spark Structured Streaming, much like other specialist
topics within Spark (say, k8s) or otherwise, cannot be mastered by
documentation alone. These topics require a considerable amount of practice
and trench warfare, so to speak, to master them. Suffice it to say that I
agree with the proposal to add examples. However, it is an area that many
try to master but fail (judging by typical issues brought up in the user
group and elsewhere). Perhaps a section such as the proposed "Knowledge
Sharing Hub" may become more relevant. Moreover, the examples have to
reflect real-life scenarios; otherwise they will be of limited use.

HTH

Mich Talebzadeh,
Technologist | Data | Generative AI | Financial Fraud
London
United Kingdom

   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

 https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, "one test result is worth one-thousand
expert opinions" (Wernher von Braun
<https://en.wikipedia.org/wiki/Wernher_von_Braun>).

