You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@samza.apache.org by Xinyu Liu <xi...@gmail.com> on 2016/07/19 01:05:14 UTC

Review Request 50174: SAMZA-977: User doc for samza multithreading

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/
-----------------------------------------------------------

Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).


Repository: samza


Description
-------

Update samza web docs with new multithreading api, core and configs.


Diffs
-----

  docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
  docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
  docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 

Diff: https://reviews.apache.org/r/50174/diff/


Testing
-------

Test the web pages locally.


Thanks,

Xinyu Liu


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Chris Pettitt <cp...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review143972
-----------------------------------------------------------


Ship it!




Looks like this covers my comments. However, the documentation reads like it is OK to do synchronous IO by just throwing more threads at the problem. This is not a good idea. The ability to do synchronous IO is a transition step towards async, not an end state.

- Chris Pettitt


On July 27, 2016, 11:05 p.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated July 27, 2016, 11:05 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Navina Ramesh <nr...@linkedin.com>.

> On Aug. 30, 2016, 1 a.m., Xinyu Liu wrote:
> > docs/learn/documentation/versioned/jobs/configuration-table.html, line 368
> > <https://reviews.apache.org/r/50174/diff/2/?file=1455861#file1455861line368>
> >
> >     oh, actually the job.container.single.thread.mode  means using the old runloop which supports single thread execution only. It is default as false so we will use AsyncRunLoop. I plan to get rid of RunLoop in 11.1 once AsyncRunLoop is fully stablized.

If you are planning to remove support for some of the existing features or config (like runloop and single.thread.mode), it is common practice to mark it as deprecated and remove it in the next major release. It will be better to state this fact in the documentation now so that users know that this config is going away.


- Navina


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review147252
-----------------------------------------------------------


On Sept. 7, 2016, 5:16 p.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated Sept. 7, 2016, 5:16 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Xinyu Liu <xi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review147252
-----------------------------------------------------------




docs/learn/documentation/versioned/jobs/configuration-table.html (line 368)
<https://reviews.apache.org/r/50174/#comment214368>

    oh, actually the job.container.single.thread.mode  means using the old runloop which supports single thread execution only. It is default as false so we will use AsyncRunLoop. I plan to get rid of RunLoop in 11.1 once AsyncRunLoop is fully stablized.


- Xinyu Liu


On Aug. 30, 2016, 12:49 a.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated Aug. 30, 2016, 12:49 a.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Xinyu Liu <xi...@gmail.com>.

> On Sept. 7, 2016, 6:01 p.m., Yi Pan (Data Infrastructure) wrote:
> > docs/learn/documentation/versioned/api/overview.md, line 49
> > <https://reviews.apache.org/r/50174/diff/4/?file=1493027#file1493027line49>
> >
> >     This is a bit confusing. If I read these two comment lines as pseudo code, I would think that the same thread is going to call asynchronous calls and fire callback. Is it the case? I thought that the callback is triggered in a callback thread, not in the main thread invoking the asynchronous calls?

I agree. I modified the sentence to // fire callback upon completion, e.g. invoking callback from asynchronous call completion thread. Does it looks clear to you?


> On Sept. 7, 2016, 6:01 p.m., Yi Pan (Data Infrastructure) wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 28
> > <https://reviews.apache.org/r/50174/diff/4/?file=1493028#file1493028line28>
> >
> >     This section is a bit confusing to me. The description above in "Without..." sounds like that we are implementing read-after-write consistency on global variables. Here we are saying that global variables have to implement the sync method by the user.

Good point. I read it again and it's pretty confusing. I think the latter part is for states within a task, like member variables. I updated the doc based on that.


> On Sept. 7, 2016, 6:01 p.m., Yi Pan (Data Infrastructure) wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 36
> > <https://reviews.apache.org/r/50174/diff/4/?file=1493028#file1493028line36>
> >
> >     So, this describes the AsyncRunLoop logic. Don't we still have the synchronous RunLoop in SamzaContainer?

Since in default the RunLoop is not used anymore, so I remove its description. Since we are planning to deprecate the runLoop soon, I think we only need to describe the new run loop, right?


> On Sept. 7, 2016, 6:01 p.m., Yi Pan (Data Infrastructure) wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 50
> > <https://reviews.apache.org/r/50174/diff/4/?file=1493028#file1493028line50>
> >
> >     from *a* different user thread.

Good catch! Thanks!


- Xinyu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review148041
-----------------------------------------------------------


On Sept. 13, 2016, 9 p.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated Sept. 13, 2016, 9 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by "Yi Pan (Data Infrastructure)" <yi...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review148041
-----------------------------------------------------------



Thanks for the detailed doc. I mainly have question regarding to the event-loop.md. We can talk offline if you don't agree/understand my comments.


docs/learn/documentation/versioned/api/overview.md (line 37)
<https://reviews.apache.org/r/50174/#comment215413>

    java NIO ==> Java NIO



docs/learn/documentation/versioned/api/overview.md (line 49)
<https://reviews.apache.org/r/50174/#comment215414>

    This is a bit confusing. If I read these two comment lines as pseudo code, I would think that the same thread is going to call asynchronous calls and fire callback. Is it the case? I thought that the callback is triggered in a callback thread, not in the main thread invoking the asynchronous calls?



docs/learn/documentation/versioned/container/event-loop.md (line 26)
<https://reviews.apache.org/r/50174/#comment215418>

    What do you try to say in the "Without explicit user synchronization..."? Is it by design, we are trying to make sure the global variables implements a read-after-write consistency among tasks, w/ the atomic step being the task operation? P.S. what's exactly a task operation? It is better to define what this is.



docs/learn/documentation/versioned/container/event-loop.md (line 28)
<https://reviews.apache.org/r/50174/#comment215420>

    This section is a bit confusing to me. The description above in "Without..." sounds like that we are implementing read-after-write consistency on global variables. Here we are saying that global variables have to implement the sync method by the user.



docs/learn/documentation/versioned/container/event-loop.md (line 36)
<https://reviews.apache.org/r/50174/#comment215423>

    So, this describes the AsyncRunLoop logic. Don't we still have the synchronous RunLoop in SamzaContainer?



docs/learn/documentation/versioned/container/event-loop.md (line 50)
<https://reviews.apache.org/r/50174/#comment215426>

    from *a* different user thread.


- Yi Pan (Data Infrastructure)


On Sept. 7, 2016, 5:16 p.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated Sept. 7, 2016, 5:16 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Xinyu Liu <xi...@gmail.com>.

> On Sept. 14, 2016, 12:37 a.m., Yi Pan (Data Infrastructure) wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 28
> > <https://reviews.apache.org/r/50174/diff/4-5/?file=1493028#file1493028line28>
> >
> >     What about thread-safety among multiple process() operations? I thought that multiple process() can be in multiple threads as well?

Added the tutorial which answers this question there.


- Xinyu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review148824
-----------------------------------------------------------


On Sept. 16, 2016, 5:33 p.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated Sept. 16, 2016, 5:33 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/Gemfile.lock 8a236e6835cd82cfdfe5833c5b83f1c5e63ef814 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
>   docs/learn/tutorials/versioned/index.md b4d687a63638aca4f876af88556de9973acfd718 
>   docs/learn/tutorials/versioned/samza-async-user-guide.md PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by "Yi Pan (Data Infrastructure)" <yi...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review148824
-----------------------------------------------------------


Fix it, then Ship it!




lgtm overall. Thanks!


docs/learn/documentation/versioned/container/event-loop.md (line 28)
<https://reviews.apache.org/r/50174/#comment216325>

    What about thread-safety among multiple process() operations? I thought that multiple process() can be in multiple threads as well?


- Yi Pan (Data Infrastructure)


On Sept. 13, 2016, 9 p.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated Sept. 13, 2016, 9 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Xinyu Liu <xi...@gmail.com>.

> On Sept. 16, 2016, 8:58 p.m., Yi Pan (Data Infrastructure) wrote:
> > docs/learn/tutorials/versioned/samza-async-user-guide.md, line 71
> > <https://reviews.apache.org/r/50174/diff/7/?file=1500250#file1500250line71>
> >
> >     The explanation of "processAsync() will always be invoked in a single thread" is off-the-target. I think that you might simply want to focus on "by default, AsyncStreamTask in Samza guarantees in-order process of messages in a task, by making sure the processAsync() can only be called after the callback of previous processAsync() is triggered". Whether processAsync() is invoked in a single thread or not does not directly explain the default in-order guarantee.
> >     
> >     P.S. with this in-order guarantee, doesn't it defeat the whole purpose of async processing? Naturally, there is a question on what performance benefit we gain w/ the default processAsync()?

the perf benefit is that we have paralism among tasks. The processAsync() can be invoked independently among tasks.


> On Sept. 16, 2016, 8:58 p.m., Yi Pan (Data Infrastructure) wrote:
> > docs/learn/tutorials/versioned/samza-async-user-guide.md, line 73
> > <https://reviews.apache.org/r/50174/diff/7/?file=1500250#file1500250line73>
> >
> >     I felt that the flow works better for me if we organize the explanation in the following order:
> >     - Sync process w/ multi-thread: more parallelism for remote I/O among tasks, in-order execution within a task
> >     - AsyncStreamTask w/ in-order execution in a task (alternative to sync task): default AsyncStreamTask, more parallelism for remote I/O among tasks, in-order execution within a task
> >     - AsyncStreamTask: more parallelism among and within the tasks, out-of-order execution

Thanks for the suggestion! I like this better too and I reordered the doc. The third bullet (paralism within a task) can be applied to both Sync and Async cases. Now it becomes:

- Sync with multi-threading, in-order
- Async, in-order
- Out-of-order for both.


> On Sept. 16, 2016, 8:58 p.m., Yi Pan (Data Infrastructure) wrote:
> > docs/learn/tutorials/versioned/samza-async-user-guide.md, line 119
> > <https://reviews.apache.org/r/50174/diff/7/?file=1500250#file1500250line119>
> >
> >     "in the order of the message arrivals".
> >     
> >     Also, what do you refer to as "strict ordering of the output" here? In StreamTask w/ multi-threading, don't we guarantee the in-order processing within the task? We can't guarantee the ordering between the tasks anyways.

task.max.concurrency applies to both async and sync case, so we can have multiple process() run in parrallel for a StreamTask.


> On Sept. 16, 2016, 8:58 p.m., Yi Pan (Data Infrastructure) wrote:
> > docs/learn/tutorials/versioned/samza-async-user-guide.md, line 125
> > <https://reviews.apache.org/r/50174/diff/7/?file=1500250#file1500250line125>
> >
> >     My original question on the doc w/ process() call is: I think that we guarantee that process()/processAsync() and window(), commit() within a single task instance are mutally exclusive to each other. window() and commit() are also exclusive to themselves. But multiple processAsync() calls can be invoked in different threads simultaneously, right? Hence, in a single task instance, the variables used in processAsync() calls need to be thread-safe. Am I interpretting the current implementation right?

Actually it's different: processAsync() will always be invoked from a single thread (for all tasks). The reason being asynchronous calls will not require threads blocking on the calls. The callback can be invoked from multiple threads. Samza controls when to invoke processAsync()/window()/commit() for each task.


- Xinyu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review149262
-----------------------------------------------------------


On Sept. 19, 2016, 5:23 p.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated Sept. 19, 2016, 5:23 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/Gemfile.lock 8a236e6835cd82cfdfe5833c5b83f1c5e63ef814 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
>   docs/learn/tutorials/versioned/index.md b4d687a63638aca4f876af88556de9973acfd718 
>   docs/learn/tutorials/versioned/samza-async-user-guide.md PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by "Yi Pan (Data Infrastructure)" <yi...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review149262
-----------------------------------------------------------



Thanks for the detailed write-up! Really helpful document. Please check the comments below and we can discuss in person if you have further questions.


docs/learn/tutorials/versioned/samza-async-user-guide.md (line 71)
<https://reviews.apache.org/r/50174/#comment216831>

    The explanation of "processAsync() will always be invoked in a single thread" is off-the-target. I think that you might simply want to focus on "by default, AsyncStreamTask in Samza guarantees in-order process of messages in a task, by making sure the processAsync() can only be called after the callback of previous processAsync() is triggered". Whether processAsync() is invoked in a single thread or not does not directly explain the default in-order guarantee.
    
    P.S. with this in-order guarantee, doesn't it defeat the whole purpose of async processing? Naturally, there is a question on what performance benefit we gain w/ the default processAsync()?



docs/learn/tutorials/versioned/samza-async-user-guide.md (line 73)
<https://reviews.apache.org/r/50174/#comment216834>

    I felt that the flow works better for me if we organize the explanation in the following order:
    - Sync process w/ multi-thread: more parallelism for remote I/O among tasks, in-order execution within a task
    - AsyncStreamTask w/ in-order execution in a task (alternative to sync task): default AsyncStreamTask, more parallelism for remote I/O among tasks, in-order execution within a task
    - AsyncStreamTask: more parallelism among and within the tasks, out-of-order execution



docs/learn/tutorials/versioned/samza-async-user-guide.md (line 119)
<https://reviews.apache.org/r/50174/#comment216835>

    "in the order of the message arrivals".
    
    Also, what do you refer to as "strict ordering of the output" here? In StreamTask w/ multi-threading, don't we guarantee the in-order processing within the task? We can't guarantee the ordering between the tasks anyways.



docs/learn/tutorials/versioned/samza-async-user-guide.md (line 125)
<https://reviews.apache.org/r/50174/#comment216839>

    My original question on the doc w/ process() call is: I think that we guarantee that process()/processAsync() and window(), commit() within a single task instance are mutally exclusive to each other. window() and commit() are also exclusive to themselves. But multiple processAsync() calls can be invoked in different threads simultaneously, right? Hence, in a single task instance, the variables used in processAsync() calls need to be thread-safe. Am I interpretting the current implementation right?



docs/learn/tutorials/versioned/samza-async-user-guide.md (line 127)
<https://reviews.apache.org/r/50174/#comment216843>

    Isn't this just talking about commit() is mutally exclusive to process/processAsync and window? We can simply state:
    
    - Checkpointing is guaranteed to only cover events that are fully processed. It is persisted in commit() method.


- Yi Pan (Data Infrastructure)


On Sept. 16, 2016, 5:37 p.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated Sept. 16, 2016, 5:37 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/Gemfile.lock 8a236e6835cd82cfdfe5833c5b83f1c5e63ef814 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
>   docs/learn/tutorials/versioned/index.md b4d687a63638aca4f876af88556de9973acfd718 
>   docs/learn/tutorials/versioned/samza-async-user-guide.md PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Xinyu Liu <xi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/
-----------------------------------------------------------

(Updated Sept. 22, 2016, 7:11 p.m.)


Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).


Changes
-------

Resolve conflicts


Repository: samza


Description
-------

Update samza web docs with new multithreading api, core and configs.


Diffs (updated)
-----

  docs/Gemfile.lock 8a236e6835cd82cfdfe5833c5b83f1c5e63ef814 
  docs/learn/documentation/versioned/api/overview.md 67e259c9e5bbe100a29d4c3f12d2db7398344f34 
  docs/learn/documentation/versioned/container/event-loop.md 9dcf92ca4f694714fb76157e7a1048bee348f289 
  docs/learn/documentation/versioned/jobs/configuration-table.html 14945e2d65e753c89007703a38af1b4692f8e923 
  docs/learn/tutorials/versioned/index.md ca2b08fb02a83f72f804c4059f258253c046a1b6 
  docs/learn/tutorials/versioned/samza-async-user-guide.md PRE-CREATION 

Diff: https://reviews.apache.org/r/50174/diff/


Testing
-------

Test the web pages locally.


Thanks,

Xinyu Liu


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Xinyu Liu <xi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/
-----------------------------------------------------------

(Updated Sept. 19, 2016, 5:23 p.m.)


Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).


Changes
-------

Updates on Yi's feedback.


Repository: samza


Description
-------

Update samza web docs with new multithreading api, core and configs.


Diffs (updated)
-----

  docs/Gemfile.lock 8a236e6835cd82cfdfe5833c5b83f1c5e63ef814 
  docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
  docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
  docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
  docs/learn/tutorials/versioned/index.md b4d687a63638aca4f876af88556de9973acfd718 
  docs/learn/tutorials/versioned/samza-async-user-guide.md PRE-CREATION 

Diff: https://reviews.apache.org/r/50174/diff/


Testing
-------

Test the web pages locally.


Thanks,

Xinyu Liu


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Xinyu Liu <xi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/
-----------------------------------------------------------

(Updated Sept. 16, 2016, 5:37 p.m.)


Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).


Repository: samza


Description
-------

Update samza web docs with new multithreading api, core and configs.


Diffs (updated)
-----

  docs/Gemfile.lock 8a236e6835cd82cfdfe5833c5b83f1c5e63ef814 
  docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
  docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
  docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
  docs/learn/tutorials/versioned/index.md b4d687a63638aca4f876af88556de9973acfd718 
  docs/learn/tutorials/versioned/samza-async-user-guide.md PRE-CREATION 

Diff: https://reviews.apache.org/r/50174/diff/


Testing
-------

Test the web pages locally.


Thanks,

Xinyu Liu


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Xinyu Liu <xi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/
-----------------------------------------------------------

(Updated Sept. 16, 2016, 5:33 p.m.)


Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).


Changes
-------

Added tutorial according to Navina's feedback.


Repository: samza


Description
-------

Update samza web docs with new multithreading api, core and configs.


Diffs (updated)
-----

  docs/Gemfile.lock 8a236e6835cd82cfdfe5833c5b83f1c5e63ef814 
  docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
  docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
  docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
  docs/learn/tutorials/versioned/index.md b4d687a63638aca4f876af88556de9973acfd718 
  docs/learn/tutorials/versioned/samza-async-user-guide.md PRE-CREATION 

Diff: https://reviews.apache.org/r/50174/diff/


Testing
-------

Test the web pages locally.


Thanks,

Xinyu Liu


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Xinyu Liu <xi...@gmail.com>.

> On Sept. 14, 2016, 12:43 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/api/overview.md, line 44
> > <https://reviews.apache.org/r/50174/diff/5/?file=1497948#file1497948line44>
> >
> >     We need a more concrete example of how to use callback functionality here. It is not very clear. Adding a tutorials or code-example to hello-samza will make it easier for you to complete this documentation :)

Thanks for the feedback. i added the tutorial :).


- Xinyu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review148826
-----------------------------------------------------------


On Sept. 16, 2016, 5:33 p.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated Sept. 16, 2016, 5:33 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/Gemfile.lock 8a236e6835cd82cfdfe5833c5b83f1c5e63ef814 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
>   docs/learn/tutorials/versioned/index.md b4d687a63638aca4f876af88556de9973acfd718 
>   docs/learn/tutorials/versioned/samza-async-user-guide.md PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Navina Ramesh <nr...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review148826
-----------------------------------------------------------


Fix it, then Ship it!




I think I have all my comments called out. So, +1 !


docs/learn/documentation/versioned/api/overview.md (line 44)
<https://reviews.apache.org/r/50174/#comment216327>

    We need a more concrete example of how to use callback functionality here. It is not very clear. Adding a tutorials or code-example to hello-samza will make it easier for you to complete this documentation :)


- Navina Ramesh


On Sept. 13, 2016, 9 p.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated Sept. 13, 2016, 9 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Xinyu Liu <xi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/
-----------------------------------------------------------

(Updated Sept. 13, 2016, 9 p.m.)


Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).


Changes
-------

Update based on Yi's feedback.


Repository: samza


Description
-------

Update samza web docs with new multithreading api, core and configs.


Diffs (updated)
-----

  docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
  docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
  docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 

Diff: https://reviews.apache.org/r/50174/diff/


Testing
-------

Test the web pages locally.


Thanks,

Xinyu Liu


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Xinyu Liu <xi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/
-----------------------------------------------------------

(Updated Sept. 7, 2016, 5:16 p.m.)


Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).


Changes
-------

Update based on Navina's feedback.


Repository: samza


Description
-------

Update samza web docs with new multithreading api, core and configs.


Diffs (updated)
-----

  docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
  docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
  docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 

Diff: https://reviews.apache.org/r/50174/diff/


Testing
-------

Test the web pages locally.


Thanks,

Xinyu Liu


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Xinyu Liu <xi...@gmail.com>.

> On Aug. 31, 2016, 12:54 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/api/overview.md, line 49
> > <https://reviews.apache.org/r/50174/diff/3/?file=1488350#file1488350line49>
> >
> >     // fire callback after process complete 
> >     >> Is the user expected to invoke the callback here or is it samza that invokes the callback ? Not clear from the example. If the user has to invoke the callback, you can write the code snippet. 
> >     
> >     In your previous diff, you had this line - "To complete the process, you need to call callback.complete() or callback.failure(). Samza will continue to process next message or shut down the job based on the callback status." . Is that still valid?

oh, it's still there. I moved it to the sentence before the async task example :).


> On Aug. 31, 2016, 12:54 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 28
> > <https://reviews.apache.org/r/50174/diff/3/?file=1488351#file1488351line28>
> >
> >     "any state change from one operation will be fully visible to others"?  -> does this mean tasks share state (which in my opinion can be anything from shared variables to KV stores). What state change is made fully visible here? 
> >     I think it is sufficient to say that these operatios within a task partition are mutually exclusive. You can probably remove the state change part.

Chris recommended this: it's actually pretty important for the user to know the visibility rules when using AsyncStreamTask.


> On Aug. 31, 2016, 12:54 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 56
> > <https://reviews.apache.org/r/50174/diff/3/?file=1488351#file1488351line56>
> >
> >     It feels like we are introducing new terms here "serialized mode". I think it is sufficient to simply state, if task.max.concurrency = 1, then each message process completion ...
> >     
> >     Also "it is required to coordinate any shared state" can be re-phrased to "user should synchronize access to any shared/global variables in the Task."

Good suggestions. I remove the serialized mode notion.


> On Aug. 31, 2016, 12:54 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/jobs/configuration-table.html, line 357
> > <https://reviews.apache.org/r/50174/diff/3/?file=1488352#file1488352line357>
> >
> >     Is it too late to comment on the config key pattern? Traditionally, we don't have any strict conventions for config names listed anywhere. Since we use period as a delimiter to distinguish configs belonging to a namespace, I think we should try to avoid using the same in the actual config part. 
> >     For example, job.container.single.thread.mode can be job.container.single-thread-mode.
> >     Same for job.container.thread.pool.size, task.callback.timeout.ms and task.max.concurrency

This config is pretty awkward (so is the name). It's just for fallback to old runloop if there is any issue with the new AsyncRunLoop. So I don't expect users to set it normally. Once the AsyncRunLoop stablized, I will remove this config as well as the old runLoop.


- Xinyu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review147388
-----------------------------------------------------------


On Aug. 30, 2016, 12:49 a.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated Aug. 30, 2016, 12:49 a.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Navina Ramesh <nr...@linkedin.com>.

> On Aug. 31, 2016, 12:54 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/jobs/configuration-table.html, line 357
> > <https://reviews.apache.org/r/50174/diff/3/?file=1488352#file1488352line357>
> >
> >     Is it too late to comment on the config key pattern? Traditionally, we don't have any strict conventions for config names listed anywhere. Since we use period as a delimiter to distinguish configs belonging to a namespace, I think we should try to avoid using the same in the actual config part. 
> >     For example, job.container.single.thread.mode can be job.container.single-thread-mode.
> >     Same for job.container.thread.pool.size, task.callback.timeout.ms and task.max.concurrency
> 
> Xinyu Liu wrote:
>     This config is pretty awkward (so is the name). It's just for fallback to old runloop if there is any issue with the new AsyncRunLoop. So I don't expect users to set it normally. Once the AsyncRunLoop stablized, I will remove this config as well as the old runLoop.

As I mentioned above, I think it is important to clarify in the documentation that this is a fallback option to provide compatibility with versions < 0.11.0. Is it too late to change the config name ? :)


- Navina


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review147388
-----------------------------------------------------------


On Sept. 7, 2016, 5:16 p.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated Sept. 7, 2016, 5:16 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Navina Ramesh <nr...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review147388
-----------------------------------------------------------




docs/learn/documentation/versioned/api/overview.md (line 49)
<https://reviews.apache.org/r/50174/#comment214522>

    // fire callback after process complete 
    >> Is the user expected to invoke the callback here or is it samza that invokes the callback ? Not clear from the example. If the user has to invoke the callback, you can write the code snippet. 
    
    In your previous diff, you had this line - "To complete the process, you need to call callback.complete() or callback.failure(). Samza will continue to process next message or shut down the job based on the callback status." . Is that still valid?



docs/learn/documentation/versioned/container/event-loop.md (line 28)
<https://reviews.apache.org/r/50174/#comment214525>

    "any state change from one operation will be fully visible to others"?  -> does this mean tasks share state (which in my opinion can be anything from shared variables to KV stores). What state change is made fully visible here? 
    I think it is sufficient to say that these operatios within a task partition are mutually exclusive. You can probably remove the state change part.



docs/learn/documentation/versioned/container/event-loop.md (line 45)
<https://reviews.apache.org/r/50174/#comment214528>

    Thanks for adding the semantics section. Really clarifies things!



docs/learn/documentation/versioned/container/event-loop.md (line 52)
<https://reviews.apache.org/r/50174/#comment214526>

    "This allows multiple outstanding messages to be processed per task at a time" -> I think it should be "This allows multiple outstanding messages to be processed parallely by the task".



docs/learn/documentation/versioned/container/event-loop.md (line 56)
<https://reviews.apache.org/r/50174/#comment214527>

    It feels like we are introducing new terms here "serialized mode". I think it is sufficient to simply state, if task.max.concurrency = 1, then each message process completion ...
    
    Also "it is required to coordinate any shared state" can be re-phrased to "user should synchronize access to any shared/global variables in the Task."



docs/learn/documentation/versioned/jobs/configuration-table.html (line 357)
<https://reviews.apache.org/r/50174/#comment214529>

    Is it too late to comment on the config key pattern? Traditionally, we don't have any strict conventions for config names listed anywhere. Since we use period as a delimiter to distinguish configs belonging to a namespace, I think we should try to avoid using the same in the actual config part. 
    For example, job.container.single.thread.mode can be job.container.single-thread-mode.
    Same for job.container.thread.pool.size, task.callback.timeout.ms and task.max.concurrency


- Navina Ramesh


On Aug. 30, 2016, 12:49 a.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated Aug. 30, 2016, 12:49 a.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Xinyu Liu <xi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/
-----------------------------------------------------------

(Updated Aug. 30, 2016, 12:49 a.m.)


Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).


Changes
-------

Updates based on Navina's feedback.


Repository: samza


Description
-------

Update samza web docs with new multithreading api, core and configs.


Diffs (updated)
-----

  docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
  docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
  docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 

Diff: https://reviews.apache.org/r/50174/diff/


Testing
-------

Test the web pages locally.


Thanks,

Xinyu Liu


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Navina Ramesh <nr...@linkedin.com>.

> On Aug. 25, 2016, 12:30 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/jobs/configuration-table.html, line 360
> > <https://reviews.apache.org/r/50174/diff/2/?file=1455861#file1455861line360>
> >
> >     Isn't there a performance impact if we use AsyncStreamTask by default? Esp. on jobs that don't need any async processing. 
> >     Why is the default value false? 
> >     
> >     If users upgrade to the new version and automatically start using multithreaded execution, will they see any performance impact?
> 
> Xinyu Liu wrote:
>     The default of new AsyncRunLoop is still running in a single thread, and the performance is on par with the previous RUnLoop. So the users shouldn't see any perf difference.

Ok. I think the performance results are pretty important. Can you add a link to the performance test results pdf - https://issues.apache.org/jira/secure/attachment/12812335/perf-test-results.pdf, probably appropriate in the event-loop.md where you mention the guarantees that Samza provides.


- Navina


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review146554
-----------------------------------------------------------


On Aug. 30, 2016, 12:49 a.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated Aug. 30, 2016, 12:49 a.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Navina Ramesh <nr...@linkedin.com>.

> On Aug. 25, 2016, 12:30 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 49
> > <https://reviews.apache.org/r/50174/diff/2/?file=1455860#file1455860line49>
> >
> >     Does anything change with respect to adding "task.subtask.class" in the process() method of AsyncStreamTask? Please check if this does not change the expected behavior in SAMZA-437. We use that at LinkedIn for restliWrapper (I think).
> 
> Xinyu Liu wrote:
>     This is actually not implemented in open source, only in LinkedIn.

Ah.. looks like we added and then, removed it. The JIRA still explains the pattern that the developer can use. Cool!


- Navina


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review146554
-----------------------------------------------------------


On Aug. 30, 2016, 12:49 a.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated Aug. 30, 2016, 12:49 a.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Navina Ramesh <nr...@linkedin.com>.

> On Aug. 25, 2016, 12:30 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 24
> > <https://reviews.apache.org/r/50174/diff/2/?file=1455860#file1455860line24>
> >
> >     There is no sharing of task state in Samza. Each task guarantee isolation from the others. What exactly are you referring to here? Unless I have misunderstood the semantics provided by processing in multiple threads
> 
> Xinyu Liu wrote:
>     The shared state refers to global in-memory states, like singletons and static vars.

Yeah. I kind of figured that's what you meant. But "state" is an very overloaded term. It will be great if you can modify the "sharing task state" to include examples. For example, sharing task state (via global in-memory states, singletons, static vars etc)


- Navina


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review146554
-----------------------------------------------------------


On Aug. 30, 2016, 12:49 a.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated Aug. 30, 2016, 12:49 a.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Xinyu Liu <xi...@gmail.com>.

> On Aug. 25, 2016, 12:30 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/api/overview.md, line 22
> > <https://reviews.apache.org/r/50174/diff/2/?file=1455859#file1455859line22>
> >
> >     "You should implement StreamTask for synchronous process, e.g. a computation that does not involve remote calls" 
> >     
> >     This is not very clear. To me, it sounds like we are recommending a task interface based on a use-case. Instead, I think it will be useful to explain what you mean by "synchronous processing of a stream message" and compare it with asynchronous processing. This can be followed by the remote call example.

Added the definition there.


> On Aug. 25, 2016, 12:30 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/api/overview.md, line 54
> > <https://reviews.apache.org/r/50174/diff/2/?file=1455859#file1455859line54>
> >
> >     You have mentioned "processAsync". Your code sample above show "process". Please fix it.

Thanks for catching it.


> On Aug. 25, 2016, 12:30 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 24
> > <https://reviews.apache.org/r/50174/diff/2/?file=1455860#file1455860line24>
> >
> >     There is no sharing of task state in Samza. Each task guarantee isolation from the others. What exactly are you referring to here? Unless I have misunderstood the semantics provided by processing in multiple threads

The shared state refers to global in-memory states, like singletons and static vars.


> On Aug. 25, 2016, 12:30 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 43
> > <https://reviews.apache.org/r/50174/diff/2/?file=1455860#file1455860line43>
> >
> >     I think you are trying to explain 2 things:
> >     1. semantics of event loop when using AsyncStreamTask
> >     2. Semantics of event loop when using StreamTask with threadPool size > 1
> >     
> >     Can you try to elaborate more on this? Perhaps just separating semantics of event loop for StreamTask and AsyncStreamTask will add more clarity.
> >     
> >     Also, the processing order guarantees need to be called out. It is possible to process out-of-order within a partition if the task threadpool size is >1. This is a very important behavior change that needs to be documented here.

I ended up by adding section named "Semantics for Synchronous Tasks v.s. Asynchronous Tasks". Please take a look again.


> On Aug. 25, 2016, 12:30 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 47
> > <https://reviews.apache.org/r/50174/diff/2/?file=1455860#file1455860line47>
> >
> >     nit: "through the standard InitableTask, ClosableTask, StreamTask / AsyncStreamTask, and WindowTask."

fixed.


> On Aug. 25, 2016, 12:30 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 49
> > <https://reviews.apache.org/r/50174/diff/2/?file=1455860#file1455860line49>
> >
> >     Does anything change with respect to adding "task.subtask.class" in the process() method of AsyncStreamTask? Please check if this does not change the expected behavior in SAMZA-437. We use that at LinkedIn for restliWrapper (I think).

This is actually not implemented in open source, only in LinkedIn.


> On Aug. 25, 2016, 12:30 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/jobs/configuration-table.html, line 360
> > <https://reviews.apache.org/r/50174/diff/2/?file=1455861#file1455861line360>
> >
> >     Isn't there a performance impact if we use AsyncStreamTask by default? Esp. on jobs that don't need any async processing. 
> >     Why is the default value false? 
> >     
> >     If users upgrade to the new version and automatically start using multithreaded execution, will they see any performance impact?

The default of new AsyncRunLoop is still running in a single thread, and the performance is on par with the previous RUnLoop. So the users shouldn't see any perf difference.


- Xinyu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review146554
-----------------------------------------------------------


On Aug. 30, 2016, 12:49 a.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated Aug. 30, 2016, 12:49 a.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Navina Ramesh <nr...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review146554
-----------------------------------------------------------




docs/learn/documentation/versioned/api/overview.md (line 22)
<https://reviews.apache.org/r/50174/#comment213371>

    "You should implement StreamTask for synchronous process, e.g. a computation that does not involve remote calls" 
    
    This is not very clear. To me, it sounds like we are recommending a task interface based on a use-case. Instead, I think it will be useful to explain what you mean by "synchronous processing of a stream message" and compare it with asynchronous processing. This can be followed by the remote call example.



docs/learn/documentation/versioned/api/overview.md (line 54)
<https://reviews.apache.org/r/50174/#comment213056>

    You have mentioned "processAsync". Your code sample above show "process". Please fix it.



docs/learn/documentation/versioned/container/event-loop.md (line 24)
<https://reviews.apache.org/r/50174/#comment213374>

    There is no sharing of task state in Samza. Each task guarantee isolation from the others. What exactly are you referring to here? Unless I have misunderstood the semantics provided by processing in multiple threads



docs/learn/documentation/versioned/container/event-loop.md (line 41)
<https://reviews.apache.org/r/50174/#comment213382>

    I don't understand when we are blocking. 
    
    Are you saying "Block if (all task instances have outstanding messages to process or (window and commit is in progress)" ? 
    Or "Block if (all task instances have outstanding messages to process or window) and (commit is in progress)?



docs/learn/documentation/versioned/container/event-loop.md (line 43)
<https://reviews.apache.org/r/50174/#comment213385>

    I think you are trying to explain 2 things:
    1. semantics of event loop when using AsyncStreamTask
    2. Semantics of event loop when using StreamTask with threadPool size > 1
    
    Can you try to elaborate more on this? Perhaps just separating semantics of event loop for StreamTask and AsyncStreamTask will add more clarity.
    
    Also, the processing order guarantees need to be called out. It is possible to process out-of-order within a partition if the task threadpool size is >1. This is a very important behavior change that needs to be documented here.



docs/learn/documentation/versioned/container/event-loop.md (line 47)
<https://reviews.apache.org/r/50174/#comment213387>

    nit: "through the standard InitableTask, ClosableTask, StreamTask / AsyncStreamTask, and WindowTask."



docs/learn/documentation/versioned/container/event-loop.md (line 49)
<https://reviews.apache.org/r/50174/#comment213386>

    Does anything change with respect to adding "task.subtask.class" in the process() method of AsyncStreamTask? Please check if this does not change the expected behavior in SAMZA-437. We use that at LinkedIn for restliWrapper (I think).



docs/learn/documentation/versioned/jobs/configuration-table.html (line 360)
<https://reviews.apache.org/r/50174/#comment213388>

    Isn't there a performance impact if we use AsyncStreamTask by default? Esp. on jobs that don't need any async processing. 
    Why is the default value false? 
    
    If users upgrade to the new version and automatically start using multithreaded execution, will they see any performance impact?



docs/learn/documentation/versioned/jobs/configuration-table.html (line 368)
<https://reviews.apache.org/r/50174/#comment213389>

    is this property used only if job.container.single.thread.mode == true ?


- Navina Ramesh


On July 27, 2016, 11:05 p.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated July 27, 2016, 11:05 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Navina Ramesh <nr...@linkedin.com>.

> On Aug. 25, 2016, 12:31 a.m., Navina Ramesh wrote:
> > Not that I intend to give you more work. However, adding an example of AsyncStreamTask to samza's hello-world will give a quick idea to the user on when to use it and how to use it. This can be augmented in a tutorial page, similar to the one Jake provided for samza-rest api. Thanks!

Any updates here, xinyu ?


- Navina


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review146739
-----------------------------------------------------------


On Aug. 30, 2016, 12:49 a.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated Aug. 30, 2016, 12:49 a.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Navina Ramesh <nr...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review146739
-----------------------------------------------------------



Not that I intend to give you more work. However, adding an example of AsyncStreamTask to samza's hello-world will give a quick idea to the user on when to use it and how to use it. This can be augmented in a tutorial page, similar to the one Jake provided for samza-rest api. Thanks!

- Navina Ramesh


On July 27, 2016, 11:05 p.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated July 27, 2016, 11:05 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Xinyu Liu <xi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review143847
-----------------------------------------------------------




docs/learn/documentation/versioned/api/overview.md (line 22)
<https://reviews.apache.org/r/50174/#comment209783>

    added example for sync process



docs/learn/documentation/versioned/api/overview.md (line 37)
<https://reviews.apache.org/r/50174/#comment209784>

    added example for async process



docs/learn/documentation/versioned/container/event-loop.md (line 26)
<https://reviews.apache.org/r/50174/#comment209787>

    Add the condition and memeory visibility notes here for running StreamTask in parrallel.



docs/learn/documentation/versioned/container/event-loop.md (line 43)
<https://reviews.apache.org/r/50174/#comment209789>

    Please take a look at this again.


- Xinyu Liu


On July 27, 2016, 11:05 p.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated July 27, 2016, 11:05 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Xinyu Liu <xi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/
-----------------------------------------------------------

(Updated July 27, 2016, 11:05 p.m.)


Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).


Changes
-------

Updates on Chris's feedback.


Repository: samza


Description
-------

Update samza web docs with new multithreading api, core and configs.


Diffs (updated)
-----

  docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
  docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
  docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 

Diff: https://reviews.apache.org/r/50174/diff/


Testing
-------

Test the web pages locally.


Thanks,

Xinyu Liu


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Xinyu Liu <xi...@gmail.com>.

> On July 20, 2016, 7:06 p.m., Chris Pettitt wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 43
> > <https://reviews.apache.org/r/50174/diff/1/?file=1446421#file1446421line43>
> >
> >     s/in the a single thread/in a single thread/.
> >     
> >     A few other minor grammatical errors, but this is not the easy way to share them. If they're not obvious I can send you back a slightly edited version of this paragraph.
> >     
> >     ---
> >     
> >     Larger comment: as we'd ultimately want to move to a single implementation should we not allow process and window to run in parallel for the same task even with multiple threads?

Right, actually for any case process and window will not be run in parallel for the same task. I guess I didn't make it clear in the description. Could you please take a look again at the new diff and send me an updated version on the paragraph with the grammatical fixes? Thanks a lot!


> On July 20, 2016, 7:06 p.m., Chris Pettitt wrote:
> > docs/learn/documentation/versioned/api/overview.md, line 22
> > <https://reviews.apache.org/r/50174/diff/1/?file=1446420#file1446420line22>
> >
> >     Maybe provide an example of what "synchronous process" means. For example, a computation that does not involve remote calls.

Added the examples for both sync process and async process.


> On July 20, 2016, 7:06 p.m., Chris Pettitt wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 26
> > <https://reviews.apache.org/r/50174/diff/1/?file=1446421#file1446421line26>
> >
> >     When would you want to run a synchronous task in parallel? What are the rules (e.g. memory visibility) with such a configuration?

Added the conditions and rules. Please take a look again.


> On July 20, 2016, 7:06 p.m., Chris Pettitt wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 28
> > <https://reviews.apache.org/r/50174/diff/1/?file=1446421#file1446421line28>
> >
> >     This doesn't quite sound right. Global state is a problem if there is > 1 concurrency (whether running with multiple samza threads or not). For example, async tasks may or may not be safe depending on concurrency. We also can make stronger guarantees than what is implied by the paragraph (e.g. state from process is fully visible to window and commit).

True, my doc was overly simplified. I added more details. Please take a look.


- Xinyu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review142985
-----------------------------------------------------------


On July 27, 2016, 11:05 p.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated July 27, 2016, 11:05 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>


Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

Posted by Chris Pettitt <cp...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review142985
-----------------------------------------------------------




docs/learn/documentation/versioned/api/overview.md (line 22)
<https://reviews.apache.org/r/50174/#comment208638>

    Maybe provide an example of what "synchronous process" means. For example, a computation that does not involve remote calls.



docs/learn/documentation/versioned/container/event-loop.md (line 26)
<https://reviews.apache.org/r/50174/#comment208639>

    When would you want to run a synchronous task in parallel? What are the rules (e.g. memory visibility) with such a configuration?



docs/learn/documentation/versioned/container/event-loop.md (line 28)
<https://reviews.apache.org/r/50174/#comment208640>

    This doesn't quite sound right. Global state is a problem if there is > 1 concurrency (whether running with multiple samza threads or not). For example, async tasks may or may not be safe depending on concurrency. We also can make stronger guarantees than what is implied by the paragraph (e.g. state from process is fully visible to window and commit).



docs/learn/documentation/versioned/container/event-loop.md (line 43)
<https://reviews.apache.org/r/50174/#comment208643>

    s/in the a single thread/in a single thread/.
    
    A few other minor grammatical errors, but this is not the easy way to share them. If they're not obvious I can send you back a slightly edited version of this paragraph.
    
    ---
    
    Larger comment: as we'd ultimately want to move to a single implementation should we not allow process and window to run in parallel for the same task even with multiple threads?



docs/learn/documentation/versioned/jobs/configuration-table.html (line 636)
<https://reviews.apache.org/r/50174/#comment208644>

    s/complete/completes/



docs/learn/documentation/versioned/jobs/configuration-table.html (line 638)
<https://reviews.apache.org/r/50174/#comment208645>

    But also may result in out-of-order processing. Probably good to be explicit about this even if it should be obvious to most people.



docs/learn/documentation/versioned/jobs/configuration-table.html (line 647)
<https://reviews.apache.org/r/50174/#comment208646>

    What happens if a timeout occurs?


- Chris Pettitt


On July 19, 2016, 1:05 a.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated July 19, 2016, 1:05 a.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/api/overview.md 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>