Posted to dev@nemo.apache.org by GitBox <gi...@apache.org> on 2018/10/25 11:24:46 UTC

[GitHub] johnyangk opened a new pull request #129: [NEMO-8] Implement PipeManagerMaster/Worker

URL: https://github.com/apache/incubator-nemo/pull/129
 
 
   JIRA: [NEMO-8: Implement PipeManagerMaster/Worker](https://issues.apache.org/jira/projects/NEMO/issues/NEMO-8)
   
   **Major changes:**
   - Supports fully-pipelined data streaming for bounded sources (not unbounded sources)
     - Tasks 'finish' after processing all input data, because the data is finite
     - When a task finishes, it emits all data it holds (e.g., GroupByKey accumulated results) and closes the corresponding outgoing pipes, notifying downstream tasks that those pipes have ended
     - For stream-processing unbounded sources, we need watermarks (https://issues.apache.org/jira/browse/NEMO-233)
   - Introduces PipeManagerMaster/Worker
     - Shares code with BlockManagerMaster/Worker 
   - Naive, element-wise serialization + compression + writeAndFlush
     - This will very likely cause serious overhead, but fixing it is a separate issue
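
The element-wise write path and the end-of-pipe notification described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ElementWisePipeSketch`, `PipeWriter`, `readAll`, the length-prefixed framing, and the `-1` end marker are all hypothetical names and choices, and plain `java.io` streams with GZIP stand in for the actual channel, coders, and compression used by PipeManagerWorker.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ElementWisePipeSketch {
  // Hypothetical marker written in place of a frame length to signal the end of a pipe.
  static final int END_OF_PIPE = -1;

  /** Hypothetical writer side of one pipe (a stand-in, not Nemo's API). */
  static final class PipeWriter implements AutoCloseable {
    private final DataOutputStream out;

    PipeWriter(final OutputStream sink) {
      this.out = new DataOutputStream(sink);
    }

    /** Serializes, compresses, writes, and flushes a single element. */
    void write(final String element) throws IOException {
      final byte[] serialized = element.getBytes(StandardCharsets.UTF_8);
      final ByteArrayOutputStream compressed = new ByteArrayOutputStream();
      // A fresh compressor per element: each frame pays its own GZIP header and trailer.
      try (GZIPOutputStream gzip = new GZIPOutputStream(compressed)) {
        gzip.write(serialized);
      }
      final byte[] frame = compressed.toByteArray();
      out.writeInt(frame.length);
      out.write(frame);
      out.flush(); // one flush per element, mirroring the naive writeAndFlush approach
    }

    /** Called when the task finishes: tells the reader that no more elements follow. */
    @Override
    public void close() throws IOException {
      out.writeInt(END_OF_PIPE);
      out.flush();
    }
  }

  /** Hypothetical reader side: consumes frames until the end-of-pipe marker arrives. */
  static List<String> readAll(final InputStream source) throws IOException {
    final DataInputStream in = new DataInputStream(source);
    final List<String> elements = new ArrayList<>();
    while (true) {
      final int length = in.readInt();
      if (length == END_OF_PIPE) {
        return elements; // the upstream task has finished
      }
      final byte[] frame = new byte[length];
      in.readFully(frame);
      final ByteArrayOutputStream decompressed = new ByteArrayOutputStream();
      try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(frame))) {
        final byte[] buffer = new byte[256];
        int read;
        while ((read = gzip.read(buffer)) != -1) {
          decompressed.write(buffer, 0, read);
        }
      }
      elements.add(new String(decompressed.toByteArray(), StandardCharsets.UTF_8));
    }
  }

  public static void main(final String[] args) throws IOException {
    final ByteArrayOutputStream wire = new ByteArrayOutputStream();
    try (PipeWriter writer = new PipeWriter(wire)) {
      writer.write("hello");
      writer.write("pipe");
    }
    System.out.println(readAll(new ByteArrayInputStream(wire.toByteArray())));
  }
}
```

In this sketch the per-element cost is visible directly: every element gets its own compressor instance, its own frame, and its own flush, which is the kind of overhead the note above expects. Batching several elements per frame would be one way to reduce it, but that is outside the scope of this change.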
   
   **Minor changes to note:**
   - JobConf#SchedulerImplClassName: Batch and Streaming options
   - StreamingPolicyParallelismFive: The default policy + PipeTransferEverythingPass
   - Fixes the StreamingScheduler to pass the new streaming integration tests
   - Fixes a coder bug in the Beam frontend (PCollectionView coder)
   
   **Tests for the changes:**
   - WindowedWordCountITCase#testStreamingFixedWindow
   - WindowedWordCountITCase#testStreamingSlidingWindow
   
   **Other comments:**
   - Also closes "Implement common API for data transfer" (https://issues.apache.org/jira/browse/NEMO-9)
   
   Closes #GITHUB_PR_NUMBER
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services