Posted to dev@nemo.apache.org by GitBox <gi...@apache.org> on 2018/08/31 14:27:57 UTC

[GitHub] jeongyooneo opened a new pull request #115: [NEMO-96] Modularize DataSkewPolicy to use MetricVertex and BarrierVertex

URL: https://github.com/apache/incubator-nemo/pull/115
 
 
   JIRA: [NEMO-96: Modularize DataSkewPolicy to use MetricVertex and BarrierVertex](https://issues.apache.org/jira/projects/NEMO/issues/NEMO-96)
   
   **Major changes:**
   - Handle dynamic optimization via `MetricCollectionVertex` and `AggregationBarrierVertex` instead of `MetricCollectionBarrierVertex`
   - For each shuffle edge with main output, a `MetricCollectionVertex` is inserted at compile time at the end of its source tasks to collect key frequency data
   - For each shuffle edge with main output, an `AggregationBarrierVertex` is also inserted at compile time. It aggregates the task-level key frequency data that each `MetricCollectionVertex` collects and emits as additional tagged output
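
The two-stage flow above (per-task collection, then aggregation at a barrier) can be sketched roughly as follows. This is a minimal illustration, not Nemo's actual vertex code: the class `KeyFrequencySketch` and its methods `collect` and `aggregate` are hypothetical names, and plain in-memory maps stand in for the tagged output that carries the data between vertices.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; these are not Nemo's real vertex classes. */
final class KeyFrequencySketch {
  private KeyFrequencySketch() { }

  /** Task-level step: count how often each key appears in one task's output. */
  static Map<Object, Long> collect(final List<?> keysEmittedByTask) {
    final Map<Object, Long> frequency = new HashMap<>();
    for (final Object key : keysEmittedByTask) {
      frequency.merge(key, 1L, Long::sum);
    }
    return frequency;
  }

  /** Barrier step: merge the per-task maps into a single frequency map. */
  static Map<Object, Long> aggregate(final List<Map<Object, Long>> perTask) {
    final Map<Object, Long> total = new HashMap<>();
    for (final Map<Object, Long> taskFrequency : perTask) {
      taskFrequency.forEach((key, count) -> total.merge(key, count, Long::sum));
    }
    return total;
  }
}
```

In this sketch, `collect` plays the role assumed for `MetricCollectionVertex` and `aggregate` the role assumed for `AggregationBarrierVertex`; how the aggregated map is then consumed by the runtime pass is outside what the sketch covers.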
   
   **Minor changes to note:**
   - Added the encoder/decoder factories needed for aggregating dynamic optimization data (here, key frequency data)
   - Modified `PipelineTranslator` to extract key encoders/decoders
   - Modified `DataSkewRuntimePass` and related code paths to handle `Object`-typed keys instead of integer hash index keys
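
To illustrate the last point, the sketch below contrasts the two keying styles. It rests on assumptions: the class `KeyTypeSketch`, both method names, and the `hashRange` parameter are hypothetical, and deriving the hash index as `hashCode` modulo a range is only one plausible reading of "hash index keys", not something the PR text confirms.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch contrasting hash-index keys with Object keys. */
final class KeyTypeSketch {
  private KeyTypeSketch() { }

  /** Assumed earlier style: frequencies keyed by an integer hash index. */
  static Map<Integer, Long> byHashIndex(final List<?> keys, final int hashRange) {
    final Map<Integer, Long> frequency = new HashMap<>();
    for (final Object key : keys) {
      // floorMod keeps the index non-negative even for negative hash codes.
      frequency.merge(Math.floorMod(key.hashCode(), hashRange), 1L, Long::sum);
    }
    return frequency;
  }

  /** Style described in this PR: frequencies keyed by the key object itself. */
  static Map<Object, Long> byObjectKey(final List<?> keys) {
    final Map<Object, Long> frequency = new HashMap<>();
    for (final Object key : keys) {
      frequency.merge(key, 1L, Long::sum);
    }
    return frequency;
  }
}
```

With keys `1, 4, 4, 7` and a `hashRange` of 3, every key falls into the same hash index, so the first map has a single entry, while the second keeps three distinct keys. Keying by arbitrary objects is also why key encoders/decoders are needed to move the frequency data between tasks.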
   
   **Tests for the changes:**
   - N/A (unit tests for skew handling and `PerKeyMedianITCase` cover the changes)
   
   **Other comments:**
   - N/A
   
   Closes #GITHUB_PR_NUMBER
   
