Posted to dev@nemo.apache.org by GitBox <gi...@apache.org> on 2018/06/05 03:44:17 UTC

[GitHub] sanha opened a new pull request #28: [NEMO-12] Frontend support for Scala Spark

sanha opened a new pull request #28: [NEMO-12] Frontend support for Scala Spark
URL: https://github.com/apache/incubator-nemo/pull/28
 
 
   JIRA: [NEMO-12: Frontend support for Scala Spark](https://issues.apache.org/jira/projects/NEMO/issues/NEMO-12)
   
   **Major changes:**
   - Implement the Scala-side Spark frontend
     - Implement our `RDD` and `PairRDDFunctions` in Scala (required because of Scala's implicit conversions)
     - Move the main IR DAG construction logic from `JavaRDD` / `JavaPairRDD` to `RDD` / `PairRDDFunctions`
       - Spark's `JavaRDD` takes an `RDD`, and we must follow the same structure in order to extend `JavaRDD` and `JavaPairRDD`
   - Delegate function calls on `JavaRDD` and `JavaPairRDD` to `RDD`
     - The main implementations of transformations and actions in `RDD` take Java functions; the overridden methods that take Scala functions act as wrappers
     - Scala functions are converted to Java functions there through `SparkFrontendUtils`
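
The delegation layout described above can be illustrated with a minimal, Spark-free sketch. Everything in it is hypothetical: `ToyRDD`, `ToyJavaRDD`, and `ToyFrontendUtils.toJavaFunction` are stand-ins invented for illustration, and `java.util.function.Function` stands in for Spark's Java function interface. The toy also uses plain overloads rather than overriding Spark's methods, so the actual signatures in this PR may differ.

```scala
import java.util.function.{Function => JFunction}

// Hypothetical stand-in for the conversion helper (the PR uses SparkFrontendUtils).
object ToyFrontendUtils {
  // Wrap a Scala function in a Java functional interface.
  def toJavaFunction[T, U](f: T => U): JFunction[T, U] =
    new JFunction[T, U] {
      override def apply(t: T): U = f(t)
    }
}

// Hypothetical Scala-side RDD: the Java-function variant holds the real logic.
class ToyRDD[T](val elements: Seq[T]) {
  // Main implementation: takes a Java function.
  def map[U](f: JFunction[T, U]): ToyRDD[U] =
    new ToyRDD(elements.map(e => f.apply(e)))

  // Wrapper: converts the Scala function, then delegates to the variant above.
  def map[U](f: T => U): ToyRDD[U] =
    map(ToyFrontendUtils.toJavaFunction(f))

  def collect(): Seq[T] = elements
}

// Hypothetical Java-side wrapper: holds a ToyRDD and only forwards calls to it.
class ToyJavaRDD[T](val rdd: ToyRDD[T]) {
  def map[U](f: JFunction[T, U]): ToyJavaRDD[U] =
    new ToyJavaRDD(rdd.map(f))
}
```

With this shape, the transformation logic lives in one place, and both the Scala-function entry point and the Java wrapper reduce to a conversion or a forwarding call.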
   
   **Minor changes to note:**
   - Merge the `java` and `scala` packages under `compiler.frontend.spark.core`
   - Support collecting primitive elements, not only Object elements, in `CollectTransform`
   - Add `SparkWordCount` programmed with Scala RDD
   
   **Tests for the changes:**
   - New integration tests in `SparkScalaITCase` cover these features by running `SparkPi` and `SparkWordCount` programmed with the Scala RDD
   
   **Other comments:**
   - Pair function calls on our Scala `RDD` (such as `groupByKey`) are automatically forwarded to our Scala `PairRDDFunctions` through implicit conversion, just as in Spark, so a Spark Scala program needs no extra code changes (except replacing Spark's `SparkSession` with ours)
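
As a rough illustration of the implicit-conversion mechanism mentioned above, the following Spark-free sketch shows how a call to `groupByKey` on an RDD of pairs can resolve to a separate functions class. `MiniRDD` and `MiniPairRDDFunctions` are hypothetical names invented for this example and are not the classes in this PR; the implicit is placed in the companion object, which is also where Spark defines `rddToPairRDDFunctions`.

```scala
import scala.language.implicitConversions

// Hypothetical minimal RDD holding its elements in memory.
class MiniRDD[T](val elements: Seq[T]) {
  def collect(): Seq[T] = elements
}

// Hypothetical pair-only operations, available only for RDDs of (K, V).
class MiniPairRDDFunctions[K, V](self: MiniRDD[(K, V)]) {
  def groupByKey(): MiniRDD[(K, Seq[V])] =
    new MiniRDD(
      self.elements
        .groupBy(_._1)                                   // group pairs by key
        .map { case (k, kvs) => (k, kvs.map(_._2)) }     // keep only the values
        .toSeq
    )
}

object MiniRDD {
  // The compiler finds this conversion in the companion object when
  // groupByKey is called on a MiniRDD[(K, V)], so no import is needed.
  implicit def rddToPairRDDFunctions[K, V](rdd: MiniRDD[(K, V)]): MiniPairRDDFunctions[K, V] =
    new MiniPairRDDFunctions(rdd)
}

object MiniExample {
  def run(): Map[String, Seq[Int]] = {
    val pairs = new MiniRDD(Seq(("a", 1), ("b", 2), ("a", 3)))
    // groupByKey is not defined on MiniRDD itself; the implicit conversion supplies it.
    pairs.groupByKey().collect().toMap
  }
}
```

User code calls `groupByKey` directly on the RDD, which is why an existing program does not need to reference the functions class explicitly.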
   
   resolves [NEMO-12: Frontend support for Scala Spark](https://issues.apache.org/jira/projects/NEMO/issues/NEMO-12)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services