Posted to dev@spark.apache.org by sim <si...@swoop.com> on 2016/02/08 00:29:09 UTC

Scala API: simplifying common patterns

The more Spark code I write, the more I hit the same use cases where the
Scala APIs feel a bit awkward. I'd love to understand if there are
historical reasons for these and whether there is opportunity + interest to
improve the APIs. Here are my top two:
1. registerTempTable() returns Unit

def cachedDF(path: String, tableName: String) = {
  val df = sqlContext.read.load(path).cache()
  df.registerTempTable(tableName)
  df
}

// vs.

def cachedDF(path: String, tableName: String) =
  sqlContext.read.load(path).cache().registerTempTable(tableName)

2. No toDF() implicit for creating a DataFrame from an RDD + schema

val schema: StructType = ...
val rdd = sc.textFile(...)
  .map(...)
  .aggregate(...)
val df = sqlContext.createDataFrame(rdd, schema)

// vs.

val schema: StructType = ...
val df = sc.textFile(...)
  .map(...)
  .aggregate(...)
  .toDF(schema)

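
For illustration, both patterns can be approximated in user code with implicit classes, without any change to Spark itself. This is only a sketch, assuming the Spark 1.6 signatures DataFrame.registerTempTable(String): Unit and SQLContext.createDataFrame(RDD[Row], StructType). The names DataFrameSyntax, registeredAs, and toDFWithSchema are hypothetical and not part of Spark; the second one is deliberately not called toDF so it cannot clash with the implicits in sqlContext.implicits.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.StructType

// Hypothetical helper object; none of these names exist in Spark.
object DataFrameSyntax {

  // Pattern 1: register a temp table and keep chaining.
  implicit class ChainableDataFrame(val df: DataFrame) extends AnyVal {
    // registerTempTable returns Unit, so hand the DataFrame back explicitly.
    def registeredAs(tableName: String): DataFrame = {
      df.registerTempTable(tableName)
      df
    }
  }

  // Pattern 2: build a DataFrame from an RDD[Row] plus an explicit schema.
  implicit class RowRDDOps(val rdd: RDD[Row]) extends AnyVal {
    // Needs an implicit SQLContext in scope at the call site.
    def toDFWithSchema(schema: StructType)(implicit sqlContext: SQLContext): DataFrame =
      sqlContext.createDataFrame(rdd, schema)
  }
}
```

With import DataFrameSyntax._ and an implicit SQLContext in scope, the first example would read sqlContext.read.load(path).cache().registeredAs(tableName), and the second would end in .toDFWithSchema(schema), provided the preceding steps produce an RDD[Row].
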
Have you encountered other examples where small, low-risk API tweaks could
make common use cases more consistent + simpler to code?
/Sim



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Scala-API-simplifying-common-patterns-tp16238.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Re: Scala API: simplifying common patterns

Posted by Reynold Xin <rx...@databricks.com>.
Can you create a pull request? It is difficult to know what's going on.


On Mon, Feb 8, 2016 at 4:51 PM, sim <si...@swoop.com> wrote:

> 24 test failures for sql/test:
> https://gist.github.com/ssimeonov/89862967f87c5c497322

Re: Scala API: simplifying common patterns

Posted by sim <si...@swoop.com>.
24 test failures for sql/test:
https://gist.github.com/ssimeonov/89862967f87c5c497322




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: Scala API: simplifying common patterns

Posted by Reynold Xin <rx...@databricks.com>.
Yea I'm not sure what's going on either. You can just run the unit tests
through "build/sbt sql/test" without running mima.


On Mon, Feb 8, 2016 at 3:47 PM, sim <si...@swoop.com> wrote:

> Same result with both caches cleared.

Re: Scala API: simplifying common patterns

Posted by sim <si...@swoop.com>.
Same result with both caches cleared.





Re: Scala API: simplifying common patterns

Posted by Reynold Xin <rx...@databricks.com>.
Not 100% sure what's going on, but you can try wiping your local ivy2 and
maven cache.




On Mon, Feb 8, 2016 at 12:05 PM, sim <si...@swoop.com> wrote:

> Reynold, I just forked + built master and I'm getting lots of binary
> compatibility errors when running the tests.
>
> https://gist.github.com/ssimeonov/69cb0b41750be7777776
>
> Nothing in the dev tools section of the wiki on this. Any advice on how to
> get green before I work on the PRs?
>
> Thanks,
> Sim

Re: Scala API: simplifying common patterns

Posted by sim <si...@swoop.com>.
Reynold, I just forked + built master and I'm getting lots of binary
compatibility errors when running the tests. 

https://gist.github.com/ssimeonov/69cb0b41750be7777776

Nothing in the dev tools section of the wiki on this. Any advice on how to
get green before I work on the PRs?

Thanks,
Sim





Re: Scala API: simplifying common patterns

Posted by sim <si...@swoop.com>.
Sure.





Re: Scala API: simplifying common patterns

Posted by Reynold Xin <rx...@databricks.com>.
Both of these make sense to add. Can you submit a pull request?


On Sun, Feb 7, 2016 at 3:29 PM, sim <si...@swoop.com> wrote:

> The more Spark code I write, the more I hit the same use cases where the
> Scala APIs feel a bit awkward. I'd love to understand if there are
> historical reasons for these and whether there is opportunity + interest to
> improve the APIs. Here are my top two:
> 1. registerTempTable() returns Unit
>
> def cachedDF(path: String, tableName: String) = {
>   val df = sqlContext.read.load(path).cache()
>   df.registerTempTable(tableName)
>   df
> }
>
> // vs.
>
> def cachedDF(path: String, tableName: String) =
>   sqlContext.read.load(path).cache().registerTempTable(tableName)
>
> 2. No toDF() implicit for creating a DataFrame from an RDD + schema
>
> val schema: StructType = ...
> val rdd = sc.textFile(...)
>   .map(...)
>   .aggregate(...)
> val df = sqlContext.createDataFrame(rdd, schema)
>
> // vs.
>
> val schema: StructType = ...
> val df = sc.textFile(...)
>   .map(...)
>   .aggregate(...)
>   .toDF(schema)
>
> Have you encountered other examples where small, low-risk API tweaks could
> make common use cases more consistent + simpler to code?
>
> /Sim