Posted to dev@spark.apache.org by lonely Feb <lo...@gmail.com> on 2016/02/23 09:16:45 UTC

spark core api vs. google cloud dataflow

Google Cloud Dataflow provides a distributed dataset called PCollection,
and syntactic sugar on PCollection is provided in the form of "apply".
Note that "apply" is different from the Spark API "map", which passes
each element of the source through a function func. I wonder whether
Spark can support this kind of syntactic sugar, and if not, why?

Re: spark core api vs. google cloud dataflow

Posted by Reynold Xin <rx...@databricks.com>.
That's just the transform function in DataFrame:

  /**
   * Concise syntax for chaining custom transformations.
   * {{{
   *   def featurize(ds: DataFrame) = ...
   *
   *   df
   *     .transform(featurize)
   *     .transform(...)
   * }}}
   * @since 1.6.0
   */
  def transform[U](t: DataFrame => DataFrame): DataFrame = t(this)
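The chaining sugar above can be sketched in plain Scala without a Spark cluster; the `Dataset` case class and the two transformations here are illustrative stand-ins, not Spark's actual types:

```scala
// Minimal sketch of the transform pattern: transform simply applies
// the given function to the receiver, so custom transformations can
// be chained left to right instead of nested inside out.
case class Dataset(rows: Seq[Int]) {
  def transform(t: Dataset => Dataset): Dataset = t(this)
}

// Hypothetical custom transformations, analogous to featurize above.
def doubled(ds: Dataset): Dataset = Dataset(ds.rows.map(_ * 2))
def positive(ds: Dataset): Dataset = Dataset(ds.rows.filter(_ > 0))

// Reads as a pipeline: doubled(positive-filtered) without nesting.
val result = Dataset(Seq(-1, 2, 3)).transform(doubled).transform(positive)
// result.rows == Seq(4, 6)
```

Without this sugar the same pipeline would be written as `positive(doubled(ds))`, which reads in the opposite order from the data flow.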


Note that while this is great for chaining, having *only* this leads to
a pretty bad user experience, especially in interactive analysis, where
it is not obvious what operations are available.


