Posted to dev@spark.apache.org by Jacek Laskowski <ja...@japila.pl> on 2016/07/05 13:22:56 UTC

Why's ds.foreachPartition(println) not possible?

Hi,

This is with the master branch built today. Why can't I call
ds.foreachPartition(println)? Is using a type annotation the only way to
go forward? I'd be so sad if that's the case.

scala> ds.foreachPartition(println)
<console>:28: error: overloaded method value foreachPartition with alternatives:
  (func: org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit
<and>
  (f: Iterator[Record] => Unit)Unit
 cannot be applied to (Unit)
       ds.foreachPartition(println)
          ^

scala> sc.version
res9: String = 2.0.0-SNAPSHOT

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Why's ds.foreachPartition(println) not possible?

Posted by Jacek Laskowski <ja...@japila.pl>.
Thanks Cody, Reynold, and Ryan! Learnt a lot and feel "corrected".

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Wed, Jul 6, 2016 at 2:46 AM, Shixiong(Ryan) Zhu
<sh...@databricks.com> wrote:
> I asked this question in Scala user group two years ago:
> https://groups.google.com/forum/#!topic/scala-user/W4f0d8xK1nk
>
> Take a look if you are interested in.
>
> On Tue, Jul 5, 2016 at 1:31 PM, Reynold Xin <rx...@databricks.com> wrote:
>>
>> You can file it here: https://issues.scala-lang.org/secure/Dashboard.jspa
>>
>> Perhaps "bug" is not the right word, but "limitation". println accepts a
>> single argument of type Any and returns Unit, and it appears that Scala
>> fails to infer the correct overloaded method in this case.
>>
>>   def println() = Console.println()
>>   def println(x: Any) = Console.println(x)
>>
>>
>>
>> On Tue, Jul 5, 2016 at 1:27 PM, Cody Koeninger <co...@koeninger.org> wrote:
>>>
>>> I don't think that's a scala compiler bug.
>>>
>>> println is a valid expression that returns unit.
>>>
>>> Unit is not a single-argument function, and does not match any of the
>>> overloads of foreachPartition
>>>
>>> You may be used to a conversion taking place when println is passed to
>>> method expecting a function, but that's not a safe thing to do
>>> silently for multiple overloads.
>>>
>>> tldr;
>>>
>>> just use
>>>
>>> ds.foreachPartition(x => println(x))
>>>
>>> you don't need any type annotations
>>>
>>>
>>> On Tue, Jul 5, 2016 at 2:53 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>>> > Hi Reynold,
>>> >
>>> > Is this already reported and tracked somewhere. I'm quite sure that
>>> > people will be asking about the reasons Spark does this. Where are
>>> > such issues reported usually?
>>> >
>>> > Pozdrawiam,
>>> > Jacek Laskowski
>>> > ----
>>> > https://medium.com/@jaceklaskowski/
>>> > Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> > Follow me at https://twitter.com/jaceklaskowski
>>> >
>>> >
>>> > On Tue, Jul 5, 2016 at 6:19 PM, Reynold Xin <rx...@databricks.com>
>>> > wrote:
>>> >> This seems like a Scala compiler bug.
>>> >>
>>> >>
>>> >> On Tuesday, July 5, 2016, Jacek Laskowski <ja...@japila.pl> wrote:
>>> >>>
>>> >>> Well, there is foreach for Java and another foreach for Scala. That's
>>> >>> what I can understand. But while supporting two language-specific
>>> >>> APIs
>>> >>> -- Scala and Java -- Dataset API lost support for such simple calls
>>> >>> without type annotations so you have to be explicit about the variant
>>> >>> (since I'm using Scala I want to use Scala API right). It appears
>>> >>> that
>>> >>> any single-argument-function operators in Datasets are affected :(
>>> >>>
>>> >>> My question was to know whether there are works to fix it (if
>>> >>> possible
>>> >>> -- I don't know if it is).
>>> >>>
>>> >>> Pozdrawiam,
>>> >>> Jacek Laskowski
>>> >>> ----
>>> >>> https://medium.com/@jaceklaskowski/
>>> >>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> >>> Follow me at https://twitter.com/jaceklaskowski
>>> >>>
>>> >>>
>>> >>> On Tue, Jul 5, 2016 at 4:21 PM, Sean Owen <so...@cloudera.com> wrote:
>>> >>> > Right, should have noticed that in your second mail. But foreach
>>> >>> > already does what you want, right? it would be identical here.
>>> >>> >
>>> >>> > How these two methods do conceptually different things on different
>>> >>> > arguments. I don't think I'd expect them to accept the same
>>> >>> > functions.
>>> >>> >
>>> >>> > On Tue, Jul 5, 2016 at 3:18 PM, Jacek Laskowski <ja...@japila.pl>
>>> >>> > wrote:
>>> >>> >> ds is Dataset and the problem is that println (or any other
>>> >>> >> one-element function) would not work here (and perhaps other
>>> >>> >> methods
>>> >>> >> with two variants - Java's and Scala's).
>>> >>> >>
>>> >>> >> Pozdrawiam,
>>> >>> >> Jacek Laskowski
>>> >>> >> ----
>>> >>> >> https://medium.com/@jaceklaskowski/
>>> >>> >> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> >>> >> Follow me at https://twitter.com/jaceklaskowski
>>> >>> >>
>>> >>> >>
>>> >>> >> On Tue, Jul 5, 2016 at 3:53 PM, Sean Owen <so...@cloudera.com>
>>> >>> >> wrote:
>>> >>> >>> A DStream is a sequence of RDDs, not of elements. I don't think
>>> >>> >>> I'd
>>> >>> >>> expect to express an operation on a DStream as if it were
>>> >>> >>> elements.
>>> >>> >>>
>>> >>> >>> On Tue, Jul 5, 2016 at 2:47 PM, Jacek Laskowski <ja...@japila.pl>
>>> >>> >>> wrote:
>>> >>> >>>> Sort of. Your example works, but could you do a mere
>>> >>> >>>> ds.foreachPartition(println)? Why not? What should I even see
>>> >>> >>>> the
>>> >>> >>>> Java
>>> >>> >>>> version?
>>> >>> >>>>
>>> >>> >>>> scala> val ds = spark.range(10)
>>> >>> >>>> ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]
>>> >>> >>>>
>>> >>> >>>> scala> ds.foreachPartition(println)
>>> >>> >>>> <console>:26: error: overloaded method value foreachPartition
>>> >>> >>>> with
>>> >>> >>>> alternatives:
>>> >>> >>>>   (func:
>>> >>> >>>>
>>> >>> >>>> org.apache.spark.api.java.function.ForeachPartitionFunction[Long])Unit
>>> >>> >>>> <and>
>>> >>> >>>>   (f: Iterator[Long] => Unit)Unit
>>> >>> >>>>  cannot be applied to (Unit)
>>> >>> >>>>        ds.foreachPartition(println)
>>> >>> >>>>           ^
>>> >>> >>>>
>>> >>> >>>> Pozdrawiam,
>>> >>> >>>> Jacek Laskowski
>>> >>> >>>> ----
>>> >>> >>>> https://medium.com/@jaceklaskowski/
>>> >>> >>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> >>> >>>> Follow me at https://twitter.com/jaceklaskowski
>>> >>> >>>>
>>> >>> >>>>
>>> >>> >>>> On Tue, Jul 5, 2016 at 3:32 PM, Sean Owen <so...@cloudera.com>
>>> >>> >>>> wrote:
>>> >>> >>>>> Do you not mean ds.foreachPartition(_.foreach(println)) or
>>> >>> >>>>> similar?
>>> >>> >>>>>
>>> >>> >>>>> On Tue, Jul 5, 2016 at 2:22 PM, Jacek Laskowski
>>> >>> >>>>> <ja...@japila.pl>
>>> >>> >>>>> wrote:
>>> >>> >>>>>> Hi,
>>> >>> >>>>>>
>>> >>> >>>>>> It's with the master built today. Why can't I call
>>> >>> >>>>>> ds.foreachPartition(println)? Is using type annotation the
>>> >>> >>>>>> only way
>>> >>> >>>>>> to
>>> >>> >>>>>> go forward? I'd be so sad if that's the case.
>>> >>> >>>>>>
>>> >>> >>>>>> scala> ds.foreachPartition(println)
>>> >>> >>>>>> <console>:28: error: overloaded method value foreachPartition
>>> >>> >>>>>> with
>>> >>> >>>>>> alternatives:
>>> >>> >>>>>>   (func:
>>> >>> >>>>>>
>>> >>> >>>>>> org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit
>>> >>> >>>>>> <and>
>>> >>> >>>>>>   (f: Iterator[Record] => Unit)Unit
>>> >>> >>>>>>  cannot be applied to (Unit)
>>> >>> >>>>>>        ds.foreachPartition(println)
>>> >>> >>>>>>           ^
>>> >>> >>>>>>
>>> >>> >>>>>> scala> sc.version
>>> >>> >>>>>> res9: String = 2.0.0-SNAPSHOT
>>> >>> >>>>>>
>>> >>> >>>>>> Pozdrawiam,
>>> >>> >>>>>> Jacek Laskowski
>>> >>> >>>>>> ----
>>> >>> >>>>>> https://medium.com/@jaceklaskowski/
>>> >>> >>>>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> >>> >>>>>> Follow me at https://twitter.com/jaceklaskowski
>>> >>> >>>>>>
>>> >>> >>>>>>
>>> >>> >>>>>>
>>> >>> >>>>>> ---------------------------------------------------------------------
>>> >>> >>>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>> >>> >>>>>>
>>> >>>
>>> >>> ---------------------------------------------------------------------
>>> >>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>> >>>
>>> >>
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>> >
>>
>>
>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Why's ds.foreachPartition(println) not possible?

Posted by "Shixiong(Ryan) Zhu" <sh...@databricks.com>.
I asked this question in the Scala user group two years ago:
https://groups.google.com/forum/#!topic/scala-user/W4f0d8xK1nk

Take a look if you are interested.

On Tue, Jul 5, 2016 at 1:31 PM, Reynold Xin <rx...@databricks.com> wrote:

> You can file it here: https://issues.scala-lang.org/secure/Dashboard.jspa
>
> Perhaps "bug" is not the right word, but "limitation". println accepts a
> single argument of type Any and returns Unit, and it appears that Scala
> fails to infer the correct overloaded method in this case.
>
>   def println() = Console.println()
>   def println(x: Any) = Console.println(x)
>
>
>
> On Tue, Jul 5, 2016 at 1:27 PM, Cody Koeninger <co...@koeninger.org> wrote:
>
>> I don't think that's a scala compiler bug.
>>
>> println is a valid expression that returns unit.
>>
>> Unit is not a single-argument function, and does not match any of the
>> overloads of foreachPartition
>>
>> You may be used to a conversion taking place when println is passed to
>> method expecting a function, but that's not a safe thing to do
>> silently for multiple overloads.
>>
>> tldr;
>>
>> just use
>>
>> ds.foreachPartition(x => println(x))
>>
>> you don't need any type annotations
>>
>>
>> On Tue, Jul 5, 2016 at 2:53 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>> > Hi Reynold,
>> >
>> > Is this already reported and tracked somewhere. I'm quite sure that
>> > people will be asking about the reasons Spark does this. Where are
>> > such issues reported usually?
>> >
>> > Pozdrawiam,
>> > Jacek Laskowski
>> > ----
>> > https://medium.com/@jaceklaskowski/
>> > Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> > Follow me at https://twitter.com/jaceklaskowski
>> >
>> >
>> > On Tue, Jul 5, 2016 at 6:19 PM, Reynold Xin <rx...@databricks.com>
>> wrote:
>> >> This seems like a Scala compiler bug.
>> >>
>> >>
>> >> On Tuesday, July 5, 2016, Jacek Laskowski <ja...@japila.pl> wrote:
>> >>>
>> >>> Well, there is foreach for Java and another foreach for Scala. That's
>> >>> what I can understand. But while supporting two language-specific APIs
>> >>> -- Scala and Java -- Dataset API lost support for such simple calls
>> >>> without type annotations so you have to be explicit about the variant
>> >>> (since I'm using Scala I want to use Scala API right). It appears that
>> >>> any single-argument-function operators in Datasets are affected :(
>> >>>
>> >>> My question was to know whether there are works to fix it (if possible
>> >>> -- I don't know if it is).
>> >>>
>> >>> Pozdrawiam,
>> >>> Jacek Laskowski
>> >>> ----
>> >>> https://medium.com/@jaceklaskowski/
>> >>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> >>> Follow me at https://twitter.com/jaceklaskowski
>> >>>
>> >>>
>> >>> On Tue, Jul 5, 2016 at 4:21 PM, Sean Owen <so...@cloudera.com> wrote:
>> >>> > Right, should have noticed that in your second mail. But foreach
>> >>> > already does what you want, right? it would be identical here.
>> >>> >
>> >>> > How these two methods do conceptually different things on different
>> >>> > arguments. I don't think I'd expect them to accept the same
>> functions.
>> >>> >
>> >>> > On Tue, Jul 5, 2016 at 3:18 PM, Jacek Laskowski <ja...@japila.pl>
>> wrote:
>> >>> >> ds is Dataset and the problem is that println (or any other
>> >>> >> one-element function) would not work here (and perhaps other
>> methods
>> >>> >> with two variants - Java's and Scala's).
>> >>> >>
>> >>> >> Pozdrawiam,
>> >>> >> Jacek Laskowski
>> >>> >> ----
>> >>> >> https://medium.com/@jaceklaskowski/
>> >>> >> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> >>> >> Follow me at https://twitter.com/jaceklaskowski
>> >>> >>
>> >>> >>
>> >>> >> On Tue, Jul 5, 2016 at 3:53 PM, Sean Owen <so...@cloudera.com>
>> wrote:
>> >>> >>> A DStream is a sequence of RDDs, not of elements. I don't think
>> I'd
>> >>> >>> expect to express an operation on a DStream as if it were
>> elements.
>> >>> >>>
>> >>> >>> On Tue, Jul 5, 2016 at 2:47 PM, Jacek Laskowski <ja...@japila.pl>
>> >>> >>> wrote:
>> >>> >>>> Sort of. Your example works, but could you do a mere
>> >>> >>>> ds.foreachPartition(println)? Why not? What should I even see the
>> >>> >>>> Java
>> >>> >>>> version?
>> >>> >>>>
>> >>> >>>> scala> val ds = spark.range(10)
>> >>> >>>> ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]
>> >>> >>>>
>> >>> >>>> scala> ds.foreachPartition(println)
>> >>> >>>> <console>:26: error: overloaded method value foreachPartition
>> with
>> >>> >>>> alternatives:
>> >>> >>>>   (func:
>> >>> >>>>
>> org.apache.spark.api.java.function.ForeachPartitionFunction[Long])Unit
>> >>> >>>> <and>
>> >>> >>>>   (f: Iterator[Long] => Unit)Unit
>> >>> >>>>  cannot be applied to (Unit)
>> >>> >>>>        ds.foreachPartition(println)
>> >>> >>>>           ^
>> >>> >>>>
>> >>> >>>> Pozdrawiam,
>> >>> >>>> Jacek Laskowski
>> >>> >>>> ----
>> >>> >>>> https://medium.com/@jaceklaskowski/
>> >>> >>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> >>> >>>> Follow me at https://twitter.com/jaceklaskowski
>> >>> >>>>
>> >>> >>>>
>> >>> >>>> On Tue, Jul 5, 2016 at 3:32 PM, Sean Owen <so...@cloudera.com>
>> wrote:
>> >>> >>>>> Do you not mean ds.foreachPartition(_.foreach(println)) or
>> similar?
>> >>> >>>>>
>> >>> >>>>> On Tue, Jul 5, 2016 at 2:22 PM, Jacek Laskowski <
>> jacek@japila.pl>
>> >>> >>>>> wrote:
>> >>> >>>>>> Hi,
>> >>> >>>>>>
>> >>> >>>>>> It's with the master built today. Why can't I call
>> >>> >>>>>> ds.foreachPartition(println)? Is using type annotation the
>> only way
>> >>> >>>>>> to
>> >>> >>>>>> go forward? I'd be so sad if that's the case.
>> >>> >>>>>>
>> >>> >>>>>> scala> ds.foreachPartition(println)
>> >>> >>>>>> <console>:28: error: overloaded method value foreachPartition
>> with
>> >>> >>>>>> alternatives:
>> >>> >>>>>>   (func:
>> >>> >>>>>>
>> org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit
>> >>> >>>>>> <and>
>> >>> >>>>>>   (f: Iterator[Record] => Unit)Unit
>> >>> >>>>>>  cannot be applied to (Unit)
>> >>> >>>>>>        ds.foreachPartition(println)
>> >>> >>>>>>           ^
>> >>> >>>>>>
>> >>> >>>>>> scala> sc.version
>> >>> >>>>>> res9: String = 2.0.0-SNAPSHOT
>> >>> >>>>>>
>> >>> >>>>>> Pozdrawiam,
>> >>> >>>>>> Jacek Laskowski
>> >>> >>>>>> ----
>> >>> >>>>>> https://medium.com/@jaceklaskowski/
>> >>> >>>>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> >>> >>>>>> Follow me at https://twitter.com/jaceklaskowski
>> >>> >>>>>>
>> >>> >>>>>>
>> >>> >>>>>>
>> ---------------------------------------------------------------------
>> >>> >>>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>> >>> >>>>>>
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>> >>>
>> >>
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>> >
>>
>
>

Re: Why's ds.foreachPartition(println) not possible?

Posted by Reynold Xin <rx...@databricks.com>.
You can file it here: https://issues.scala-lang.org/secure/Dashboard.jspa

Perhaps "bug" is not the right word, but "limitation". println accepts a
single argument of type Any and returns Unit, and it appears that Scala
fails to infer the correct overloaded method in this case.

  def println() = Console.println()
  def println(x: Any) = Console.println(x)
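
A minimal sketch in spark-shell terms (using spark.range as in the
transcript below; where the output lands depends on where the tasks run):

  val ds = spark.range(3)

  // `println` on its own is taken as the zero-argument call, so the argument
  // is typed as Unit and matches neither alternative:
  ds.foreachPartition(println)            // does not compile

  // a lambda hands the compiler a one-argument function to match against the
  // Scala overload, so it resolves without any type annotation:
  ds.foreachPartition(x => println(x))    // x is the partition's Iterator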



On Tue, Jul 5, 2016 at 1:27 PM, Cody Koeninger <co...@koeninger.org> wrote:

> I don't think that's a scala compiler bug.
>
> println is a valid expression that returns unit.
>
> Unit is not a single-argument function, and does not match any of the
> overloads of foreachPartition
>
> You may be used to a conversion taking place when println is passed to
> method expecting a function, but that's not a safe thing to do
> silently for multiple overloads.
>
> tldr;
>
> just use
>
> ds.foreachPartition(x => println(x))
>
> you don't need any type annotations
>
>
> On Tue, Jul 5, 2016 at 2:53 PM, Jacek Laskowski <ja...@japila.pl> wrote:
> > Hi Reynold,
> >
> > Is this already reported and tracked somewhere. I'm quite sure that
> > people will be asking about the reasons Spark does this. Where are
> > such issues reported usually?
> >
> > Pozdrawiam,
> > Jacek Laskowski
> > ----
> > https://medium.com/@jaceklaskowski/
> > Mastering Apache Spark http://bit.ly/mastering-apache-spark
> > Follow me at https://twitter.com/jaceklaskowski
> >
> >
> > On Tue, Jul 5, 2016 at 6:19 PM, Reynold Xin <rx...@databricks.com> wrote:
> >> This seems like a Scala compiler bug.
> >>
> >>
> >> On Tuesday, July 5, 2016, Jacek Laskowski <ja...@japila.pl> wrote:
> >>>
> >>> Well, there is foreach for Java and another foreach for Scala. That's
> >>> what I can understand. But while supporting two language-specific APIs
> >>> -- Scala and Java -- Dataset API lost support for such simple calls
> >>> without type annotations so you have to be explicit about the variant
> >>> (since I'm using Scala I want to use Scala API right). It appears that
> >>> any single-argument-function operators in Datasets are affected :(
> >>>
> >>> My question was to know whether there are works to fix it (if possible
> >>> -- I don't know if it is).
> >>>
> >>> Pozdrawiam,
> >>> Jacek Laskowski
> >>> ----
> >>> https://medium.com/@jaceklaskowski/
> >>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> >>> Follow me at https://twitter.com/jaceklaskowski
> >>>
> >>>
> >>> On Tue, Jul 5, 2016 at 4:21 PM, Sean Owen <so...@cloudera.com> wrote:
> >>> > Right, should have noticed that in your second mail. But foreach
> >>> > already does what you want, right? it would be identical here.
> >>> >
> >>> > How these two methods do conceptually different things on different
> >>> > arguments. I don't think I'd expect them to accept the same
> functions.
> >>> >
> >>> > On Tue, Jul 5, 2016 at 3:18 PM, Jacek Laskowski <ja...@japila.pl>
> wrote:
> >>> >> ds is Dataset and the problem is that println (or any other
> >>> >> one-element function) would not work here (and perhaps other methods
> >>> >> with two variants - Java's and Scala's).
> >>> >>
> >>> >> Pozdrawiam,
> >>> >> Jacek Laskowski
> >>> >> ----
> >>> >> https://medium.com/@jaceklaskowski/
> >>> >> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> >>> >> Follow me at https://twitter.com/jaceklaskowski
> >>> >>
> >>> >>
> >>> >> On Tue, Jul 5, 2016 at 3:53 PM, Sean Owen <so...@cloudera.com>
> wrote:
> >>> >>> A DStream is a sequence of RDDs, not of elements. I don't think I'd
> >>> >>> expect to express an operation on a DStream as if it were elements.
> >>> >>>
> >>> >>> On Tue, Jul 5, 2016 at 2:47 PM, Jacek Laskowski <ja...@japila.pl>
> >>> >>> wrote:
> >>> >>>> Sort of. Your example works, but could you do a mere
> >>> >>>> ds.foreachPartition(println)? Why not? What should I even see the
> >>> >>>> Java
> >>> >>>> version?
> >>> >>>>
> >>> >>>> scala> val ds = spark.range(10)
> >>> >>>> ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]
> >>> >>>>
> >>> >>>> scala> ds.foreachPartition(println)
> >>> >>>> <console>:26: error: overloaded method value foreachPartition with
> >>> >>>> alternatives:
> >>> >>>>   (func:
> >>> >>>>
> org.apache.spark.api.java.function.ForeachPartitionFunction[Long])Unit
> >>> >>>> <and>
> >>> >>>>   (f: Iterator[Long] => Unit)Unit
> >>> >>>>  cannot be applied to (Unit)
> >>> >>>>        ds.foreachPartition(println)
> >>> >>>>           ^
> >>> >>>>
> >>> >>>> Pozdrawiam,
> >>> >>>> Jacek Laskowski
> >>> >>>> ----
> >>> >>>> https://medium.com/@jaceklaskowski/
> >>> >>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> >>> >>>> Follow me at https://twitter.com/jaceklaskowski
> >>> >>>>
> >>> >>>>
> >>> >>>> On Tue, Jul 5, 2016 at 3:32 PM, Sean Owen <so...@cloudera.com>
> wrote:
> >>> >>>>> Do you not mean ds.foreachPartition(_.foreach(println)) or
> similar?
> >>> >>>>>
> >>> >>>>> On Tue, Jul 5, 2016 at 2:22 PM, Jacek Laskowski <jacek@japila.pl
> >
> >>> >>>>> wrote:
> >>> >>>>>> Hi,
> >>> >>>>>>
> >>> >>>>>> It's with the master built today. Why can't I call
> >>> >>>>>> ds.foreachPartition(println)? Is using type annotation the only
> way
> >>> >>>>>> to
> >>> >>>>>> go forward? I'd be so sad if that's the case.
> >>> >>>>>>
> >>> >>>>>> scala> ds.foreachPartition(println)
> >>> >>>>>> <console>:28: error: overloaded method value foreachPartition
> with
> >>> >>>>>> alternatives:
> >>> >>>>>>   (func:
> >>> >>>>>>
> org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit
> >>> >>>>>> <and>
> >>> >>>>>>   (f: Iterator[Record] => Unit)Unit
> >>> >>>>>>  cannot be applied to (Unit)
> >>> >>>>>>        ds.foreachPartition(println)
> >>> >>>>>>           ^
> >>> >>>>>>
> >>> >>>>>> scala> sc.version
> >>> >>>>>> res9: String = 2.0.0-SNAPSHOT
> >>> >>>>>>
> >>> >>>>>> Pozdrawiam,
> >>> >>>>>> Jacek Laskowski
> >>> >>>>>> ----
> >>> >>>>>> https://medium.com/@jaceklaskowski/
> >>> >>>>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> >>> >>>>>> Follow me at https://twitter.com/jaceklaskowski
> >>> >>>>>>
> >>> >>>>>>
> >>> >>>>>>
> ---------------------------------------------------------------------
> >>> >>>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
> >>> >>>>>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
> >>>
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
> >
>

Re: Why's ds.foreachPartition(println) not possible?

Posted by Cody Koeninger <co...@koeninger.org>.
I don't think that's a Scala compiler bug.

println is a valid expression that returns Unit.

Unit is not a single-argument function, and does not match any of the
overloads of foreachPartition.

You may be used to a conversion taking place when println is passed to a
method expecting a function, but that's not a safe thing to do
silently when there are multiple overloads.

tl;dr:

just use

ds.foreachPartition(x => println(x))

You don't need any type annotations.
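
A quick sketch (using spark.range as in the quoted transcript below); the
same un-annotated lambda also resolves the other operators that come in a
Scala and a Java flavour, e.g. foreach:

  val ds = spark.range(3)

  // Iterator per partition; println prints the Iterator itself, not its elements
  ds.foreachPartition(x => println(x))

  // one element at a time
  ds.foreach(x => println(x))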


On Tue, Jul 5, 2016 at 2:53 PM, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi Reynold,
>
> Is this already reported and tracked somewhere. I'm quite sure that
> people will be asking about the reasons Spark does this. Where are
> such issues reported usually?
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Tue, Jul 5, 2016 at 6:19 PM, Reynold Xin <rx...@databricks.com> wrote:
>> This seems like a Scala compiler bug.
>>
>>
>> On Tuesday, July 5, 2016, Jacek Laskowski <ja...@japila.pl> wrote:
>>>
>>> Well, there is foreach for Java and another foreach for Scala. That's
>>> what I can understand. But while supporting two language-specific APIs
>>> -- Scala and Java -- Dataset API lost support for such simple calls
>>> without type annotations so you have to be explicit about the variant
>>> (since I'm using Scala I want to use Scala API right). It appears that
>>> any single-argument-function operators in Datasets are affected :(
>>>
>>> My question was to know whether there are works to fix it (if possible
>>> -- I don't know if it is).
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> ----
>>> https://medium.com/@jaceklaskowski/
>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> Follow me at https://twitter.com/jaceklaskowski
>>>
>>>
>>> On Tue, Jul 5, 2016 at 4:21 PM, Sean Owen <so...@cloudera.com> wrote:
>>> > Right, should have noticed that in your second mail. But foreach
>>> > already does what you want, right? it would be identical here.
>>> >
>>> > How these two methods do conceptually different things on different
>>> > arguments. I don't think I'd expect them to accept the same functions.
>>> >
>>> > On Tue, Jul 5, 2016 at 3:18 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>>> >> ds is Dataset and the problem is that println (or any other
>>> >> one-element function) would not work here (and perhaps other methods
>>> >> with two variants - Java's and Scala's).
>>> >>
>>> >> Pozdrawiam,
>>> >> Jacek Laskowski
>>> >> ----
>>> >> https://medium.com/@jaceklaskowski/
>>> >> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> >> Follow me at https://twitter.com/jaceklaskowski
>>> >>
>>> >>
>>> >> On Tue, Jul 5, 2016 at 3:53 PM, Sean Owen <so...@cloudera.com> wrote:
>>> >>> A DStream is a sequence of RDDs, not of elements. I don't think I'd
>>> >>> expect to express an operation on a DStream as if it were elements.
>>> >>>
>>> >>> On Tue, Jul 5, 2016 at 2:47 PM, Jacek Laskowski <ja...@japila.pl>
>>> >>> wrote:
>>> >>>> Sort of. Your example works, but could you do a mere
>>> >>>> ds.foreachPartition(println)? Why not? What should I even see the
>>> >>>> Java
>>> >>>> version?
>>> >>>>
>>> >>>> scala> val ds = spark.range(10)
>>> >>>> ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]
>>> >>>>
>>> >>>> scala> ds.foreachPartition(println)
>>> >>>> <console>:26: error: overloaded method value foreachPartition with
>>> >>>> alternatives:
>>> >>>>   (func:
>>> >>>> org.apache.spark.api.java.function.ForeachPartitionFunction[Long])Unit
>>> >>>> <and>
>>> >>>>   (f: Iterator[Long] => Unit)Unit
>>> >>>>  cannot be applied to (Unit)
>>> >>>>        ds.foreachPartition(println)
>>> >>>>           ^
>>> >>>>
>>> >>>> Pozdrawiam,
>>> >>>> Jacek Laskowski
>>> >>>> ----
>>> >>>> https://medium.com/@jaceklaskowski/
>>> >>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> >>>> Follow me at https://twitter.com/jaceklaskowski
>>> >>>>
>>> >>>>
>>> >>>> On Tue, Jul 5, 2016 at 3:32 PM, Sean Owen <so...@cloudera.com> wrote:
>>> >>>>> Do you not mean ds.foreachPartition(_.foreach(println)) or similar?
>>> >>>>>
>>> >>>>> On Tue, Jul 5, 2016 at 2:22 PM, Jacek Laskowski <ja...@japila.pl>
>>> >>>>> wrote:
>>> >>>>>> Hi,
>>> >>>>>>
>>> >>>>>> It's with the master built today. Why can't I call
>>> >>>>>> ds.foreachPartition(println)? Is using type annotation the only way
>>> >>>>>> to
>>> >>>>>> go forward? I'd be so sad if that's the case.
>>> >>>>>>
>>> >>>>>> scala> ds.foreachPartition(println)
>>> >>>>>> <console>:28: error: overloaded method value foreachPartition with
>>> >>>>>> alternatives:
>>> >>>>>>   (func:
>>> >>>>>> org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit
>>> >>>>>> <and>
>>> >>>>>>   (f: Iterator[Record] => Unit)Unit
>>> >>>>>>  cannot be applied to (Unit)
>>> >>>>>>        ds.foreachPartition(println)
>>> >>>>>>           ^
>>> >>>>>>
>>> >>>>>> scala> sc.version
>>> >>>>>> res9: String = 2.0.0-SNAPSHOT
>>> >>>>>>
>>> >>>>>> Pozdrawiam,
>>> >>>>>> Jacek Laskowski
>>> >>>>>> ----
>>> >>>>>> https://medium.com/@jaceklaskowski/
>>> >>>>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> >>>>>> Follow me at https://twitter.com/jaceklaskowski
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> ---------------------------------------------------------------------
>>> >>>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>> >>>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Why's ds.foreachPartition(println) not possible?

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi Reynold,

Is this already reported and tracked somewhere? I'm quite sure that
people will be asking about the reasons Spark does this. Where are
such issues usually reported?

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Tue, Jul 5, 2016 at 6:19 PM, Reynold Xin <rx...@databricks.com> wrote:
> This seems like a Scala compiler bug.
>
>
> On Tuesday, July 5, 2016, Jacek Laskowski <ja...@japila.pl> wrote:
>>
>> Well, there is foreach for Java and another foreach for Scala. That's
>> what I can understand. But while supporting two language-specific APIs
>> -- Scala and Java -- Dataset API lost support for such simple calls
>> without type annotations so you have to be explicit about the variant
>> (since I'm using Scala I want to use Scala API right). It appears that
>> any single-argument-function operators in Datasets are affected :(
>>
>> My question was to know whether there are works to fix it (if possible
>> -- I don't know if it is).
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> ----
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Tue, Jul 5, 2016 at 4:21 PM, Sean Owen <so...@cloudera.com> wrote:
>> > Right, should have noticed that in your second mail. But foreach
>> > already does what you want, right? it would be identical here.
>> >
>> > How these two methods do conceptually different things on different
>> > arguments. I don't think I'd expect them to accept the same functions.
>> >
>> > On Tue, Jul 5, 2016 at 3:18 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>> >> ds is Dataset and the problem is that println (or any other
>> >> one-element function) would not work here (and perhaps other methods
>> >> with two variants - Java's and Scala's).
>> >>
>> >> Pozdrawiam,
>> >> Jacek Laskowski
>> >> ----
>> >> https://medium.com/@jaceklaskowski/
>> >> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> >> Follow me at https://twitter.com/jaceklaskowski
>> >>
>> >>
>> >> On Tue, Jul 5, 2016 at 3:53 PM, Sean Owen <so...@cloudera.com> wrote:
>> >>> A DStream is a sequence of RDDs, not of elements. I don't think I'd
>> >>> expect to express an operation on a DStream as if it were elements.
>> >>>
>> >>> On Tue, Jul 5, 2016 at 2:47 PM, Jacek Laskowski <ja...@japila.pl>
>> >>> wrote:
>> >>>> Sort of. Your example works, but could you do a mere
>> >>>> ds.foreachPartition(println)? Why not? What should I even see the
>> >>>> Java
>> >>>> version?
>> >>>>
>> >>>> scala> val ds = spark.range(10)
>> >>>> ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]
>> >>>>
>> >>>> scala> ds.foreachPartition(println)
>> >>>> <console>:26: error: overloaded method value foreachPartition with
>> >>>> alternatives:
>> >>>>   (func:
>> >>>> org.apache.spark.api.java.function.ForeachPartitionFunction[Long])Unit
>> >>>> <and>
>> >>>>   (f: Iterator[Long] => Unit)Unit
>> >>>>  cannot be applied to (Unit)
>> >>>>        ds.foreachPartition(println)
>> >>>>           ^
>> >>>>
>> >>>> Pozdrawiam,
>> >>>> Jacek Laskowski
>> >>>> ----
>> >>>> https://medium.com/@jaceklaskowski/
>> >>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> >>>> Follow me at https://twitter.com/jaceklaskowski
>> >>>>
>> >>>>
>> >>>> On Tue, Jul 5, 2016 at 3:32 PM, Sean Owen <so...@cloudera.com> wrote:
>> >>>>> Do you not mean ds.foreachPartition(_.foreach(println)) or similar?
>> >>>>>
>> >>>>> On Tue, Jul 5, 2016 at 2:22 PM, Jacek Laskowski <ja...@japila.pl>
>> >>>>> wrote:
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> It's with the master built today. Why can't I call
>> >>>>>> ds.foreachPartition(println)? Is using type annotation the only way
>> >>>>>> to
>> >>>>>> go forward? I'd be so sad if that's the case.
>> >>>>>>
>> >>>>>> scala> ds.foreachPartition(println)
>> >>>>>> <console>:28: error: overloaded method value foreachPartition with
>> >>>>>> alternatives:
>> >>>>>>   (func:
>> >>>>>> org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit
>> >>>>>> <and>
>> >>>>>>   (f: Iterator[Record] => Unit)Unit
>> >>>>>>  cannot be applied to (Unit)
>> >>>>>>        ds.foreachPartition(println)
>> >>>>>>           ^
>> >>>>>>
>> >>>>>> scala> sc.version
>> >>>>>> res9: String = 2.0.0-SNAPSHOT
>> >>>>>>
>> >>>>>> Pozdrawiam,
>> >>>>>> Jacek Laskowski
>> >>>>>> ----
>> >>>>>> https://medium.com/@jaceklaskowski/
>> >>>>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> >>>>>> Follow me at https://twitter.com/jaceklaskowski
>> >>>>>>
>> >>>>>>
>> >>>>>> ---------------------------------------------------------------------
>> >>>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>> >>>>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>
>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Why's ds.foreachPartition(println) not possible?

Posted by Reynold Xin <rx...@databricks.com>.
This seems like a Scala compiler bug.

On Tuesday, July 5, 2016, Jacek Laskowski <ja...@japila.pl> wrote:

> Well, there is foreach for Java and another foreach for Scala. That's
> what I can understand. But while supporting two language-specific APIs
> -- Scala and Java -- Dataset API lost support for such simple calls
> without type annotations so you have to be explicit about the variant
> (since I'm using Scala I want to use Scala API right). It appears that
> any single-argument-function operators in Datasets are affected :(
>
> My question was to know whether there are works to fix it (if possible
> -- I don't know if it is).
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Tue, Jul 5, 2016 at 4:21 PM, Sean Owen <sowen@cloudera.com> wrote:
> > Right, should have noticed that in your second mail. But foreach
> > already does what you want, right? it would be identical here.
> >
> > How these two methods do conceptually different things on different
> > arguments. I don't think I'd expect them to accept the same functions.
> >
> > On Tue, Jul 5, 2016 at 3:18 PM, Jacek Laskowski <jacek@japila.pl> wrote:
> >> ds is Dataset and the problem is that println (or any other
> >> one-element function) would not work here (and perhaps other methods
> >> with two variants - Java's and Scala's).
> >>
> >> Pozdrawiam,
> >> Jacek Laskowski
> >> ----
> >> https://medium.com/@jaceklaskowski/
> >> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> >> Follow me at https://twitter.com/jaceklaskowski
> >>
> >>
> >> On Tue, Jul 5, 2016 at 3:53 PM, Sean Owen <sowen@cloudera.com> wrote:
> >>> A DStream is a sequence of RDDs, not of elements. I don't think I'd
> >>> expect to express an operation on a DStream as if it were elements.
> >>>
> >>> On Tue, Jul 5, 2016 at 2:47 PM, Jacek Laskowski <jacek@japila.pl> wrote:
> >>>> Sort of. Your example works, but could you do a mere
> >>>> ds.foreachPartition(println)? Why not? What should I even see the Java
> >>>> version?
> >>>>
> >>>> scala> val ds = spark.range(10)
> >>>> ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]
> >>>>
> >>>> scala> ds.foreachPartition(println)
> >>>> <console>:26: error: overloaded method value foreachPartition with
> alternatives:
> >>>>   (func:
> org.apache.spark.api.java.function.ForeachPartitionFunction[Long])Unit
> >>>> <and>
> >>>>   (f: Iterator[Long] => Unit)Unit
> >>>>  cannot be applied to (Unit)
> >>>>        ds.foreachPartition(println)
> >>>>           ^
> >>>>
> >>>> Pozdrawiam,
> >>>> Jacek Laskowski
> >>>> ----
> >>>> https://medium.com/@jaceklaskowski/
> >>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> >>>> Follow me at https://twitter.com/jaceklaskowski
> >>>>
> >>>>
> >>>> On Tue, Jul 5, 2016 at 3:32 PM, Sean Owen <sowen@cloudera.com> wrote:
> >>>>> Do you not mean ds.foreachPartition(_.foreach(println)) or similar?
> >>>>>
> >>>>> On Tue, Jul 5, 2016 at 2:22 PM, Jacek Laskowski <jacek@japila.pl> wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> It's with the master built today. Why can't I call
> >>>>>> ds.foreachPartition(println)? Is using type annotation the only way
> to
> >>>>>> go forward? I'd be so sad if that's the case.
> >>>>>>
> >>>>>> scala> ds.foreachPartition(println)
> >>>>>> <console>:28: error: overloaded method value foreachPartition with
> alternatives:
> >>>>>>   (func:
> org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit
> >>>>>> <and>
> >>>>>>   (f: Iterator[Record] => Unit)Unit
> >>>>>>  cannot be applied to (Unit)
> >>>>>>        ds.foreachPartition(println)
> >>>>>>           ^
> >>>>>>
> >>>>>> scala> sc.version
> >>>>>> res9: String = 2.0.0-SNAPSHOT
> >>>>>>
> >>>>>> Pozdrawiam,
> >>>>>> Jacek Laskowski
> >>>>>> ----
> >>>>>> https://medium.com/@jaceklaskowski/
> >>>>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> >>>>>> Follow me at https://twitter.com/jaceklaskowski
> >>>>>>
> >>>>>>
> ---------------------------------------------------------------------
> >>>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
> >>>>>>
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>

Re: Why's ds.foreachPartition(println) not possible?

Posted by Jacek Laskowski <ja...@japila.pl>.
Well, there is a foreach for Java and another foreach for Scala. That much
I can understand. But in supporting two language-specific APIs -- Scala and
Java -- the Dataset API lost support for such simple calls without type
annotations, so you have to be explicit about the variant (since I'm using
Scala, I want the Scala API, right?). It appears that all
single-argument-function operators in Datasets are affected :(

My question was whether there is work to fix it (if that's possible at all
-- I don't know if it is).
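
For reference, a sketch of being explicit about the variant (the Java-style
form below is only illustrative, I haven't compiled it against today's
snapshot; toDS assumes spark.implicits._ is in scope, as in spark-shell):

  import org.apache.spark.api.java.function.ForeachPartitionFunction
  import spark.implicits._

  val ds = Seq(1, 2, 3).toDS()

  // Scala variant: a typed lambda selects (f: Iterator[Int] => Unit)
  ds.foreachPartition((it: Iterator[Int]) => it.foreach(println))

  // Java variant: an anonymous ForeachPartitionFunction selects the other overload
  ds.foreachPartition(new ForeachPartitionFunction[Int] {
    override def call(it: java.util.Iterator[Int]): Unit =
      while (it.hasNext) println(it.next())
  })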

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Tue, Jul 5, 2016 at 4:21 PM, Sean Owen <so...@cloudera.com> wrote:
> Right, should have noticed that in your second mail. But foreach
> already does what you want, right? it would be identical here.
>
> How these two methods do conceptually different things on different
> arguments. I don't think I'd expect them to accept the same functions.
>
> On Tue, Jul 5, 2016 at 3:18 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>> ds is Dataset and the problem is that println (or any other
>> one-element function) would not work here (and perhaps other methods
>> with two variants - Java's and Scala's).
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> ----
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Tue, Jul 5, 2016 at 3:53 PM, Sean Owen <so...@cloudera.com> wrote:
>>> A DStream is a sequence of RDDs, not of elements. I don't think I'd
>>> expect to express an operation on a DStream as if it were elements.
>>>
>>> On Tue, Jul 5, 2016 at 2:47 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>>>> Sort of. Your example works, but could you do a mere
>>>> ds.foreachPartition(println)? Why not? What should I even see the Java
>>>> version?
>>>>
>>>> scala> val ds = spark.range(10)
>>>> ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]
>>>>
>>>> scala> ds.foreachPartition(println)
>>>> <console>:26: error: overloaded method value foreachPartition with alternatives:
>>>>   (func: org.apache.spark.api.java.function.ForeachPartitionFunction[Long])Unit
>>>> <and>
>>>>   (f: Iterator[Long] => Unit)Unit
>>>>  cannot be applied to (Unit)
>>>>        ds.foreachPartition(println)
>>>>           ^
>>>>
>>>> Pozdrawiam,
>>>> Jacek Laskowski
>>>> ----
>>>> https://medium.com/@jaceklaskowski/
>>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>>> Follow me at https://twitter.com/jaceklaskowski
>>>>
>>>>
>>>> On Tue, Jul 5, 2016 at 3:32 PM, Sean Owen <so...@cloudera.com> wrote:
>>>>> Do you not mean ds.foreachPartition(_.foreach(println)) or similar?
>>>>>
>>>>> On Tue, Jul 5, 2016 at 2:22 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> It's with the master built today. Why can't I call
>>>>>> ds.foreachPartition(println)? Is using type annotation the only way to
>>>>>> go forward? I'd be so sad if that's the case.
>>>>>>
>>>>>> scala> ds.foreachPartition(println)
>>>>>> <console>:28: error: overloaded method value foreachPartition with alternatives:
>>>>>>   (func: org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit
>>>>>> <and>
>>>>>>   (f: Iterator[Record] => Unit)Unit
>>>>>>  cannot be applied to (Unit)
>>>>>>        ds.foreachPartition(println)
>>>>>>           ^
>>>>>>
>>>>>> scala> sc.version
>>>>>> res9: String = 2.0.0-SNAPSHOT
>>>>>>
>>>>>> Pozdrawiam,
>>>>>> Jacek Laskowski
>>>>>> ----
>>>>>> https://medium.com/@jaceklaskowski/
>>>>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>>>>> Follow me at https://twitter.com/jaceklaskowski
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>>>>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Why's ds.foreachPartition(println) not possible?

Posted by Sean Owen <so...@cloudera.com>.
Right, should have noticed that in your second mail. But foreach
already does what you want, right? It would be identical here.

Still, these two methods do conceptually different things on different
arguments. I don't think I'd expect them to accept the same functions.
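
A short sketch of the distinction (assuming a local spark-shell session, so
the printlns actually show up on the console):

  val ds = spark.range(3)

  // foreach: the function gets one element at a time
  ds.foreach(x => println(x))

  // foreachPartition: the function gets an Iterator over a whole partition,
  // so printing the elements goes through the inner foreach
  ds.foreachPartition(_.foreach(println))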

On Tue, Jul 5, 2016 at 3:18 PM, Jacek Laskowski <ja...@japila.pl> wrote:
> ds is Dataset and the problem is that println (or any other
> one-element function) would not work here (and perhaps other methods
> with two variants - Java's and Scala's).
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Tue, Jul 5, 2016 at 3:53 PM, Sean Owen <so...@cloudera.com> wrote:
>> A DStream is a sequence of RDDs, not of elements. I don't think I'd
>> expect to express an operation on a DStream as if it were elements.
>>
>> On Tue, Jul 5, 2016 at 2:47 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>>> Sort of. Your example works, but could you do a mere
>>> ds.foreachPartition(println)? Why not? What should I even see the Java
>>> version?
>>>
>>> scala> val ds = spark.range(10)
>>> ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]
>>>
>>> scala> ds.foreachPartition(println)
>>> <console>:26: error: overloaded method value foreachPartition with alternatives:
>>>   (func: org.apache.spark.api.java.function.ForeachPartitionFunction[Long])Unit
>>> <and>
>>>   (f: Iterator[Long] => Unit)Unit
>>>  cannot be applied to (Unit)
>>>        ds.foreachPartition(println)
>>>           ^
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> ----
>>> https://medium.com/@jaceklaskowski/
>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> Follow me at https://twitter.com/jaceklaskowski
>>>
>>>
>>> On Tue, Jul 5, 2016 at 3:32 PM, Sean Owen <so...@cloudera.com> wrote:
>>>> Do you not mean ds.foreachPartition(_.foreach(println)) or similar?
>>>>
>>>> On Tue, Jul 5, 2016 at 2:22 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>>>>> Hi,
>>>>>
>>>>> It's with the master built today. Why can't I call
>>>>> ds.foreachPartition(println)? Is using type annotation the only way to
>>>>> go forward? I'd be so sad if that's the case.
>>>>>
>>>>> scala> ds.foreachPartition(println)
>>>>> <console>:28: error: overloaded method value foreachPartition with alternatives:
>>>>>   (func: org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit
>>>>> <and>
>>>>>   (f: Iterator[Record] => Unit)Unit
>>>>>  cannot be applied to (Unit)
>>>>>        ds.foreachPartition(println)
>>>>>           ^
>>>>>
>>>>> scala> sc.version
>>>>> res9: String = 2.0.0-SNAPSHOT
>>>>>
>>>>> Pozdrawiam,
>>>>> Jacek Laskowski
>>>>> ----
>>>>> https://medium.com/@jaceklaskowski/
>>>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>>>> Follow me at https://twitter.com/jaceklaskowski
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>>>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Why's ds.foreachPartition(println) not possible?

Posted by Jacek Laskowski <ja...@japila.pl>.
ds is a Dataset, and the problem is that println (or any other
one-argument function) would not work here (and perhaps the same holds for
other methods with two variants - Java's and Scala's).

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Tue, Jul 5, 2016 at 3:53 PM, Sean Owen <so...@cloudera.com> wrote:
> A DStream is a sequence of RDDs, not of elements. I don't think I'd
> expect to express an operation on a DStream as if it were elements.
>
> On Tue, Jul 5, 2016 at 2:47 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>> Sort of. Your example works, but could you do a mere
>> ds.foreachPartition(println)? Why not? What should I even see the Java
>> version?
>>
>> scala> val ds = spark.range(10)
>> ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]
>>
>> scala> ds.foreachPartition(println)
>> <console>:26: error: overloaded method value foreachPartition with alternatives:
>>   (func: org.apache.spark.api.java.function.ForeachPartitionFunction[Long])Unit
>> <and>
>>   (f: Iterator[Long] => Unit)Unit
>>  cannot be applied to (Unit)
>>        ds.foreachPartition(println)
>>           ^
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> ----
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Tue, Jul 5, 2016 at 3:32 PM, Sean Owen <so...@cloudera.com> wrote:
>>> Do you not mean ds.foreachPartition(_.foreach(println)) or similar?
>>>
>>> On Tue, Jul 5, 2016 at 2:22 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>>>> Hi,
>>>>
>>>> It's with the master built today. Why can't I call
>>>> ds.foreachPartition(println)? Is using type annotation the only way to
>>>> go forward? I'd be so sad if that's the case.
>>>>
>>>> scala> ds.foreachPartition(println)
>>>> <console>:28: error: overloaded method value foreachPartition with alternatives:
>>>>   (func: org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit
>>>> <and>
>>>>   (f: Iterator[Record] => Unit)Unit
>>>>  cannot be applied to (Unit)
>>>>        ds.foreachPartition(println)
>>>>           ^
>>>>
>>>> scala> sc.version
>>>> res9: String = 2.0.0-SNAPSHOT
>>>>
>>>> Pozdrawiam,
>>>> Jacek Laskowski
>>>> ----
>>>> https://medium.com/@jaceklaskowski/
>>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>>> Follow me at https://twitter.com/jaceklaskowski
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Why's ds.foreachPartition(println) not possible?

Posted by Sean Owen <so...@cloudera.com>.
A DStream is a sequence of RDDs, not of elements. I don't think I'd
expect to express an operation on a DStream as if it were elements.
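
For completeness, a sketch of the DStream side (purely illustrative, with a
hypothetical socket source and batch interval):

  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val ssc = new StreamingContext(sc, Seconds(1))
  val lines = ssc.socketTextStream("localhost", 9999)

  // a DStream is a sequence of RDDs, so per-element work goes through the
  // RDD handed to foreachRDD
  lines.foreachRDD(rdd => rdd.foreach(println))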

On Tue, Jul 5, 2016 at 2:47 PM, Jacek Laskowski <ja...@japila.pl> wrote:
> Sort of. Your example works, but could you do a mere
> ds.foreachPartition(println)? Why not? What should I even see the Java
> version?
>
> scala> val ds = spark.range(10)
> ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]
>
> scala> ds.foreachPartition(println)
> <console>:26: error: overloaded method value foreachPartition with alternatives:
>   (func: org.apache.spark.api.java.function.ForeachPartitionFunction[Long])Unit
> <and>
>   (f: Iterator[Long] => Unit)Unit
>  cannot be applied to (Unit)
>        ds.foreachPartition(println)
>           ^
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Tue, Jul 5, 2016 at 3:32 PM, Sean Owen <so...@cloudera.com> wrote:
>> Do you not mean ds.foreachPartition(_.foreach(println)) or similar?
>>
>> On Tue, Jul 5, 2016 at 2:22 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>>> Hi,
>>>
>>> It's with the master built today. Why can't I call
>>> ds.foreachPartition(println)? Is using type annotation the only way to
>>> go forward? I'd be so sad if that's the case.
>>>
>>> scala> ds.foreachPartition(println)
>>> <console>:28: error: overloaded method value foreachPartition with alternatives:
>>>   (func: org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit
>>> <and>
>>>   (f: Iterator[Record] => Unit)Unit
>>>  cannot be applied to (Unit)
>>>        ds.foreachPartition(println)
>>>           ^
>>>
>>> scala> sc.version
>>> res9: String = 2.0.0-SNAPSHOT
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> ----
>>> https://medium.com/@jaceklaskowski/
>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> Follow me at https://twitter.com/jaceklaskowski
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Why's ds.foreachPartition(println) not possible?

Posted by Jacek Laskowski <ja...@japila.pl>.
Sort of. Your example works, but could you do a mere
ds.foreachPartition(println)? Why not? And why should I even see the Java
version?

scala> val ds = spark.range(10)
ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]

scala> ds.foreachPartition(println)
<console>:26: error: overloaded method value foreachPartition with alternatives:
  (func: org.apache.spark.api.java.function.ForeachPartitionFunction[Long])Unit
<and>
  (f: Iterator[Long] => Unit)Unit
 cannot be applied to (Unit)
       ds.foreachPartition(println)
          ^

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Tue, Jul 5, 2016 at 3:32 PM, Sean Owen <so...@cloudera.com> wrote:
> Do you not mean ds.foreachPartition(_.foreach(println)) or similar?
>
> On Tue, Jul 5, 2016 at 2:22 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>> Hi,
>>
>> It's with the master built today. Why can't I call
>> ds.foreachPartition(println)? Is using type annotation the only way to
>> go forward? I'd be so sad if that's the case.
>>
>> scala> ds.foreachPartition(println)
>> <console>:28: error: overloaded method value foreachPartition with alternatives:
>>   (func: org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit
>> <and>
>>   (f: Iterator[Record] => Unit)Unit
>>  cannot be applied to (Unit)
>>        ds.foreachPartition(println)
>>           ^
>>
>> scala> sc.version
>> res9: String = 2.0.0-SNAPSHOT
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> ----
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Why's ds.foreachPartition(println) not possible?

Posted by Sean Owen <so...@cloudera.com>.
Do you not mean ds.foreachPartition(_.foreach(println)) or similar?

On Tue, Jul 5, 2016 at 2:22 PM, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi,
>
> It's with the master built today. Why can't I call
> ds.foreachPartition(println)? Is using type annotation the only way to
> go forward? I'd be so sad if that's the case.
>
> scala> ds.foreachPartition(println)
> <console>:28: error: overloaded method value foreachPartition with alternatives:
>   (func: org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit
> <and>
>   (f: Iterator[Record] => Unit)Unit
>  cannot be applied to (Unit)
>        ds.foreachPartition(println)
>           ^
>
> scala> sc.version
> res9: String = 2.0.0-SNAPSHOT
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org