Posted to dev@spark.apache.org by Jean-Georges Perrin <jg...@jgp.net> on 2019/12/28 17:38:13 UTC
Issue with map Java lambda function with 3.0.0 preview and preview 2
Hey guys,
This code:
Dataset<Row> incrementalDf = spark
    .createDataset(l, Encoders.INT())
    .toDF();
Dataset<Integer> dotsDs = incrementalDf
    .map(status -> {
      double x = Math.random() * 2 - 1;
      double y = Math.random() * 2 - 1;
      counter++;
      if (counter % 100000 == 0) {
        System.out.println("" + counter + " darts thrown so far");
      }
      return (x * x + y * y <= 1) ? 1 : 0;
    }, Encoders.INT());
used to work with Spark 2.x; with the two 3.0.0 previews, it says:
The method map(Function1<Row,Integer>, Encoder<Integer>) is ambiguous for the type Dataset<Row>
If I define my mapping function as a class, it works fine. Here is the class:
private final class DartMapper
    implements MapFunction<Row, Integer> {
  private static final long serialVersionUID = 38446L;

  @Override
  public Integer call(Row r) throws Exception {
    double x = Math.random() * 2 - 1;
    double y = Math.random() * 2 - 1;
    counter++;
    if (counter % 1000 == 0) {
      System.out.println("" + counter + " operations done so far");
    }
    return (x * x + y * y <= 1) ? 1 : 0;
  }
}
Any hint on what/if I did wrong?
jg
Re: Issue with map Java lambda function with 3.0.0 preview and preview 2
Posted by Jean-Georges Perrin <jg...@jgp.net>.
I forgot… it does the same thing with the reducer…
int dartsInCircle = dotsDs.reduce((x, y) -> x + y);
jg
Re: Issue with map Java lambda function with 3.0.0 preview and preview 2
Posted by Jean-Georges Perrin <jg...@jgp.net>.
Thanks Sean - yup, I was having issues with Scala 2.12 for some stuff, so I kept 2.11...
Casting works. It makes the code a little ugly, but… it's definitely a Scala 2.12 vs. 2.11 issue, not a Spark 3 one specifically.
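For anyone finding this thread later, the cast Sean suggests, applied to the snippet from the original post, would look roughly like this. It is a sketch only: it assumes Spark 3.x on the classpath and the same `incrementalDf` and `dotsDs` as above, and the progress counter is omitted for brevity. `MapFunction` and `ReduceFunction` come from `org.apache.spark.api.java.function`:

```java
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.api.java.function.ReduceFunction;

// Casting the lambda to MapFunction selects the Java-specific
// map(MapFunction, Encoder) overload instead of the Scala one:
Dataset<Integer> dotsDs = incrementalDf
    .map((MapFunction<Row, Integer>) status -> {
      double x = Math.random() * 2 - 1;
      double y = Math.random() * 2 - 1;
      return (x * x + y * y <= 1) ? 1 : 0;
    }, Encoders.INT());

// Same trick for the reducer from the follow-up message:
int dartsInCircle = dotsDs.reduce((ReduceFunction<Integer>) (x, y) -> x + y);
```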
jg
> On Dec 28, 2019, at 1:15 PM, Sean Owen <sr...@gmail.com> wrote:
---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
Re: Issue with map Java lambda function with 3.0.0 preview and preview 2
Posted by Sean Owen <sr...@gmail.com>.
Yes, it's necessary to cast the lambda in Java as (MapFunction<X,Y>)
in many cases. This is because the Scala-specific and Java-specific
versions of .map() both end up accepting a function object that the
lambda can match, and an Encoder. What I'd have to go back and look up
is why that would be different in Spark 3; some of that has always
been the case with Java 8 in Spark 2. I think it might be related to
Scala 2.12; were you using Spark 2 with Scala 2.11 before?
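The ambiguity Sean describes can be reproduced without Spark at all. The sketch below uses hypothetical stand-in interfaces (not Spark's real classes) to show two overloads that both accept the same lambda shape: the bare lambda fails to compile as ambiguous, and casting it to one functional interface resolves the overload.

```java
public class AmbiguityDemo {
    // Hypothetical stand-ins for Spark's two map() overloads: one takes a
    // Scala-style function, the other a Java-style MapFunction.
    interface ScalaFunction1<T, R> { R apply(T t); }
    interface JavaMapFunction<T, R> { R call(T t); }

    static <R> R map(ScalaFunction1<Integer, R> f) { return f.apply(21); }
    static <R> R map(JavaMapFunction<Integer, R> f) { return f.call(21); }

    public static void main(String[] args) {
        // map(x -> x * 2);  // does not compile: "reference to map is ambiguous"
        // Casting the lambda to one functional interface picks that overload:
        int doubled = map((JavaMapFunction<Integer, Integer>) x -> x * 2);
        System.out.println(doubled); // prints 42
    }
}
```

This is exactly the situation javac sees with `Dataset.map`: once both the Scala `Function1` overload and the Java `MapFunction` overload are lambda-compatible (as they are under Scala 2.12's SAM traits), neither is more specific, so the compiler gives up.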