Posted to user@flink.apache.org by Maximilian Alber <al...@gmail.com> on 2014/11/27 16:14:05 UTC

Wrong and non consistent behavior of max

Hi Flinksters,

I don't know if I did something wrong, but the code seems fine. Basically, the
max function extracts a wrong element.

The error only happens with my real data, not if I inject some test sequence
into costs.

The problem is that the corresponding value in the other tuple position is
wrong. The maximum of the second field is detected correctly.

The code snippet:

val maxCost = costs map {x => (x.id, x.value)} max(1)

(costs map {x => (x.id, x.value)} map {_ toString} map {"first: " + _}) union (maxCost map {_ toString} map {"second: " + _}) writeAsText config.outFile

The output:

File content:
first: (47,42.066986)
first: (11,4.448255)
first: (40,42.06696)
first: (3,0.96731037)
first: (31,42.06443)
first: (18,23.753584)
first: (45,42.066986)
first: (24,41.44347)
first: (13,6.1290965)
first: (19,26.42948)
first: (1,0.9665109)
first: (28,42.04222)
first: (5,1.2986814)
first: (44,42.066986)
first: (7,1.8681992)
first: (10,3.0981758)
first: (41,42.066982)
first: (48,42.066986)
first: (21,33.698544)
first: (38,42.066963)
first: (30,42.06153)
first: (26,41.950237)
first: (43,42.066986)
first: (16,14.754578)
first: (15,10.571205)
first: (34,42.06672)
first: (29,42.055424)
first: (35,42.066845)
first: (8,1.9513339)
first: (22,38.189228)
first: (46,42.066986)
first: (2,0.966511)
first: (27,42.013676)
first: (12,5.4271784)
first: (42,42.066986)
first: (4,1.01561)
first: (14,7.4410205)
first: (25,41.803535)
first: (6,1.6827519)
first: (36,42.06694)
first: (20,28.834095)
first: (32,42.06577)
first: (49,42.066986)
first: (33,42.0664)
first: (9,2.2420964)
first: (37,42.066967)
first: (0,0.9665109)
first: (17,19.016153)
first: (39,42.06697)
first: (23,40.512672)
second: (23,42.066986)

File content end.


Thanks!
Cheers,
Max

Re: Wrong and non consistent behavior of max

Posted by Till Rohrmann <tr...@apache.org>.
I don't know whether this is important because we are talking about the
Java API, but for people familiar with Scala or F#, for example, the
methods minBy/maxBy make perfect sense. These methods are part of the
standard collection API of the respective language and select the element
with the minimum/maximum key. Of course, Flink's minBy/maxBy return a
DataSet containing the element instead of the element itself.
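The collection-level semantics Till describes can be seen with plain Scala collections; a minimal sketch with hypothetical (id, value) pairs, no Flink involved:

```scala
object MaxByDemo extends App {
  // Hypothetical (id, value) pairs, standing in for Max's cost tuples.
  val costs = Seq((0, 0.96f), (23, 40.51f), (47, 42.07f))

  // Scala's maxBy selects the *whole element* with the largest key,
  // so the id and its value stay together: here (47, 42.07f).
  val byValue = costs.maxBy(_._2)

  // By contrast, Flink's max(1) only guarantees the maximum in field 1;
  // the other fields of the result tuple are undefined.
  println(byValue)
}
```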


Re: Wrong and non consistent behavior of max

Posted by Ufuk Celebi <uc...@apache.org>.
@Stephan: Yap. :)

I agree with Viktor's suggestions. In order not to break existing programs,
I would keep the deprecation annotations as you have in your branch. We
could then remove them in 0.9.0 or so. For minBy/maxBy, I would rename them
as you suggested, but keep the old methods and deprecate them as well.

Maybe someone can comment on the Javadocs changes here:
https://github.com/apache/incubator-flink/pull/244

I hope that it makes the API clearer for now.


Re: Wrong and non consistent behavior of max

Posted by Viktor Rosenfeld <vi...@tu-berlin.de>.
Hi,

I would drop min/max/sum. I've marked them deprecated in my pull request
because I did not change their current implementation, which is based on the
old aggregate.

I think it makes more sense to have convenience functions like countByKey,
which Sebastian suggested, that encapsulate more complex behavior than an
aggregation on a single field.

I would also rename minBy/maxBy to selectMin/selectMax or something similar
(selectMinBy? selectMinByField?) to make their behavior more explicit.

Best,
Viktor



--
View this message in context: http://apache-flink-incubator-user-mailing-list-archive.2336050.n4.nabble.com/Wrong-and-non-consistent-behavior-of-max-tp484p499.html
Sent from the Apache Flink (Incubator) User Mailing List archive at Nabble.com.

Re: Wrong and non consistent behavior of max

Posted by Stephan Ewen <se...@apache.org>.
Viktor's work does not replace minBy or maxBy, right? (They are not
aggregation functions, but selectors.) I think we would retain those anyway.
They also make little sense in composite aggregations, because you can only
ever use one of them...

So the question is whether to keep min() and max(), or to make that
functionality accessible only through Viktor's aggregations.



Re: Wrong and non consistent behavior of max

Posted by Ufuk Celebi <uc...@apache.org>.
No, I didn't say that I want to overload min/max. I think Viktor's changes
are exactly in line with what I said. min/max(X) or minBy/maxBy(X) could be
(maybe deprecated) shortcuts for aggregate(max/min(X)).

>

Re: Wrong and non consistent behavior of max

Posted by Fabian Hueske <fh...@apache.org>.
I am not sure about this. With the new aggregations that Viktor is working
on, things become pretty obvious IMO.

data.aggregate(count(), sum(2), min(0));
basically shows the structure of the result.

I would not go and overload min() and max() in different contexts.
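To illustrate the intended result structure (this is not Viktor's actual API, which is still work in progress), the composite aggregation above can be sketched over hypothetical three-field tuples with plain Scala collections:

```scala
object CompositeAggDemo extends App {
  // Hypothetical input tuples; field indices follow Flink's 0-based convention,
  // so field 0 is the Int and field 2 is the Double.
  val data = Seq((3, "a", 10.0), (1, "b", 2.5), (2, "c", 4.0))

  // aggregate(count(), sum(2), min(0)) would yield one tuple that mirrors
  // the list of aggregations: (count, sum of field 2, min of field 0).
  val result = (data.size, data.map(_._3).sum, data.map(_._1).min)

  println(result) // (3, 16.5, 1)
}
```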


Re: Wrong and non consistent behavior of max

Posted by Ufuk Celebi <uc...@apache.org>.
This is not the first time that people have confused this. I think most
people expect the maxBy and minBy behaviour for max/min.

Maybe it makes sense to move back to the old aggregations API, where you
call the aggregate method and specify the type of aggregation to perform as
an argument. I didn't really like this, but if the current state is
confusing people, we should consider changing it again.


Re: Wrong and non consistent behavior of max

Posted by Maximilian Alber <al...@gmail.com>.
Hi Fabian!

Ok, thanks! Now it works.

Cheers,
Max


Re: Wrong and non consistent behavior of max

Posted by Fabian Hueske <fh...@apache.org>.
Hi Max,

the max(i) function does not select the Tuple with the maximum value.
Instead, it builds a new Tuple containing the maximum value of the i-th
attribute. The values of the Tuple's other fields are not defined (in
practice they are set to the values of the last Tuple, but the order of
Tuples is not defined).

The Java API features minBy and maxBy transformations that should do what
you are looking for.
You can reimplement them for Scala as a simple GroupReduce (or Reduce)
function, or use the Java function in your Scala code.

Best, Fabian
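A minimal sketch of the Reduce approach Fabian suggests, shown here on a plain Scala collection with hypothetical data (the same comparison function would go into a reduce on the (id, value) DataSet):

```scala
object MaxByReduceDemo extends App {
  // Hypothetical (id, value) pairs, standing in for Max's cost tuples.
  val pairs = Seq((0, 0.9665109f), (23, 40.512672f), (47, 42.066986f))

  // A reduce that keeps the whole tuple with the larger second field;
  // unlike max(1), the id travels together with its value.
  val maxByValue = pairs.reduce((a, b) => if (a._2 >= b._2) a else b)

  println(maxByValue)
}
```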


>