Posted to dev@spark.apache.org by Burak Yavuz <br...@gmail.com> on 2015/08/02 06:11:15 UTC

Re: FrequentItems in spark-sql-execution-stat

Hi Yucheng,

Thanks for pointing out the issue. You are correct: in the case that the
final map is completely empty after the merge, we do need to add the new
element to the map with the correct count (decrementing its count by the
max count that was already in the map). I'll submit a fix for it.
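For illustration, here is a minimal sketch of the kind of fix discussed above. The class name `FreqItemCounterSketch` and the decrement-by-minimum rule (the one from the standard Misra-Gries summary) are my assumptions for this sketch, not Spark's exact code: when the map is full, every retained count and the incoming count are reduced together, so a heavy new item survives rather than leaving the map empty.

```scala
import scala.collection.mutable

// A minimal, standalone sketch (not Spark's actual implementation) of a
// Misra-Gries-style frequent-items counter capped at `size` entries.
// When the map is full and a new key arrives, every count -- including
// the incoming one -- is reduced by the smallest count involved, so a
// dominant new item is retained instead of being dropped.
class FreqItemCounterSketch(size: Int) {
  val baseMap: mutable.Map[Any, Long] = mutable.Map.empty

  def add(key: Any, count: Long): this.type = {
    if (baseMap.contains(key)) {
      baseMap(key) += count
    } else if (baseMap.size < size) {
      baseMap += key -> count
    } else {
      // Decrement every retained count and the incoming count by the
      // smallest value in play, dropping anything that reaches zero.
      val dec = math.min(count, baseMap.values.min)
      val survivors = baseMap.collect { case (k, v) if v - dec > 0 => k -> (v - dec) }
      baseMap.clear()
      baseMap ++= survivors
      if (count - dec > 0) baseMap += key -> (count - dec)
    }
    this
  }
}
```

With the numbers from Yucheng's example quoted below (a full map Map(1 -> 3, 2 -> 3, 3 -> 4) with capacity 3, adding 4 -> 25), the decrement is 3 and the map ends as Map(3 -> 1, 4 -> 22), so the heavy new item survives. Note this follows the paper's decrement rule; the exact correction in Spark may differ in detail.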

Best,
Burak

On Fri, Jul 31, 2015 at 11:44 AM, Koert Kuipers <ko...@tresata.com> wrote:

> this looks like a mistake in FrequentItems to me. if the map is full
> (map.size == size) then it should still add the new item (after removing
> items from the map and decrementing counts).
>
> if it's not a mistake, then at least it looks to me like the algorithm is
> different from the one described in the paper. is this maybe on purpose?
>
> On Thu, Jul 30, 2015 at 4:26 PM, Yucheng <yl...@nyu.edu> wrote:
>
>> Hi all,
>>
>> I'm reading the code in spark-sql-execution-stat-FrequentItems.scala, and
>> I'm a little confused about the "add" method in the FreqItemCounter class.
>> Please see the link here:
>>
>> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/stat/FrequentItems.scala
>>
>> My question is: when the baseMap does not contain the key and the size of
>> the baseMap is not less than size, why do we keep only the key/value
>> pairs whose value is greater than count?
>>
>> For example: suppose the baseMap is Map(1 -> 3, 2 -> 3, 3 -> 4) and the
>> size is 3. If I add Map(4 -> 25) into this baseMap, it will retain only
>> the key/value pairs whose value is greater than 25, so the baseMap will
>> end up empty. However, I think we should at least add 4 -> 25 into the
>> baseMap. Could anybody help me with this problem?
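The scenario above can be reproduced in isolation. Below is a minimal standalone sketch — my own reconstruction of the branch being questioned, not Spark's actual code — of what happens when the map is full, the key is absent, and only entries with counts greater than the incoming one are retained:

```scala
import scala.collection.mutable

// Standalone reproduction (not Spark's code) of the questioned branch:
// the map is full, the incoming key is absent, and only entries whose
// count exceeds the incoming count are kept -- so a dominant new item
// empties the map and is itself never inserted.
val size = 3
val baseMap = mutable.Map[Any, Long](1 -> 3L, 2 -> 3L, 3 -> 4L)
val (key, count) = (4, 25L)

if (!baseMap.contains(key) && baseMap.size == size) {
  val survivors = baseMap.filter { case (_, v) => v > count }
  baseMap.clear()
  baseMap ++= survivors // nothing survives; 4 -> 25 is silently dropped
}
```

After this runs, baseMap is empty and 4 -> 25 was never added, which is exactly the behavior being asked about.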
>>
>> Best,
>> Yucheng
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-developers-list.1001551.n3.nabble.com/FrequentItems-in-spark-sql-execution-stat-tp13527.html
>> Sent from the Apache Spark Developers List mailing list archive at
>> Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> For additional commands, e-mail: dev-help@spark.apache.org
>>
>>
>