Posted to user@spark.apache.org by Bertrand Dechoux <de...@gmail.com> on 2014/02/27 13:36:53 UTC

Rename filter() into keep(), remove() or take() ?

Hi,

It might seem like a trivial issue, but even though filter() is a fairly
standard name, it does not make explicit which way it works. Sure, it
makes sense to provide a filter function, but what happens when it returns
true? Is the current element removed or kept? It is not really obvious.

Has another name already been discussed? It could be keep() or remove().
Alternatively, take() could be reused: instead of a number, it would
accept the filter function.

Regards

Bertrand

Re: Rename filter() into keep(), remove() or take() ?

Posted by Bertrand Dechoux <de...@gmail.com>.
Clojure made the same kind of choice too: 'filter()' and 'remove()'. So
the behavior of filter is obvious once you know about the other one...
Well, the function name makes sense if you are thinking in a 'logic
paradigm'.

Anyway, that is something I had to write about. I understand that the ROI
is most likely not worth it.

Thanks for the feedback

Bertrand



Re: Rename filter() into keep(), remove() or take() ?

Posted by Nick Pentreath <ni...@gmail.com>.
Agree that filter is perhaps unintuitive, though the Scala collections API has "filter" and "filterNot", which together provide context that makes it more intuitive.
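
To make the pairing concrete, here is a minimal sketch on a plain Scala Seq (not an RDD). The object and value names are only illustrative. It shows that filter keeps the elements for which the predicate is true, while filterNot keeps the ones for which it is false.

```scala
object FilterVsFilterNot {
  val numbers: Seq[Int] = Seq(1, 2, 3, 4, 5, 6)
  val isEven: Int => Boolean = _ % 2 == 0

  // filter KEEPS the elements for which the predicate returns true.
  val kept: Seq[Int] = numbers.filter(isEven)

  // filterNot KEEPS the elements for which the predicate returns false.
  val dropped: Seq[Int] = numbers.filterNot(isEven)

  // partition returns both groups at once: (matching, nonMatching).
  val (evens, odds) = numbers.partition(isEven)

  def main(args: Array[String]): Unit = {
    println(kept)    // List(2, 4, 6)
    println(dropped) // List(1, 3, 5)
  }
}
```

Seen side by side, the two names leave little room for doubt about which direction filter works in.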


And yes, the change could be made via added methods that don't break the existing API.
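
As a rough sketch of what "added methods" could look like, the names proposed in this thread can be layered on top of an existing type with an implicit class, leaving filter itself untouched. This is shown on Seq so it runs without Spark. The names keep, remove and KeepRemoveOps are hypothetical; neither the Scala library nor Spark provides them.

```scala
object FilterAliases {
  // Hypothetical aliases from this thread; not part of Scala or Spark.
  implicit class KeepRemoveOps[A](private val xs: Seq[A]) extends AnyVal {
    // keep: retain the elements for which p is true (same as filter).
    def keep(p: A => Boolean): Seq[A] = xs.filter(p)

    // remove: drop the elements for which p is true (same as filterNot).
    def remove(p: A => Boolean): Seq[A] = xs.filterNot(p)
  }

  def main(args: Array[String]): Unit = {
    val words = Seq("spark", "", "rdd", "")
    println(words.keep(_.nonEmpty))   // List(spark, rdd)
    println(words.remove(_.isEmpty))  // List(spark, rdd)
    println(words.filter(_.nonEmpty)) // the existing method still works
  }
}
```

The same pattern could presumably wrap an RDD, with keep delegating to the RDD's filter and remove delegating to filter with a negated predicate, but that is an assumption and has not been tried here.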


Still, overall I would be -1 on this unless a significant proportion of users would find that it adds value.

Actually, adding "filterNot", while not that necessary, would make more sense in my view.

Re: Rename filter() into keep(), remove() or take() ?

Posted by Bertrand Dechoux <de...@gmail.com>.
I understand the explanation, but I had to try. That said, the change could
be made without breaking anything, but that's another story.

Regards

Bertrand


Re: Rename filter() into keep(), remove() or take() ?

Posted by Nick Pentreath <ni...@gmail.com>.
filter comes from the Scala collections method "filter". I'd say it's best
to stay in line with the Scala collections API, as Spark has generally done
with RDDs (map, flatMap, take, etc.), so that it is easier and more natural
for developers to apply the same thinking from Scala (parallel) collections
to Spark RDDs.

Plus, such an API change would be a major breaking one and, IMO, not a good
idea at this stage.

    def filter(p: (A) => Boolean): Seq[A]

  Boolean: http://www.scala-lang.org/api/2.10.3/scala/Boolean.html
  Seq:     http://www.scala-lang.org/api/2.10.3/scala/collection/Seq.html

Selects all elements of this sequence which satisfy a predicate.

  p        the predicate used to test elements.
  returns  a new sequence consisting of all elements of this sequence that
           satisfy the given predicate p. The order of the elements is
           preserved.
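
To spell out what that documentation implies, here is a hand-written equivalent for illustration only. The name filterSketch is made up, and the real library implementation is different. An element is kept exactly when the predicate returns true, the input order is preserved, and the original sequence is left untouched.

```scala
object FilterSemantics {
  // Illustrative stand-in for Seq.filter; not the library's implementation.
  def filterSketch[A](xs: Seq[A])(p: A => Boolean): Seq[A] = {
    val out = Seq.newBuilder[A]
    // Walk the input in order and keep x only when p(x) is true.
    xs.foreach(x => if (p(x)) out += x)
    out.result()
  }

  def main(args: Array[String]): Unit = {
    val temps = Seq(31, 12, 25, 8, 40)
    val hot = filterSketch(temps)(_ > 20)
    println(hot)                         // List(31, 25, 40)
    println(hot == temps.filter(_ > 20)) // true
    println(temps)                       // List(31, 12, 25, 8, 40)
  }
}
```

RDD.filter in Spark follows the same convention: the elements that satisfy the predicate are the ones that end up in the resulting RDD.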

