Posted to dev@commons.apache.org by Rob Tompkins <ch...@gmail.com> on 2019/03/09 22:01:47 UTC

Re: [apache/commons-text] TEXT-155: Add a generic IntersectionSimilarity measure (#109)

We should be a tad careful with our naming conventions here. In the combinatorics on words space, an “overlap” is a specific repeated pattern, namely cXcXc, where c is a letter from an alphabet and X is a string (allowed to be empty).
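
As a rough illustration of the cXcXc pattern described above, here is a minimal Java sketch that checks whether a word contains such an overlap as a contiguous factor. The class and method names are hypothetical and are not part of commons-text; the check compares UTF-16 code units, which is an assumption made for simplicity.

```java
/** Illustrative only: hypothetical helper, not part of commons-text. */
public final class OverlapPatternSketch {

    private OverlapPatternSketch() {
    }

    /**
     * Returns true if the word contains a factor of the form cXcXc,
     * where c is a single letter and X is a (possibly empty) string.
     * Letters are compared as UTF-16 code units (an assumption).
     */
    public static boolean containsOverlap(String word) {
        int n = word.length();
        // p is the length of cX; the factor cXcXc then has length 2p + 1.
        for (int p = 1; 2 * p + 1 <= n; p++) {
            for (int start = 0; start + 2 * p + 1 <= n; start++) {
                boolean match = true;
                // cXcXc is exactly a factor in which each of the first
                // p + 1 letters equals the letter p positions later.
                for (int i = start; i <= start + p; i++) {
                    if (word.charAt(i) != word.charAt(i + p)) {
                        match = false;
                        break;
                    }
                }
                if (match) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

For example, "banana" contains "anana" (c = a, X = n), so the check returns true, while "abcabc" is a square but contains no overlap, so it returns false.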

> On Mar 9, 2019, at 4:19 PM, Alex Herbert <no...@github.com> wrote:
> 
> @aherbert <https://github.com/aherbert> pushed 1 commit.
> 
> 9a7d018 <https://github.com/apache/commons-text/commit/9a7d018c3e85031749166195ebab66c07b7d94c6> TEXT-155: Renamed to OverlapSimilarity.


Re: [apache/commons-text] TEXT-155: Add a generic IntersectionSimilarity measure (#109)

Posted by Alex Herbert <al...@gmail.com>.
> On 9 Mar 2019, at 22:02, Rob Tompkins <ch...@gmail.com> wrote:
> 
> Also this breaks binary compatibility. Are we going for a 2.X with [text]?

This is a new class in a fork repo so there should be no compatibility problems. It is part of an active PR so notifications keep occurring each time the code is updated following review.

The idea is to move common functionality shared by some of the similarity measures using a set into a class that computes the intersection and union of two sets. It was originally named IntersectionSimilarity.
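
To make the idea concrete, a minimal sketch of the shared computation might look like the following. This is not the code from the PR; the class and method names are hypothetical, and it only assumes that callers need the sizes of the intersection and the union.

```java
import java.util.Set;

/** Illustrative only: hypothetical helper, not the class from the PR. */
public final class SetOverlapSketch {

    private SetOverlapSketch() {
    }

    /** Counts the elements that occur in both sets. */
    public static <T> int intersectionSize(Set<T> a, Set<T> b) {
        // Iterate over the smaller set and probe the larger one.
        Set<T> smaller = a.size() <= b.size() ? a : b;
        Set<T> larger = smaller == a ? b : a;
        int count = 0;
        for (T item : smaller) {
            if (larger.contains(item)) {
                count++;
            }
        }
        return count;
    }

    /** Counts the distinct elements across both sets: |A| + |B| - |A and B|. */
    public static <T> int unionSize(Set<T> a, Set<T> b) {
        return a.size() + b.size() - intersectionSize(a, b);
    }
}
```

A set-based similarity measure could then be expressed in terms of these two counts rather than recomputing them itself.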

I’ve since discovered that there is an "overlap coefficient" that measures the similarity of two sets. OverlapSimilarity was therefore a bad choice: it could be confused with OverlapCoefficient, even though the class does not compute that coefficient.
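
For reference, the overlap coefficient of two sets A and B is usually defined as |A ∩ B| / min(|A|, |B|). A minimal Java sketch of that definition follows; the names are hypothetical, and returning 0 when either set is empty is an assumed convention rather than anything settled in this thread.

```java
import java.util.Set;

/** Illustrative only: hypothetical helper, not part of commons-text. */
public final class OverlapCoefficientSketch {

    private OverlapCoefficientSketch() {
    }

    /** Computes |A and B| / min(|A|, |B|) for two sets. */
    public static <T> double overlapCoefficient(Set<T> a, Set<T> b) {
        // Assumed convention: the coefficient is 0 if either set is empty.
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0;
        }
        Set<T> smaller = a.size() <= b.size() ? a : b;
        Set<T> larger = smaller == a ? b : a;
        int shared = 0;
        for (T item : smaller) {
            if (larger.contains(item)) {
                shared++;
            }
        }
        // The smaller set's size is min(|A|, |B|).
        return (double) shared / smaller.size();
    }
}
```

This value depends only on the intersection size and the two set sizes, which is why a class that merely reports intersection and union could be mistaken for it.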

Perhaps SetSimilarity would be a better name?






Re: [apache/commons-text] TEXT-155: Add a generic IntersectionSimilarity measure (#109)

Posted by Rob Tompkins <ch...@gmail.com>.
Also this breaks binary compatibility. Are we going for a 2.X with [text]?
