Posted to dev@lucene.apache.org by "Timothy Potter (JIRA)" <ji...@apache.org> on 2015/05/01 19:26:06 UTC

[jira] [Commented] (SOLR-6878) solr.ManagedSynonymFilterFactory all-to-all synonym switch (aka. expand)

    [ https://issues.apache.org/jira/browse/SOLR-6878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523482#comment-14523482 ] 

Timothy Potter commented on SOLR-6878:
--------------------------------------

I started going through this patch and I have some questions about how to support the "equivalent" synonyms feature for managed synonyms.

NOTE: I'm using the term "equivalent" synonyms based on the doc here:
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

Specifically, here are a couple of issues I see with supporting equivalent synonyms lists at the managed API level:

1) The default value for expand is true (in the patch), but what if the user changes it to false after already having added equivalent synonym lists, or vice versa? What do we do about existing equivalent mappings? We could store the equivalent lists in a separate data structure and then apply the correct behavior depending on the expand flag when the managed data is "viewed", i.e. either on a GET request from the API or when updating the data used to initialize the underlying SynonymMap. This is similar to what we do with ignoreCase; however, ignoreCase was easy to handle, whereas I think allowing expand to be changed through the API is much more complicated.

Of course we could punt on this issue altogether and just make the expand flag immutable, i.e. you can set it initially to true or false, but cannot change it with the API. If we make it immutable, then we can apply the mapping on update and not have to maintain any additional data structures to remember the raw state of equiv lists.
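
To make the "separate data structure" option concrete, here is a minimal sketch of what it might look like. The class name EquivalentSynonymStore and all of its methods are hypothetical and are not taken from the patch. It also assumes that expand=false follows the SynonymFilterFactory convention of reducing every term in a list to the first term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical holder for raw equivalent-synonym lists; not part of the patch. */
public class EquivalentSynonymStore {
    // Raw lists exactly as the user sent them, kept so the flag can change later.
    private final List<List<String>> rawGroups = new ArrayList<>();
    private boolean expand = true; // default value mentioned for the patch

    public void addEquivalentList(List<String> terms) {
        rawGroups.add(new ArrayList<>(terms));
    }

    public void setExpand(boolean expand) {
        this.expand = expand;
    }

    /** Derives term -> synonyms from the raw lists using the current expand flag. */
    public Map<String, Set<String>> view() {
        Map<String, Set<String>> mappings = new TreeMap<>();
        for (List<String> group : rawGroups) {
            if (group.isEmpty()) {
                continue;
            }
            for (String term : group) {
                Set<String> targets = mappings.computeIfAbsent(term, k -> new TreeSet<>());
                if (expand) {
                    // All-to-all: every term maps to every other term in its list.
                    for (String other : group) {
                        if (!other.equals(term)) {
                            targets.add(other);
                        }
                    }
                } else {
                    // Assumed reduction: every term maps to the first term of its list.
                    targets.add(group.get(0));
                }
            }
        }
        return mappings;
    }
}
```

Because the raw lists are kept, flipping the flag only changes what view() returns. The cost is exactly the extra state that making expand immutable would avoid.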

2) Let's say we allow users to send in equivalent synonym lists to the API, such as:

{code}
curl -v -X PUT \
  -H 'Content-type:application/json' \
  --data-binary '["funny","entertaining","whimsical","jocular"]' \
  'http://localhost:8983/solr/techproducts/schema/analysis/synonyms/english'
{code}

If expand is true, then you end up with the following mappings (pardon the Java code syntax as I didn't want to clean that up for this example):
{code}
    assertJQ(endpoint + "/funny",
        "/funny==['entertaining','jocular','whimsical']");
    assertJQ(endpoint + "/entertaining",
        "/entertaining==['funny','jocular','whimsical']");
    assertJQ(endpoint + "/jocular",
        "/jocular==['entertaining','funny','whimsical']");
    assertJQ(endpoint + "/whimsical",
        "/whimsical==['entertaining','funny','jocular']");
{code}

What should the API do if the user then decides to update the specific mappings for "funny" by sending in a request such as:

{code}
curl -v -X PUT \
  -H 'Content-type:application/json' \
  --data-binary '{"funny":["hilarious"]}' \
  'http://localhost:8983/solr/techproducts/schema/analysis/synonyms/english'
{code}

Does the API treat explicit mappings as having precedence over equivalent lists? Or does it fail with some weird error most users won't understand? Seems to get complicated pretty fast ...
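
As an illustration of the first option only, the sketch below lets explicit mappings override whatever was derived from an equivalent list. The class ExplicitWinsPolicy and its merge method are hypothetical names invented for this example. This is one possible policy, not a statement of what the API does or should do.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical merge policy: explicit mappings override expanded ones. */
public class ExplicitWinsPolicy {
    /**
     * Returns a new map holding every expanded mapping, with any term that
     * also has an explicit mapping replaced by the explicit one.
     */
    public static Map<String, List<String>> merge(
            Map<String, List<String>> expanded,
            Map<String, List<String>> explicit) {
        Map<String, List<String>> result = new TreeMap<>(expanded);
        result.putAll(explicit); // explicit entries win on key collisions
        return result;
    }
}
```

With the example above, "funny" would then map only to "hilarious", while "entertaining", "jocular" and "whimsical" would still list "funny" as a synonym. The relation is no longer symmetric, which is part of why this gets complicated.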

I didn't go too far down the path of implementing this, so there are probably more questions that will come up. To reiterate my original design assumption for managed synonyms: the API was not intended for humans to interact with directly. Rather, there should be some sort of UI layer on top of this API that translates synonym mappings into low-level API calls. For me, it's much clearer to send in explicit mappings for each synonym than it is to send some flat list and then interpret that list differently based on some flag.

The only advantage I can see is for a huge synonym list, where expanding it out into explicit mappings makes the request larger. Other than that, are there other use cases that require this expand functionality and cannot be achieved with the current implementation? If so, we need to decide whether expand should be immutable and what the API should do if an explicit mapping is received for a term that is already used in an equivalent synonym list. [~Soolek] your thoughts on this?

> solr.ManagedSynonymFilterFactory all-to-all synonym switch (aka. expand)
> ------------------------------------------------------------------------
>
>                 Key: SOLR-6878
>                 URL: https://issues.apache.org/jira/browse/SOLR-6878
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>    Affects Versions: 4.10.2
>            Reporter: Tomasz Sulkowski
>            Assignee: Timothy Potter
>              Labels: ManagedSynonymFilterFactory, REST, SOLR
>         Attachments: SOLR-6878.patch
>
>
> Hi,
> After switching from SynonymFilterFactory to ManagedSynonymFilterFactory I have found out that there is no way to set an all-to-all synonyms relation. Basically (judging from Google search) there is a need for an "expand" functionality switch (known from SynonymFilterFactory) which will treat all synonyms and their keyword as equal.
> For example: if we define a "car":["wagen","ride"] relation it would translate a query that includes one of the synonyms or keyword to "car or wagen or ride" independently of which word was used from those three.


