Posted to solr-user@lucene.apache.org by Jay Potharaju <js...@gmail.com> on 2018/03/13 20:37:33 UTC

Re: SynonymGraphFilterFactory with WordDelimiterGraphFilterFactory usage

I am upgrading to Solr 6.6.3, and one of my fields uses text_en_splitting.
Are there any recommendations on how to adjust the fieldtype definition for
such fields?

Thanks
Jay Potharaju


On Wed, Feb 7, 2018 at 5:09 AM, Steve Rowe <sa...@gmail.com> wrote:

> Thanks Webster,
>
> I created https://issues.apache.org/jira/browse/SOLR-11955 to work on
> this.
>
> --
> Steve
> www.lucidworks.com
>
> > On Feb 6, 2018, at 2:47 PM, Webster Homer <we...@sial.com> wrote:
> >
> > I noticed that in some of the current example schemas that are shipped with
> > Solr, there is a fieldtype, text_en_splitting, that feeds the output
> > of SynonymGraphFilterFactory into WordDelimiterGraphFilterFactory. So if
> > this isn't supported, the example should probably be updated or removed.
> >
> > On Mon, Feb 5, 2018 at 10:27 AM, Steve Rowe <sa...@gmail.com> wrote:
> >
> >> Hi Александр,
> >>
> >>> On Feb 5, 2018, at 11:19 AM, Shawn Heisey <ap...@elyograg.org> wrote:
> >>>
> >>> There should be no problem with using them together.
> >>
> >> I believe Shawn is wrong.
> >>
> >> From <http://lucene.apache.org/core/7_2_0/analyzers-common/org/apache/lucene/analysis/synonym/SynonymGraphFilter.html>:
> >>
> >>> NOTE: this cannot consume an incoming graph; results will be undefined.
> >>
> >> Unfortunately, the ref guide entry for Synonym Graph Filter
> >> <https://lucene.apache.org/solr/guide/7_2/filter-descriptions.html#synonym-graph-filter>
> >> doesn't include a warning about this, but it should, like the warning on
> >> Word Delimiter Graph Filter
> >> <https://lucene.apache.org/solr/guide/7_2/filter-descriptions.html#word-delimiter-graph-filter>:
> >>
> >>> Note: although this filter produces correct token graphs, it cannot
> >>> consume an input token graph correctly.
> >>
> >> (I've just committed a change to the ref guide source to add this also on
> >> the Synonym Graph Filter and Managed Synonym Graph Filter entries, to be
> >> included in the ref guide for Solr 7.3.)
> >>
> >> In short, the combination of the two filters is not supported, because
> >> WDGF produces a token graph, which SGF cannot correctly interpret.
> >>
> >> Other filters also have this issue, see e.g.
> >> <https://issues.apache.org/jira/browse/LUCENE-3475> for ShingleFilter;
> >> this issue has gotten some attention recently, and hopefully it will
> >> inspire fixes elsewhere.
> >>
> >> Patches welcome!
> >>
> >> --
> >> Steve
> >> www.lucidworks.com
> >>
> >>
> >>> On Feb 5, 2018, at 11:19 AM, Shawn Heisey <ap...@elyograg.org> wrote:
> >>>
> >>> On 2/5/2018 3:55 AM, Александр Шестак wrote:
> >>>>
> >>>> Hi, I am unsure about how to use SynonymGraphFilterFactory
> >>>> and WordDelimiterGraphFilterFactory. Can they be used together?
> >>>>
> >>>
> >>> There should be no problem with using them together.  But it is always
> >>> possible that the behavior will surprise you, while working 100% as
> >>> designed.
> >>>
> >>>> I have a Solr field type configured as follows:
> >>>>
> >>>> <fieldtype name="fulltext_en" class="solr.TextField"
> >>>>            autoGeneratePhraseQueries="true">
> >>>>  <analyzer type="index">
> >>>>    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>>    <filter class="solr.WordDelimiterGraphFilterFactory"
> >>>>            generateWordParts="1" generateNumberParts="1" splitOnNumerics="1"
> >>>>            catenateWords="1" catenateNumbers="1" catenateAll="0"
> >>>>            preserveOriginal="1" protected="protwords_en.txt"/>
> >>>>    <filter class="solr.FlattenGraphFilterFactory"/>
> >>>>  </analyzer>
> >>>>  <analyzer type="query">
> >>>>    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>>    <filter class="solr.WordDelimiterGraphFilterFactory"
> >>>>            generateWordParts="1" generateNumberParts="1" splitOnNumerics="1"
> >>>>            catenateWords="0" catenateNumbers="0" catenateAll="0"
> >>>>            preserveOriginal="1" protected="protwords_en.txt"/>
> >>>>    <filter class="solr.LowerCaseFilterFactory"/>
> >>>>    <filter class="solr.SynonymGraphFilterFactory"
> >>>>            synonyms="synonyms_en.txt" ignoreCase="true" expand="true"/>
> >>>>  </analyzer>
> >>>> </fieldtype>
> >>>>
> >>>> So at query time it applies SynonymGraphFilterFactory after
> >>>> WordDelimiterGraphFilterFactory.
> >>>> The synonyms are configured as follows:
> >>>> b=>b,boron
> >>>> 2=>ii,2
> >>>>
> >>>> In the Solr analysis tool, the query shows that the terms after SGF
> >>>> have positions 3 and 4. Is that correct? I thought they should have
> >>>> positions 1 and 2.
> >>>>
> >>>
> >>> What matters is the *relative* positions.  The exact position number
> >>> doesn't matter much.  Something new that the Graph implementations use
> >>> is the position length.  That feature is necessary for multi-term
> >>> synonyms to function correctly in phrase queries.
> >>>
> >>> In your analysis screenshot, WDGF creates three tokens.  The two tokens
> >>> created by splitting the input are at positions 1 and 2, which I think
> >>> is 100% as expected.  It also sets the positionLength of the first term
> >>> to 2, probably because it has split that term into 2 additional terms.
> >>>
> >>> Then the SGF takes those last two terms and expands them.  Each of the
> >>> synonyms is at the same position as the original term, and the relative
> >>> positions of the two synonym pairs have not changed -- the second one is
> >>> still one higher than the first.  I think the reason that SGF moves the
> >>> positions two higher is because the positionLength on the "b2" term is
> >>> 2, previously set by WDGF.  Someone with more knowledge about the Graph
> >>> implementations may have to speak up as to whether this behavior is
> >>> correct.
> >>>
> >>> Because the relative positions of the split terms don't change when SGF
> >>> runs, I think this is probably working as designed.
> >>>
> >>> Thanks,
> >>> Shawn
> >>
> >>
> >
>
>
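
Steve's point in the quoted message is that neither filter can consume a token graph, so one cautious reading is to keep at most one graph-producing filter in each analyzer. The fragment below is a minimal sketch under that assumption, not a configuration anyone in this thread proposed. The field type name fulltext_en_sketch is hypothetical, the attribute values are carried over from the field type quoted above, and dropping WordDelimiterGraphFilterFactory from the query analyzer is an assumption about one possible trade-off.

```xml
<!-- Hypothetical field type: the name is a placeholder, the file names come from the quoted example. -->
<fieldType name="fulltext_en_sketch" class="solr.TextField"
           autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- The only graph-producing filter in the index chain. -->
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1" splitOnNumerics="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            preserveOriginal="1" protected="protwords_en.txt"/>
    <!-- Flatten the graph before it is written to the index. -->
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- The only graph-producing filter in the query chain; its input is a plain token stream. -->
    <filter class="solr.SynonymGraphFilterFactory"
            synonyms="synonyms_en.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

With this layout a query term is no longer split at query time, so matching relies on what the index analyzer emitted (including the unsplit token kept by preserveOriginal="1"). Whether that is acceptable depends on the data, so it is worth checking representative queries in the Analysis screen before adopting anything like this.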

Re: SynonymGraphFilterFactory with WordDelimiterGraphFilterFactory usage

Posted by Alessandro Benedetti <a....@sease.io>.
Same here: I wanted to add a colleague of mine to the discussion, but I may
have done it incorrectly. Apologies!
We'll add more info soon if it is relevant to the community!
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benedetti@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
<https://twitter.com/seaseltd> | Youtube
<https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
<https://github.com/seaseltd>


On Wed, 12 Apr 2023 at 16:04, Alessandro Benedetti <a....@sease.io>
wrote:

> FYI
>
> On Tue, 21 Jul 2020 at 13:22, Hronom <hr...@gmail.com> wrote:
>
>> Is there any jira task for solr to work on the issue related to usage of
>> multiple GraphFilter factories in one analysis chain?
>>
>>
>>
>> --
>> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>
>

Re: SynonymGraphFilterFactory with WordDelimiterGraphFilterFactory usage

Posted by Alessandro Benedetti <a....@sease.io>.
FYI

On Tue, 21 Jul 2020 at 13:22, Hronom <hr...@gmail.com> wrote:

> Is there any jira task for solr to work on the issue related to usage of
> multiple GraphFilter factories in one analysis chain?
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>

Re: SynonymGraphFilterFactory with WordDelimiterGraphFilterFactory usage

Posted by Hronom <hr...@gmail.com>.
Is there any jira task for solr to work on the issue related to usage of
multiple GraphFilter factories in one analysis chain?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: SynonymGraphFilterFactory with WordDelimiterGraphFilterFactory usage

Posted by Jay Potharaju <js...@gmail.com>.
Thanks for the response, Rick! I checked 6.6.2 and it has the same issue.
The only workaround I have for now is to comment out the
SynonymGraphFilterFactory, as we are not using synonyms at the moment. But I
would like to know how to address this issue once we start using synonyms
down the line.

Thanks
Jay Potharaju


On Wed, Mar 14, 2018 at 1:02 PM, Rick Leir <rl...@leirtech.com> wrote:

> Jay
> Did you try using text_en_splitting copied out of another release?
> Though if someone went to the trouble of removing it from the example,
> there could be something broken in it.
> Cheers -- Rick
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com
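
The workaround Jay describes, commenting out the synonym filter so that only one graph filter remains in the chain, might look roughly like the fragment below. This is a sketch only: the filter order and attribute values are illustrative assumptions and are not copied from the text_en_splitting definition shipped with any Solr release.

```xml
<!-- Illustrative query analyzer; attribute values and the synonyms file name are assumptions. -->
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- Disabled while no synonyms are in use, so the word delimiter filter
       is the only graph filter left in this chain:
  <filter class="solr.SynonymGraphFilterFactory"
          synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  -->
  <filter class="solr.WordDelimiterGraphFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="0" catenateNumbers="0" catenateAll="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

This sidesteps the unsupported combination only for as long as synonyms stay disabled; re-enabling the filter brings back the question discussed earlier in the thread.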

Re: SynonymGraphFilterFactory with WordDelimiterGraphFilterFactory usage

Posted by Rick Leir <rl...@leirtech.com>.
Jay
Did you try using text_en_splitting copied out of another release? 
Though if someone went to the trouble of removing it from the example, there could be something broken in it. 
Cheers -- Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com