Posted to solr-user@lucene.apache.org by Thomas Corthals <th...@klascement.net> on 2020/07/06 22:43:26 UTC

Tokenizing managed synonyms

Hi,

Is it possible to specify a Tokenizer Factory on a Managed Synonym Graph
Filter? I would like to use a Standard Tokenizer or Keyword Tokenizer on
some fields.

Best,

Thomas

Re: Tokenizing managed synonyms

Posted by Kayak28 <ka...@gmail.com>.
Hello, Solr Community:

Actually, you can set up a tokenizer for the managed synonyms.
However, this configuration is not covered in the reference guide, and I do
not know how to add a tokenizer through an API call.
So, you may need to edit a JSON file in the config directory by hand.


In the _schema_analysis_synonyms_<Name of Resource>.json file under the
config directory, you will see JSON like the following.

{
  "responseHeader":{
    "status":0,
    "QTime":3},
  "synonymMappings":{
    "initArgs":{
      "ignoreCase":true,
      "format":"solr"},
    "initializedOn":"2014-12-16T22:44:05.33Z",
    "managedMap":{
      "GB":
        ["GiB",
         "Gigabyte"],
      "TV":
        ["Television"],
      "happy":
        ["glad",
         "joyful"]}}}


To add a tokenizer, add the following key-value pair under the "initArgs"
key.
 "tokenizerFactory":"solr.<Name Of Tokenizer>Factory"

The result is the following JSON.
{
  "responseHeader":{
    "status":0,
    "QTime":3},
  "synonymMappings":{
    "initArgs":{
      "ignoreCase":true,
      "format":"solr",
      "tokenizerFactory":"solr.<Name Of Tokenizer>Factory"},
    "initializedOn":"2014-12-16T22:44:05.33Z",
    "managedMap":{
      "GB": ["GiB", "Gigabyte"],
      "TV": ["Television"],
      "happy": ["glad", "joyful"]}}}


I would like to add this configuration to the Solr Reference Guide, but I
have not created a JIRA issue yet.
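
The manual edit described above could also be scripted. The following is a
minimal Python sketch, not an official Solr tool: the function name
set_tokenizer_factory and the file path in the usage comment are
hypothetical. It assumes the file is plain JSON, and because the on-disk file
may not carry the "synonymMappings" wrapper shown in this post, it handles
both layouts.

```python
import json
from pathlib import Path


def set_tokenizer_factory(path: Path, factory: str) -> dict:
    """Add or overwrite initArgs.tokenizerFactory in a managed synonyms file."""
    data = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: initArgs sits under "synonymMappings" as in the post above;
    # if that wrapper is missing, fall back to the top level of the file.
    mappings = data.get("synonymMappings", data)
    mappings.setdefault("initArgs", {})["tokenizerFactory"] = factory
    # Write the modified document back in place.
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return data


# Hypothetical usage; adjust the path to your own config directory:
# set_tokenizer_factory(
#     Path("conf/_schema_analysis_synonyms_english.json"),
#     "solr.KeywordTokenizerFactory",
# )
```

After a change like this, the core or collection most likely has to be
reloaded before the new setting takes effect, and in SolrCloud the file lives
in ZooKeeper rather than on local disk, so the edit would have to be made
there instead.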


-- 

Sincerely,
Kaya
github: https://github.com/28kayak



On Tue, Jul 7, 2020 at 11:55, Koji Sekiguchi <ko...@rondhuit.com> wrote:

> I think the question makes sense: SynonymGraphFilterFactory accepts
> tokenizerFactory, and he asked whether the managed version of
> SynonymGraphFilter could accept it as well.
>
>
> https://lucene.apache.org/solr/guide/8_5/filter-descriptions.html#synonym-graph-filter
>
> The answer seems to be NO.
>
> Koji
>
>
> On 2020/07/07 8:18, Erick Erickson wrote:
> > This question doesn’t really make sense. You don’t specify tokenizers on
> > filters, they’re specified at the _field_ level.
> >
> > You can certainly define as many field(type)s as you want, each with a different
> > analysis chain and those chains can be made up of whatever you want to use, and
> > there are lots of choices.
> >
> > If you are asking to do _additional_ tokenization on the output of a synonym
> > filter, no.
> >
> > Perhaps if you defined the problem you’re trying to solve we could make some
> > suggestions.
> >
> > Best,
> > Erick
> >
> >> On Jul 6, 2020, at 6:43 PM, Thomas Corthals <th...@klascement.net> wrote:
> >>
> >> Hi,
> >>
> >> Is it possible to specify a Tokenizer Factory on a Managed Synonym Graph
> >> Filter? I would like to use a Standard Tokenizer or Keyword Tokenizer on
> >> some fields.
> >>
> >> Best,
> >>
> >> Thomas
> >
> >
>



Re: Tokenizing managed synonyms

Posted by Koji Sekiguchi <ko...@rondhuit.com>.
I think the question makes sense: SynonymGraphFilterFactory accepts tokenizerFactory,
and he asked whether the managed version of SynonymGraphFilter could accept it as well.

https://lucene.apache.org/solr/guide/8_5/filter-descriptions.html#synonym-graph-filter

The answer seems to be NO.

Koji


On 2020/07/07 8:18, Erick Erickson wrote:
> This question doesn’t really make sense. You don’t specify tokenizers on
> filters, they’re specified at the _field_ level.
> 
> You can certainly define as many field(type)s as you want, each with a different
> analysis chain and those chains can be made up of whatever you want to use, and
> there are lots of choices.
> 
> If you are asking to do _additional_ tokenization on the output of a synonym
> filter, no.
> 
> Perhaps if you defined the problem you’re trying to solve we could make some
> suggestions.
> 
> Best,
> Erick
> 
>> On Jul 6, 2020, at 6:43 PM, Thomas Corthals <th...@klascement.net> wrote:
>>
>> Hi,
>>
>> Is it possible to specify a Tokenizer Factory on a Managed Synonym Graph
>> Filter? I would like to use a Standard Tokenizer or Keyword Tokenizer on
>> some fields.
>>
>> Best,
>>
>> Thomas
> 
> 

Re: Tokenizing managed synonyms

Posted by Erick Erickson <er...@gmail.com>.
This question doesn’t really make sense. You don’t specify tokenizers on
filters, they’re specified at the _field_ level.

You can certainly define as many field(type)s as you want, each with a different
analysis chain and those chains can be made up of whatever you want to use, and
there are lots of choices.

If you are asking to do _additional_ tokenization on the output of a synonym
filter, no.

Perhaps if you defined the problem you’re trying to solve we could make some
suggestions.

Best,
Erick

> On Jul 6, 2020, at 6:43 PM, Thomas Corthals <th...@klascement.net> wrote:
> 
> Hi,
> 
> Is it possible to specify a Tokenizer Factory on a Managed Synonym Graph
> Filter? I would like to use a Standard Tokenizer or Keyword Tokenizer on
> some fields.
> 
> Best,
> 
> Thomas


Re: Tokenizing managed synonyms

Posted by Erick Erickson <er...@gmail.com>.
Please don’t hijack threads, start a new one when you switch topics.

> On Jul 6, 2020, at 6:52 PM, Stavros Macrakis <ma...@alum.mit.edu> wrote:
> 
> How can I search for a term *except *when it's part of certain phrases?
> 
> For example, I might want to find documents mentioning "pepper" where it is
> not part of the phrases "chili pepper", "hot pepper", or "pepper sauce".
> 
> It does not work to search for [pepper NOT ("chili pepper" OR "hot pepper"
> OR "pepper sauce")] because that excludes all documents which mention
> "chili pepper" even if they *also* mention "black pepper" or the unmodified
> word "pepper". Maybe some way using synonyms?
> 
> Thanks!
> 
>             -s
> 
> On Mon, Jul 6, 2020 at 6:43 PM Thomas Corthals <th...@klascement.net>
> wrote:
> 
>> Hi,
>> 
>> Is it possible to specify a Tokenizer Factory on a Managed Synonym Graph
>> Filter? I would like to use a Standard Tokenizer or Keyword Tokenizer on
>> some fields.
>> 
>> Best,
>> 
>> Thomas
>> 


Re: Search for term except within phrase

Posted by Emir Arnautović <em...@sematext.com>.
Hi Stavros,
I didn’t check what’s supported in ComplexPhraseQueryParser, but it is a wrapper around span queries, so you should be able to do what you need: https://lucene.apache.org/solr/guide/7_6/other-parsers.html#complex-phrase-query-parser
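
To make the intended matching concrete, here is a small plain-Python sketch of the positional logic that a span-based exclusion (for example Lucene's SpanNotQuery) performs. It is not Solr or Lucene code and does not show ComplexPhraseQueryParser syntax; the function name standalone_hits and the sample sentence are made up for illustration.

```python
def standalone_hits(tokens, term, excluded_phrases):
    """Return positions of `term` that are not covered by any excluded phrase."""
    covered = set()
    for phrase in excluded_phrases:
        n = len(phrase)
        # Mark every token position that belongs to an occurrence of the phrase.
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == phrase:
                covered.update(range(i, i + n))
    # Keep only occurrences of the term outside all marked positions.
    return [i for i, tok in enumerate(tokens) if tok == term and i not in covered]


doc = "add chili pepper and black pepper to the pepper sauce".split()
phrases = [["chili", "pepper"], ["hot", "pepper"], ["pepper", "sauce"]]
print(standalone_hits(doc, "pepper", phrases))  # prints [5]
```

The document still matches because of the "pepper" in "black pepper" at position 5, even though it also contains "chili pepper" and "pepper sauce". A document-level NOT would have dropped it.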

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 7 Jul 2020, at 03:11, Stavros Macrakis <ma...@alum.mit.edu> wrote:
> 
> (Sorry for sending this with the wrong subject earlier.)
> 
> How can I search for a term except when it's part of certain phrases?
> 
> For example, I might want to find documents mentioning "pepper" where it is
> not part of the phrases "chili pepper", "hot pepper", or "pepper sauce".
> 
> It does not work to search for [pepper NOT ("chili pepper" OR "hot pepper"
> OR "pepper sauce")] because that excludes all documents which mention
> "chili pepper" even if they also mention "black pepper" or the unmodified
> word "pepper". Maybe some way using synonyms?
> 
> Thanks!
> 
>             -s


Search for term except within phrase

Posted by Stavros Macrakis <ma...@alum.mit.edu>.
(Sorry for sending this with the wrong subject earlier.)

How can I search for a term except when it's part of certain phrases?

For example, I might want to find documents mentioning "pepper" where it is
not part of the phrases "chili pepper", "hot pepper", or "pepper sauce".

It does not work to search for [pepper NOT ("chili pepper" OR "hot pepper"
OR "pepper sauce")] because that excludes all documents which mention
"chili pepper" even if they also mention "black pepper" or the unmodified
word "pepper". Maybe some way using synonyms?

Thanks!

             -s

Re: Tokenizing managed synonyms

Posted by Stavros Macrakis <ma...@alum.mit.edu>.
How can I search for a term *except* when it's part of certain phrases?

For example, I might want to find documents mentioning "pepper" where it is
not part of the phrases "chili pepper", "hot pepper", or "pepper sauce".

It does not work to search for [pepper NOT ("chili pepper" OR "hot pepper"
OR "pepper sauce")] because that excludes all documents which mention
"chili pepper" even if they *also* mention "black pepper" or the unmodified
word "pepper". Maybe some way using synonyms?

Thanks!

             -s

On Mon, Jul 6, 2020 at 6:43 PM Thomas Corthals <th...@klascement.net>
wrote:

> Hi,
>
> Is it possible to specify a Tokenizer Factory on a Managed Synonym Graph
> Filter? I would like to use a Standard Tokenizer or Keyword Tokenizer on
> some fields.
>
> Best,
>
> Thomas
>