Posted to solr-user@lucene.apache.org by Thomas Corthals <th...@klascement.net> on 2020/07/06 22:43:26 UTC
Tokenizing managed synonyms
Hi,
Is it possible to specify a Tokenizer Factory on a Managed Synonym Graph
Filter? I would like to use a Standard Tokenizer or Keyword Tokenizer on
some fields.
Best,
Thomas
Re: Tokenizing managed synonyms
Posted by Kayak28 <ka...@gmail.com>.
Hello, Solr Community:
Actually, you can set a tokenizer for managed synonyms.
However, this setting is not described in the reference guide, and I do not
know how to add a tokenizer via an API call.
So you may need to manually edit a JSON file in the config directory.
In the _schema_analysis_synonyms_<Name of Resource>.json file under the
config directory, you will see JSON like the following.
{
  "responseHeader":{
    "status":0,
    "QTime":3},
  "synonymMappings":{
    "initArgs":{
      "ignoreCase":true,
      "format":"solr"},
    "initializedOn":"2014-12-16T22:44:05.33Z",
    "managedMap":{
      "GB":
        ["GiB",
         "Gigabyte"],
      "TV":
        ["Television"],
      "happy":
        ["glad",
         "joyful"]}}}
To add a tokenizer, add the following key-value pair under the "initArgs"
key:
"tokenizerFactory":"solr.<Name Of Tokenizer>Factory"
The resulting JSON looks like this:
{
  "responseHeader":{
    "status":0,
    "QTime":3},
  "synonymMappings":{
    "initArgs":{
      "ignoreCase":true,
      "format":"solr",
      "tokenizerFactory":"solr.<Name Of Tokenizer>Factory"},
    "initializedOn":"2014-12-16T22:44:05.33Z",
    "managedMap":{
      "GB": ["GiB", "Gigabyte"],
      "TV": ["Television"],
      "happy": ["glad", "joyful"]}}}
I would like to add this configuration to the Solr Reference Guide, but I
have not created a JIRA issue yet.
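The manual edit described above could also be scripted. The following is only a minimal sketch in Python, not an official Solr tool: the function name add_tokenizer_factory and the file path in the usage note are hypothetical, and it assumes that "initArgs" sits either under "synonymMappings" (as in the JSON shown above) or at the top level of the stored file.

```python
import json
from pathlib import Path


def add_tokenizer_factory(path, factory):
    """Set initArgs.tokenizerFactory in a managed-synonyms JSON file."""
    file = Path(path)
    data = json.loads(file.read_text(encoding="utf-8"))
    # Assumption: "initArgs" is nested under "synonymMappings" as in the
    # JSON shown in this message; otherwise fall back to the top level.
    holder = data.get("synonymMappings", data)
    # Create "initArgs" if it is missing, then add or overwrite the key.
    holder.setdefault("initArgs", {})["tokenizerFactory"] = factory
    file.write_text(
        json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return data
```

A call might look like add_tokenizer_factory("conf/_schema_analysis_synonyms_english.json", "solr.KeywordTokenizerFactory"), where the path is a placeholder. Solr will most likely need a core or collection reload before it picks up the change, and in SolrCloud the file lives in ZooKeeper rather than on local disk, so this sketch would not apply there as written.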
--
Sincerely,
Kaya
github: https://github.com/28kayak
On Tue, Jul 7, 2020 at 11:55, Koji Sekiguchi <ko...@rondhuit.com> wrote:
> I think the question makes sense as SynonymGraphFilterFactory accepts tokenizerFactory,
> he asked the managed version of SynonymGraphFilter could accept it as well.
>
>
> https://lucene.apache.org/solr/guide/8_5/filter-descriptions.html#synonym-graph-filter
>
> The answer seems to be NO.
>
> Koji
>
>
> On 2020/07/07 8:18, Erick Erickson wrote:
> > This question doesn’t really make sense. You don’t specify tokenizers on
> > filters, they’re specified at the _field_ level.
> >
> > You can certainly define as many field(type)s as you want, each with a different
> > analysis chain and those chains can be made up of whatever you want to use, and
> > there are lots of choices.
> >
> > If you are asking to do _additional_ tokenization on the output of a synonym
> > filter, no.
> >
> > Perhaps if you defined the problem you’re trying to solve we could make some
> > suggestions.
> >
> > Best,
> > Erick
> >
> >> On Jul 6, 2020, at 6:43 PM, Thomas Corthals <th...@klascement.net> wrote:
> >>
> >> Hi,
> >>
> >> Is it possible to specify a Tokenizer Factory on a Managed Synonym Graph
> >> Filter? I would like to use a Standard Tokenizer or Keyword Tokenizer on
> >> some fields.
> >>
> >> Best,
> >>
> >> Thomas
> >
> >
>
Re: Tokenizing managed synonyms
Posted by Koji Sekiguchi <ko...@rondhuit.com>.
I think the question makes sense: SynonymGraphFilterFactory accepts a tokenizerFactory
argument, so he is asking whether the managed version of SynonymGraphFilter could accept it as well.
https://lucene.apache.org/solr/guide/8_5/filter-descriptions.html#synonym-graph-filter
The answer seems to be NO.
Koji
On 2020/07/07 8:18, Erick Erickson wrote:
> This question doesn’t really make sense. You don’t specify tokenizers on
> filters, they’re specified at the _field_ level.
>
> You can certainly define as many field(type)s as you want, each with a different
> analysis chain and those chains can be made up of whatever you want to use, and
> there are lots of choices.
>
> If you are asking to do _additional_ tokenization on the output of a synonym
> filter, no.
>
> Perhaps if you defined the problem you’re trying to solve we could make some
> suggestions.
>
> Best,
> Erick
>
>> On Jul 6, 2020, at 6:43 PM, Thomas Corthals <th...@klascement.net> wrote:
>>
>> Hi,
>>
>> Is it possible to specify a Tokenizer Factory on a Managed Synonym Graph
>> Filter? I would like to use a Standard Tokenizer or Keyword Tokenizer on
>> some fields.
>>
>> Best,
>>
>> Thomas
>
>
Re: Tokenizing managed synonyms
Posted by Erick Erickson <er...@gmail.com>.
This question doesn’t really make sense. You don’t specify tokenizers on
filters; they’re specified at the _field_ level.
You can certainly define as many field(type)s as you want, each with a different
analysis chain and those chains can be made up of whatever you want to use, and
there are lots of choices.
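As a rough illustration of "one field type per analysis chain", the sketch below builds two Schema API add-field-type commands that differ only in their tokenizer. The field type names (text_syn_std, text_syn_kw) and the managed resource name "english" are made up for this example, and the helper field_type_command is hypothetical. The script only prints the JSON and does not send it to Solr.

```python
import json


def field_type_command(name, tokenizer_class, managed_resource):
    """Build a Schema API add-field-type command as a plain dict."""
    return {
        "add-field-type": {
            "name": name,
            "class": "solr.TextField",
            "analyzer": {
                # The tokenizer belongs to the field type's analyzer,
                # not to the synonym filter.
                "tokenizer": {"class": tokenizer_class},
                "filters": [
                    {
                        "class": "solr.ManagedSynonymGraphFilterFactory",
                        "managed": managed_resource,
                    },
                ],
            },
        }
    }


# Two field types sharing one managed synonym resource (names are examples).
commands = [
    field_type_command("text_syn_std", "solr.StandardTokenizerFactory", "english"),
    field_type_command("text_syn_kw", "solr.KeywordTokenizerFactory", "english"),
]
for command in commands:
    print(json.dumps(command, indent=2))
```

Each printed command could be posted to a collection's schema endpoint. A real index-time chain with a graph synonym filter would normally also need a flatten graph filter after it, which this sketch leaves out for brevity.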
If you are asking to do _additional_ tokenization on the output of a synonym
filter, no.
Perhaps if you defined the problem you’re trying to solve we could make some
suggestions.
Best,
Erick
> On Jul 6, 2020, at 6:43 PM, Thomas Corthals <th...@klascement.net> wrote:
>
> Hi,
>
> Is it possible to specify a Tokenizer Factory on a Managed Synonym Graph
> Filter? I would like to use a Standard Tokenizer or Keyword Tokenizer on
> some fields.
>
> Best,
>
> Thomas
Re: Tokenizing managed synonyms
Posted by Erick Erickson <er...@gmail.com>.
Please don’t hijack threads; start a new one when you switch topics.
> On Jul 6, 2020, at 6:52 PM, Stavros Macrakis <ma...@alum.mit.edu> wrote:
>
> How can I search for a term *except* when it's part of certain phrases?
>
> For example, I might want to find documents mentioning "pepper" where it is
> not part of the phrases "chili pepper", "hot pepper", or "pepper sauce".
>
> It does not work to search for [pepper NOT ("chili pepper" OR "hot pepper"
> OR "pepper sauce")] because that excludes all documents which mention
> "chili pepper" even if they *also* mention "black pepper" or the unmodified
> word "pepper". Maybe some way using synonyms?
>
> Thanks!
>
> -s
>
> On Mon, Jul 6, 2020 at 6:43 PM Thomas Corthals <th...@klascement.net>
> wrote:
>
>> Hi,
>>
>> Is it possible to specify a Tokenizer Factory on a Managed Synonym Graph
>> Filter? I would like to use a Standard Tokenizer or Keyword Tokenizer on
>> some fields.
>>
>> Best,
>>
>> Thomas
>>
Re: Search for term except within phrase
Posted by Emir Arnautović <em...@sematext.com>.
Hi Stavros,
I didn’t check what’s supported in ComplexPhraseQueryParser, but it is a wrapper around span queries, so you should be able to do what you need: https://lucene.apache.org/solr/guide/7_6/other-parsers.html#complex-phrase-query-parser
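To make the position-based idea concrete, here is a small sketch of the intended semantics in plain Python. It is not Solr or Lucene code and not ComplexPhraseQueryParser syntax: it only shows what "match the term unless that occurrence lies inside an excluded phrase" means on whitespace tokens. The function name find_outside_phrases is hypothetical.

```python
def find_outside_phrases(text, term, phrases):
    """Return token positions of `term` not covered by any excluded phrase."""
    tokens = text.lower().split()
    covered = set()
    for phrase in phrases:
        words = phrase.lower().split()
        # Slide over the tokens and mark every position that belongs to
        # an occurrence of the excluded phrase.
        for i in range(len(tokens) - len(words) + 1):
            if tokens[i:i + len(words)] == words:
                covered.update(range(i, i + len(words)))
    wanted = term.lower()
    return [i for i, tok in enumerate(tokens) if tok == wanted and i not in covered]


excluded = ["chili pepper", "hot pepper", "pepper sauce"]
print(find_outside_phrases("chili pepper and black pepper", "pepper", excluded))
# prints [4]
```

A document matches when the returned list is non-empty, so a text containing both "chili pepper" and "black pepper" still matches on the second occurrence.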
HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> On 7 Jul 2020, at 03:11, Stavros Macrakis <ma...@alum.mit.edu> wrote:
>
> (Sorry for sending this with the wrong subject earlier.)
>
> How can I search for a term except when it's part of certain phrases?
>
> For example, I might want to find documents mentioning "pepper" where it is
> not part of the phrases "chili pepper", "hot pepper", or "pepper sauce".
>
> It does not work to search for [pepper NOT ("chili pepper" OR "hot pepper"
> OR "pepper sauce")] because that excludes all documents which mention
> "chili pepper" even if they also mention "black pepper" or the unmodified
> word "pepper". Maybe some way using synonyms?
>
> Thanks!
>
> -s
Search for term except within phrase
Posted by Stavros Macrakis <ma...@alum.mit.edu>.
(Sorry for sending this with the wrong subject earlier.)
How can I search for a term except when it's part of certain phrases?
For example, I might want to find documents mentioning "pepper" where it is
not part of the phrases "chili pepper", "hot pepper", or "pepper sauce".
It does not work to search for [pepper NOT ("chili pepper" OR "hot pepper"
OR "pepper sauce")] because that excludes all documents which mention
"chili pepper" even if they also mention "black pepper" or the unmodified
word "pepper". Maybe some way using synonyms?
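The problem with the boolean NOT query can be reproduced with a toy model. The sketch below is plain Python, not Solr: it treats NOT as a document-level exclusion, which is the behavior described above, and the three sample documents are invented for illustration.

```python
# Toy documents (made up for this example).
docs = {
    "a": "black pepper",
    "b": "chili pepper",
    "c": "chili pepper and black pepper",
}
excluded = ["chili pepper", "hot pepper", "pepper sauce"]


def boolean_not_match(text):
    """Document-level check: term present AND no excluded phrase present."""
    has_term = "pepper" in text.split()
    has_excluded = any(phrase in text for phrase in excluded)
    return has_term and not has_excluded


matches = [doc_id for doc_id, text in docs.items() if boolean_not_match(text)]
print(matches)
# prints ['a']
```

Document "c" is dropped even though it mentions "black pepper", because the exclusion applies to the whole document rather than to the individual occurrence of the term.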
Thanks!
-s
Re: Tokenizing managed synonyms
Posted by Stavros Macrakis <ma...@alum.mit.edu>.
How can I search for a term *except* when it's part of certain phrases?
For example, I might want to find documents mentioning "pepper" where it is
not part of the phrases "chili pepper", "hot pepper", or "pepper sauce".
It does not work to search for [pepper NOT ("chili pepper" OR "hot pepper"
OR "pepper sauce")] because that excludes all documents which mention
"chili pepper" even if they *also* mention "black pepper" or the unmodified
word "pepper". Maybe some way using synonyms?
Thanks!
-s
On Mon, Jul 6, 2020 at 6:43 PM Thomas Corthals <th...@klascement.net>
wrote:
> Hi,
>
> Is it possible to specify a Tokenizer Factory on a Managed Synonym Graph
> Filter? I would like to use a Standard Tokenizer or Keyword Tokenizer on
> some fields.
>
> Best,
>
> Thomas
>