Posted to solr-user@lucene.apache.org by Arturas Mazeika <ma...@gmail.com> on 2020/12/02 14:18:36 UTC

chaining charFilter

Hi Solr-Team,

The manual says that charFilters can be chained (from
https://lucene.apache.org/solr/guide/6_6/charfilterfactories.html#CharFilterFactories-solr.MappingCharFilterFactory):

CharFilters can be chained like Token Filters and placed in front of a
Tokenizer. CharFilters can add, change, or remove characters while
preserving the original character offsets to support features like
highlighting.

I am trying to strip some characters from certain fields so that I can
facet on them efficiently later. I tried to chain charFilters
for that purpose:

<fieldType name="fcomp_type" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- take the file name from the path -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="(.*[/\\])([^/\\]+)$" replacement="$2"/>
    <!-- split date and time at the 'T' separator -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="([0-9\-]+)T([0-9\-]+)" replacement="$1 $2"/>
    <!-- collapse every run of non-letters into a single space -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[^a-zA-Z]+" replacement=" "/>

    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="fcomp" type="fcomp_type" indexed="true" stored="true"/>

but in the schema definition I see only the last charFilter:
[image: image.png]

Any clues why?

Cheers,
Arturas

Re: chaining charFilter

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Did you reload the core so that it picks up the new schema? Or try creating a
new core from the same schema?

If it is SolrCloud, you also have to upload the schema to ZooKeeper.
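
As a rough illustration of the reload step, the sketch below builds the reload request for either a standalone core (CoreAdmin API) or a SolrCloud collection (Collections API). The base URL `http://localhost:8983/solr` and the name `mycore` are assumptions, not values from this thread, and uploading the changed schema to ZooKeeper is a separate step that this sketch does not cover.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def reload_url(base_url, name, cloud=False):
    """Build the RELOAD URL for a core (standalone) or a collection (SolrCloud)."""
    if cloud:
        # Collections API: reload every replica of the collection.
        path, params = "admin/collections", {"action": "RELOAD", "name": name}
    else:
        # CoreAdmin API: reload a single core.
        path, params = "admin/cores", {"action": "RELOAD", "core": name}
    return f"{base_url.rstrip('/')}/{path}?{urlencode(params)}"


def reload(base_url, name, cloud=False):
    """Send the reload request and return the HTTP status code."""
    with urlopen(reload_url(base_url, name, cloud)) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical host and core name; adjust to the actual installation.
    print(reload_url("http://localhost:8983/solr", "mycore"))
```

Only `reload_url` runs without a server; `reload` needs a reachable Solr instance.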

Regards,
   Alex.

Re: chaining charFilter

Posted by Arturas Mazeika <ma...@gmail.com>.
Hi Alex,
Hi Erick,

Thanks a lot for the prompt replies. Indeed, the functionality is completely
fine, and checking the values with the analyzer gives the expected results. I
also read the JIRA issue; it is nicely described.
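
To make the effect of the chain concrete, here is a small Python sketch that applies the three patterns from the fcomp_type definition one after another, the way chained charFilters feed each other. It is only an approximation: Python's `re` module stands in for Java's regex engine, lowercasing plus whitespace splitting stands in for StandardTokenizer and LowerCaseFilter, and the sample path is made up.

```python
import re

# (pattern, replacement) pairs copied from the fcomp_type definition;
# Java's "$1" backreference is written "\1" in Python.
CHAR_FILTERS = [
    (r"(.*[/\\])([^/\\]+)$", r"\2"),        # keep only the file name
    (r"([0-9\-]+)T([0-9\-]+)", r"\1 \2"),   # split date and time at 'T'
    (r"[^a-zA-Z]+", " "),                   # non-letter runs -> one space
]


def apply_chain(text):
    """Return the input followed by the text after each filter in the chain."""
    stages = [text]
    for pattern, replacement in CHAR_FILTERS:
        text = re.sub(pattern, replacement, text)
        stages.append(text)
    return stages


def tokens(text):
    """Approximate the tokenizer and lowercase filter on the chain's output."""
    return apply_chain(text)[-1].lower().split()


if __name__ == "__main__":
    sample = "/data/logs/report_2020-12-02T14-18-36.txt"  # hypothetical path
    for stage in apply_chain(sample):
        print(repr(stage))
    print(tokens(sample))
```

In this sketch each filter sees the output of the previous one, so the order matters. Because the last pattern replaces every run of non-letters, the digits separated by the second pattern do not survive into the final tokens.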

Cheers,
Arturas

Re: chaining charFilter

Posted by Erick Erickson <er...@gmail.com>.
Images are stripped by the mail server, so we can’t see the result.

I looked at master and the admin UI has problems; I just
raised a JIRA, see:
https://issues.apache.org/jira/browse/SOLR-15024

The _functionality_ is fine. If you go to the analysis page
and enter values, you’ll see the transformations work. Although
that screen doesn’t show the CharFilter transformations correctly,
the tokens at the end reflect the full chain.
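
Since the display is the unreliable part, one way to check which charFilters were actually loaded, without the admin UI, might be to ask the Schema API for the field type and list the charFilters in the response. The sketch below assumes the `/schema/fieldtypes/<name>` endpoint and a JSON response with the analyzer nested under `fieldType`; the base URL and core name are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def fieldtype_url(base_url, core, fieldtype):
    """Build the Schema API URL for a single field type definition."""
    return f"{base_url.rstrip('/')}/{quote(core)}/schema/fieldtypes/{quote(fieldtype)}"


def char_filters(schema_response):
    """List (analyzer key, class, pattern) for each charFilter in the response."""
    field_type = schema_response.get("fieldType", {})
    found = []
    # A field type may define one shared analyzer or separate index/query ones.
    for key in ("analyzer", "indexAnalyzer", "queryAnalyzer"):
        for char_filter in field_type.get(key, {}).get("charFilters", []):
            found.append((key, char_filter.get("class"), char_filter.get("pattern")))
    return found


def fetch_char_filters(base_url, core, fieldtype):
    """Query a running Solr instance; requires network access to it."""
    with urlopen(fieldtype_url(base_url, core, fieldtype)) as response:
        return char_filters(json.load(response))


if __name__ == "__main__":
    # Hypothetical host and core name; adjust to the actual installation.
    print(fieldtype_url("http://localhost:8983/solr", "mycore", "fcomp_type"))
```

If all three PatternReplaceCharFilterFactory entries come back, the chain is configured as written and only the UI rendering is off.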

Best,
Erick
