Posted to solr-user@lucene.apache.org by Paras Lehana <pa...@indiamart.com> on 2020/02/25 10:28:58 UTC

Re: Is it possible to add stemming in a text_exact field

Hi Dhanesh,

Use KeywordRepeatFilterFactory
<https://lucene.apache.org/solr/guide/8_4/language-analysis.html#keywordrepeatfilterfactory>.
It emits each token twice and marks one copy as a keyword, so the stemmer
leaves that copy unchanged. Add RemoveDuplicatesTokenFilterFactory after the
stemmer to drop the duplicate when stemming does not change the token.
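
A minimal sketch of that filter order follows. The field type name
text_exact_stemmed is hypothetical, and the StandardTokenizerFactory is an
assumption: with KeywordTokenizerFactory the whole value is a single token,
so the stemmer would only ever see the end of the full string.

```xml
<!-- Hypothetical field type name; the filter order is the part that matters. -->
<fieldType name="text_exact_stemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Assumption: a tokenizer that splits the value into words,
         so each word is stemmed on its own. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emits every token twice; one copy is flagged as a keyword. -->
    <filter class="solr.KeywordRepeatFilterFactory"/>
    <!-- Stems only the copy that is not flagged as a keyword. -->
    <filter class="solr.PorterStemFilterFactory"/>
    <!-- Drops the second copy when stemming left the token unchanged. -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

With this chain, the original and the stemmed form of each word sit at the
same position, so a query can match on either one.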

On Fri, 24 Jan 2020 at 17:13, Lucky Sharma <go...@gmail.com> wrote:

> Hi Dhanesh,
> I ran into the same problem a while back, when we had 'skimmed milk'
> and needed to search for 'skim milk'. For that we wrote a customizable
> filter: we use KStemmer and then apply the custom
> ConcatPhraseFilterFactory.
>
> You can use the link mentioned below to review:
> https://github.com/MighTguY/solr-extensions
>
> Regards,
> Lucky Sharma
>
> On Thu, 23 Jan, 2020, 8:58 pm Alessandro Benedetti, <a....@sease.io>
> wrote:
>
> > Edward is correct. Furthermore, using a stemmer in an analysis chain that
> > doesn't tokenise is going to work only for single-term queries and
> > single-term field values.
> > Not sure that was intended...
> >
> > Cheers
> >
> >
> > --------------------------
> > Alessandro Benedetti
> > Search Consultant, R&D Software Engineer, Director
> > www.sease.io
> >
> >
> > On Wed, 22 Jan 2020 at 16:26, Edward Ribeiro <ed...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > One possible solution would be to create a second field (e.g.,
> > > text_general) that uses StandardTokenizerFactory, or another tokenizer
> > > that breaks the string into tokens, and use a copyField to copy the
> > > content from text_exact to text_general. Then you can use the edismax
> > > parser to search both fields while giving text_exact a higher boost
> > > (qf=text_exact^5 text_general). In this case, both fields should be
> > > indexed, but only one needs to be stored.
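> > >
> > > A minimal sketch of that setup, assuming a hypothetical companion field
> > > named business_locality_text (not part of the original schema) and the
> > > text_general type from Solr's default configset:
> > >
> > > ```xml
> > > <!-- Hypothetical tokenized copy of the exact-match field; indexed only. -->
> > > <field name="business_locality_text" type="text_general"
> > >        indexed="true" stored="false" multiValued="true"/>
> > >
> > > <!-- Copies the raw input of the exact field into the tokenized field. -->
> > > <copyField source="business_locality" dest="business_locality_text"/>
> > > ```
> > >
> > > The query would then use defType=edismax with
> > > qf=business_locality^5 business_locality_text. Note that text_general in
> > > the default configset has no stemming filter, so matching "restaurants"
> > > against "restaurant" would likely need a type whose chain includes a
> > > stemmer.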
> > >
> > > Edward
> > >
> > > On Wed, Jan 22, 2020 at 10:34 AM Dhanesh Radhakrishnan
> > > <dhanesh@hifx.co.in> wrote:
> > >
> > > > Hello,
> > > > I'm facing an issue with stemming.
> > > > My search query "restaurant dubai" returns results.
> > > > If I search for "restaurants dubai", it returns no data.
> > > >
> > > > How can I apply stemming so that "restaurants dubai" matches
> > > > "restaurant dubai"?
> > > >
> > > > I'm using a text_exact field for search.
> > > >
> > > > <field name="business_locality" type="text_exact" required="true"
> > > > multiValued="true" omitNorms="false" omitTermFreqAndPositions="false"/>
> > > >
> > > > Here is the field definition
> > > >
> > > >     <fieldType name="text_exact" class="solr.TextField"
> > > >                positionIncrementGap="100">
> > > >         <analyzer type="index">
> > > >             <tokenizer class="solr.KeywordTokenizerFactory"/>
> > > >             <filter class="solr.LowerCaseFilterFactory"/>
> > > >             <filter class="solr.TrimFilterFactory"/>
> > > >             <filter class="solr.PorterStemFilterFactory"/>
> > > >         </analyzer>
> > > >         <analyzer type="query">
> > > >             <tokenizer class="solr.KeywordTokenizerFactory"/>
> > > >             <filter class="solr.LowerCaseFilterFactory"/>
> > > >             <filter class="solr.TrimFilterFactory"/>
> > > >             <filter class="solr.PorterStemFilterFactory"/>
> > > >         </analyzer>
> > > >     </fieldType>
> > > >
> > > > Is there any solution that does not require changing the tokenizer class?
> > > >
> > > >
> > > >
> > > >
> > > > Dhanesh S.R


-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: *11096*
