Posted to solr-user@lucene.apache.org by Dmitry Kan <dm...@gmail.com> on 2011/05/03 11:51:45 UTC

stemming for English

Dear list,

In our Solr schema we use no stemming on the index side, in order to support
wildcard search. On the query side we use Porter stemming.

I have noticed the following issue: the query term "pretty" gets stemmed to
"pretti", which does not match the unstemmed "pretty" in the index, so
nothing is found.

What would be the approach to handle such situations? Is going all the way
to modifying the Porter stemmer source code the best choice?
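
To make the mismatch concrete, here is a minimal Python sketch. It implements
only step 1c of the Porter algorithm (a trailing "y" becomes "i" when the
rest of the word contains a vowel), which is the rule that turns "pretty"
into "pretti". It is not Solr's stemmer and not the full algorithm; the names
porter_step_1c and index_terms are made up for this illustration.

```python
def is_consonant(word, i):
    """Porter's definition: a, e, i, o, u are vowels; "y" counts as a
    vowel only when it follows a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def porter_step_1c(word):
    """Step 1c only: (*v*) Y -> I. Hypothetical helper, not the full stemmer."""
    stem = word[:-1]
    if word.endswith("y") and any(not is_consonant(stem, i) for i in range(len(stem))):
        return stem + "i"
    return word


# Index side: no stemming, so the term is stored exactly as it was typed.
index_terms = {"pretty"}

# Query side: the stemmer rewrites the term before the lookup happens.
query_term = porter_step_1c("pretty")

print(query_term)                 # pretti
print(query_term in index_terms)  # False
```

The lookup fails because the two sides of the field disagree on the term,
not because the stemmer is wrong, so patching the stemmer source would only
move the problem to other words.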

-- 
Regards,

Dmitry Kan

Re: stemming for English

Posted by Dmitry Kan <dm...@gmail.com>.
Hi Robert,

Have you seen *any* growth?

We once added a copy field to support leading wildcards, and our index
roughly doubled in size.

On Tue, May 3, 2011 at 9:24 PM, Robert Petersen <ro...@buy.com> wrote:

> From what I have seen, adding a second field with the same terms as the
> first does *not* double your index size at all.

-- 
Regards,

Dmitry Kan

RE: stemming for English

Posted by Robert Petersen <ro...@buy.com>.
From what I have seen, adding a second field with the same terms as the first does *not* double your index size at all.

-----Original Message-----
From: Dmitry Kan [mailto:dmitry.kan@gmail.com] 
Sent: Tuesday, May 03, 2011 4:06 AM
To: solr-user@lucene.apache.org
Subject: Re: stemming for English

Yes, Ludovic. That effectively doubles our index. Given the volume of
data we store, we weigh very carefully any case where doubling the
index is a must.

Dmitry

Re: stemming for English

Posted by Dmitry Kan <dm...@gmail.com>.
Yes, Ludovic. That effectively doubles our index. Given the volume of
data we store, we weigh very carefully any case where doubling the
index is a must.

Dmitry

On Tue, May 3, 2011 at 1:08 PM, lboutros <bo...@gmail.com> wrote:

> Dmitry,
>
> I don't know of any way to keep both stemming and consistent wildcard
> support in the same field.
> As I see it, you have to create two different fields.
>
> Ludovic.

-- 
Regards,

Dmitry Kan

Re: stemming for English

Posted by lboutros <bo...@gmail.com>.
Dmitry,

I don't know of any way to keep both stemming and consistent wildcard
support in the same field.
As I see it, you have to create two different fields.
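
The two-field idea can be sketched in a few lines of Python. This is an
illustration under stated assumptions, not Solr code: the dictionaries stand
in for two indexed fields, toy_stem is a hypothetical stand-in for a real
stemmer (it only rewrites a trailing "y" to "i"), and wildcard patterns are
assumed to be matched against the unstemmed terms as typed.

```python
from fnmatch import fnmatchcase


def toy_stem(term):
    # Hypothetical stand-in for a real stemmer: trailing "y" -> "i" only.
    if len(term) > 2 and term.endswith("y"):
        return term[:-1] + "i"
    return term


class TwoFieldIndex:
    """Every token is indexed twice: as typed and stemmed."""

    def __init__(self):
        self.exact = {}    # unstemmed field, used for wildcard queries
        self.stemmed = {}  # stemmed field, used for plain term queries

    def add(self, doc_id, text):
        for token in text.lower().split():
            self.exact.setdefault(token, set()).add(doc_id)
            self.stemmed.setdefault(toy_stem(token), set()).add(doc_id)

    def search(self, query):
        q = query.lower()
        if "*" in q or "?" in q:
            # Wildcards go to the unstemmed field and are not stemmed.
            hits = set()
            for term, doc_ids in self.exact.items():
                if fnmatchcase(term, q):
                    hits |= doc_ids
            return hits
        # Plain terms are stemmed the same way as at index time.
        return set(self.stemmed.get(toy_stem(q), set()))


idx = TwoFieldIndex()
idx.add(1, "a pretty view")
idx.add(2, "plain text")

print(idx.search("pretty"))   # {1}
print(idx.search("pretty*"))  # {1}
```

The price is a second set of terms and postings per document. How much that
adds on disk depends on the engine and the data, and this sketch says nothing
about Solr's actual index size.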

Ludovic.

-----
Jouve
France.

Re: stemming for English

Posted by Dmitry Kan <dm...@gmail.com>.
Hi Ludovic,

That's an option we had before we decided to go for full-blown support of
wildcards.

Do you know of a way to keep both stemming and consistent wildcard support
in the same field?

Dmitry

On Tue, May 3, 2011 at 12:56 PM, lboutros <bo...@gmail.com> wrote:

> Hi,
>
> I think you have to use stemming on both sides (index and query) if you
> really want to use stemming.
>
> Ludovic

-- 
Regards,

Dmitry Kan

Re: stemming for English

Posted by lboutros <bo...@gmail.com>.
Hi,

I think you have to use stemming on both sides (index and query) if you
really want to use stemming.
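
A rough Python sketch of what stemming on both sides buys and what it costs.
Everything here is illustrative: toy_stem is a hypothetical stand-in for a
real stemmer, the dictionary stands in for the index, and the sketch assumes
that wildcard patterns bypass the stemmer and are compared with the indexed
terms as typed.

```python
from fnmatch import fnmatchcase


def toy_stem(term):
    # Hypothetical stand-in for a real stemmer: trailing "y" -> "i" only.
    if len(term) > 2 and term.endswith("y"):
        return term[:-1] + "i"
    return term


def build_index(docs, analyze):
    """Map each analyzed token to the set of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index.setdefault(analyze(token), set()).add(doc_id)
    return index


def term_query(index, term, analyze):
    # The query term goes through the same analysis as the indexed text.
    return index.get(analyze(term.lower()), set())


def wildcard_query(index, pattern):
    # Assumption: the pattern is not stemmed, only matched against terms.
    hits = set()
    for term, doc_ids in index.items():
        if fnmatchcase(term, pattern.lower()):
            hits |= doc_ids
    return hits


docs = {1: "a pretty view", 2: "plain text"}
stemmed_index = build_index(docs, toy_stem)

print(term_query(stemmed_index, "pretty", toy_stem))  # {1}
print(wildcard_query(stemmed_index, "pretty*"))       # set()
```

With the same stemmer on both sides the plain query finds the document again,
but the index now holds "pretti", so a wildcard typed against the surface
form no longer matches.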

Ludovic

-----
Jouve
France.