Posted to solr-user@lucene.apache.org by Bogdan Vatkov <bo...@gmail.com> on 2010/01/19 23:28:51 UTC

Unstemming after solr.PorterStemFilterFactory

Hi,

I am indexing with solr.PorterStemFilterFactory in the analyzer chain, but I
then need to access the unstemmed versions of the terms. What would be the
easiest way to get the unstemmed versions?
Thanks in advance.

Best regards,
Bogdan

Re: Unstemming after solr.PorterStemFilterFactory

Posted by Bogdan Vatkov <bo...@gmail.com>.

Thanks! It is good to know I did not do something in vain :)

On Wed, Jan 20, 2010 at 6:54 PM, Erick Erickson <er...@gmail.com>wrote:

> Ah, OK. I take the "unnecessary" comment back. If you require
> the original form of the tokens (not just the original text), then you
> do have to do something to preserve them, so I think you're on
> the right track....
>
> FWIW
> Erick

-- 
Best regards,
Bogdan

Re: Unstemming after solr.PorterStemFilterFactory

Posted by Erick Erickson <er...@gmail.com>.

Ah, OK. I take the "unnecessary" comment back. If you require
the original form of the tokens (not just the original text), then you
do have to do something to preserve them, so I think you're on
the right track....

FWIW
Erick

On Wed, Jan 20, 2010 at 9:38 AM, Bogdan Vatkov <bo...@gmail.com>wrote:

> Hi Eric,
>
> I think I realize that and I am actually using this - I am using the
> stemmed, cased etc. token from the stored "term vectors" and additionally I
> am using the field values.
> But the fields values are different from the tokens in the level of
> granularity.
> When I access the term vector for my field "body" I get the tokens:
> "old", "man", "sea" (the rest is stopwords)
> while if I use document.getter methods for my field I get the value of the
> field "body", which is:
> "The Old Man and the Sea"
> But.. what I actually need is the original version of the tokens and not the
> field value itself, in that example I need:
> "Old" "Man", "Sea"
> and not
> "The Old Man and the Sea"
> that is why I had to do my version of that filter so that during tokens
> transformation (stemming, lowercasing) I store a map of the filtered term
> -to- original term.
>
> I am using Apache Mahout to read from Solr index (term vectors) and cluster
> Solr documents based on these terms (tokens) and the clustering process
> itself works with the stemmed, lowercased terms while at the end I want to
> present the original terms - and the only way I found is by using this
> stemmed term-to-original-token-map which I build during stemming.
> Am I missing some existing method to access stored tokens before they get
> stemmed?
>
> Best regards,
> Bogdan

Re: Unstemming after solr.PorterStemFilterFactory

Posted by Bogdan Vatkov <bo...@gmail.com>.

Hi Erick,

I think I understand that, and I am actually already using it: I take the
stemmed, lowercased, etc. tokens from the stored "term vectors", and in
addition I use the field values.
But the field values differ from the tokens in their level of granularity.
When I access the term vector for my field "body" I get the tokens:
"old", "man", "sea" (the rest are stopwords)
while if I use the document getter methods for my field I get the value of the
field "body", which is:
"The Old Man and the Sea"
What I actually need is the original version of the tokens and not the
field value itself. In that example I need:
"Old", "Man", "Sea"
and not
"The Old Man and the Sea"
That is why I had to write my own version of that filter, so that during token
transformation (stemming, lowercasing) I store a map from each filtered term
to its original term.

I am using Apache Mahout to read from the Solr index (term vectors) and cluster
Solr documents based on these terms (tokens). The clustering process
itself works with the stemmed, lowercased terms, while at the end I want to
present the original terms. The only way I found is to use this
stemmed-term-to-original-token map, which I build during stemming.
Am I missing some existing method to access stored tokens before they get
stemmed?
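
To make the idea concrete, here is a minimal sketch of such a filtered-term-to-original-term map. It is written in plain Python rather than as a Solr filter, and it is only an illustration under stated assumptions: `toy_stem`, `analyze`, `unstem_map` and the stopword list are hypothetical stand-ins, not Solr, Lucene or Snowball APIs, and the suffix stripping is far cruder than a real Porter stemmer.

```python
from collections import defaultdict

# Illustrative stopword list (assumption), not the contents of stopwords.txt.
STOPWORDS = {"the", "and", "a", "of"}


def toy_stem(token):
    """Crude suffix stripper standing in for a real Porter/Snowball stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text, unstem_map):
    """Return the index tokens for text, recording each original form on the way."""
    tokens = []
    for original in text.split():            # whitespace tokenization
        lowered = original.lower()
        if lowered in STOPWORDS:
            continue                          # stopwords never reach the map
        stemmed = toy_stem(lowered)
        unstem_map[stemmed].add(original)     # filtered term -> original term(s)
        tokens.append(stemmed)
    return tokens


unstem_map = defaultdict(set)
tokens = analyze("The Old Man and the Sea", unstem_map)
print(tokens)                      # ['old', 'man', 'sea']
print(sorted(unstem_map["sea"]))   # ['Sea']
```

Note that the map has to hold a set of originals per stem, because several surface forms (for example "Sea" and "Seas") can collapse to the same stemmed term.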

Best regards,
Bogdan

On Wed, Jan 20, 2010 at 2:39 AM, Erick Erickson <er...@gmail.com>wrote:

> This is completely unnecessary. Fields can be both indexed and
> stored, and the operations are orthogonal.
>
> That is, when you specify that a field is indexed, it is run through
> an analyzer and the *tokens* are indexed, after any
> stemming, casing, etc.
>
> Stored means that the original value, before any analysis
> whatsoever, is put in a completely separate location.
> It's only there for retrieval and display to the user. It's as if
> a copy of the original text was put into one place, and the
> tokens were put in another.
>
> Consider the problem of book titles. If I have a title "The Old
> Man and the Sea", I want to display that title as a result of
> searching for "old sea man". Rather than force the separate
> storage to be done programmatically, SOLR allows you to
> specify these two options. So if I specify indexing and storing,
> the tokens "old" "man" "sea" (assuming lowercasing,
> stopword removal, etc) are added to the searchable index.
> "The Old Man and the Sea" is copied somewhere else, and
> when you ask for the *value* of the field, you get "The Old Man
> and the Sea". This stored part of the index is never searched, it
> is solely there for retrieval/display.
>
> I'd really get a copy of the book, it'll save you lots of time and
> effort.
>
> HTH
> Erick

Re: Unstemming after solr.PorterStemFilterFactory

Posted by Erick Erickson <er...@gmail.com>.

This is completely unnecessary. Fields can be both indexed and
stored, and the operations are orthogonal.

That is, when you specify that a field is indexed, it is run through
an analyzer and the *tokens* are indexed, after any
stemming, casing, etc.

Stored means that the original value, before any analysis
whatsoever, is put in a completely separate location.
It's only there for retrieval and display to the user. It's as if
a copy of the original text was put into one place, and the
tokens were put in another.

Consider the problem of book titles. If I have a title "The Old
Man and the Sea", I want to display that title as a result of
searching for "old sea man". Rather than force the separate
storage to be done programmatically, Solr allows you to
specify these two options. So if I specify indexing and storing,
the tokens "old" "man" "sea" (assuming lowercasing,
stopword removal, etc.) are added to the searchable index.
"The Old Man and the Sea" is copied somewhere else, and
when you ask for the *value* of the field, you get "The Old Man
and the Sea". This stored part of the index is never searched; it
is solely there for retrieval/display.
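
The indexed/stored split described above can be sketched with a toy in-memory index. This is only an illustration in Python, not how Solr or Lucene is implemented; the `ToyIndex` class, its method names and the two-word stopword list are assumptions made up for the example.

```python
# Illustrative stopword list (assumption), not Solr's stopwords.txt.
STOPWORDS = {"the", "and"}


class ToyIndex:
    """Keeps searchable tokens and stored values in two separate places."""

    def __init__(self):
        self.postings = {}   # token -> set of doc ids (the "indexed" side)
        self.stored = {}     # doc id -> original text (the "stored" side)

    def add(self, doc_id, text):
        self.stored[doc_id] = text                  # untouched copy, display only
        for word in text.split():
            token = word.lower()                    # analysis: lowercasing
            if token not in STOPWORDS:              # analysis: stopword removal
                self.postings.setdefault(token, set()).add(doc_id)

    def search(self, query):
        """Return the stored values of documents containing every query token."""
        tokens = [w.lower() for w in query.split() if w.lower() not in STOPWORDS]
        if not tokens:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return [self.stored[d] for d in sorted(hits)]


index = ToyIndex()
index.add(1, "The Old Man and the Sea")
print(index.search("old sea man"))   # ['The Old Man and the Sea']
print(sorted(index.postings))        # ['man', 'old', 'sea']
```

Searching only ever touches `postings`, while the result shown to the user comes from `stored`. Neither side keeps the per-token original forms such as "Old" or "Sea".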

I'd really get a copy of the book, it'll save you lots of time and
effort.

HTH
Erick

On Tue, Jan 19, 2010 at 5:45 PM, Bogdan Vatkov <bo...@gmail.com>wrote:

> I am using fields like:
>  <field name="msg_body" type="body_text" termVectors="true" indexed="true"
> stored="true"/>
> which contain multi-line text, not just single strings, what does "stored
> values" mean?
> I am relatively new to Solr
>
> I solved my issue by copy/pasting and enhancing
> the SnowballPorterFilterFactory class by
> creating SnowballPorterWithUnstemLowerCaseFilterFactory
> I added lowercasing inside the factory since I need to capture the original
> terms store them in a side file and only then lowercase and stem.
>
>    <fieldType name="body_text" class="solr.TextField"
> positionIncrementGap="100">
>      <analyzer type="index">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <filter class="solr.StopFilterFactory"
>                ignoreCase="true"
>                words="stopwords.txt"
>                enablePositionIncrements="true"
>                />
>        <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> <!--        <filter class="solr.LowerCaseFilterFactory"/> -->
> <!--        <filter class="solr.SnowballPorterFilterFactory"
> language="English" protected="protwords.txt"/> -->
>        <filter
>
> class="org.bogdan.solr.analysis.SnowballPorterWithUnstemLowerCaseFilterFactory"
> language="English" protected="protwords.txt" unstemmed="unstemmed.txt"/>
>      </analyzer>
>
> I was wondering if there is an easier way (without doing this custom filter
> that I did).
>
> Best regards,
> Bogdan

Re: Unstemming after solr.PorterStemFilterFactory

Posted by Bogdan Vatkov <bo...@gmail.com>.

I am using fields like:
  <field name="msg_body" type="body_text" termVectors="true" indexed="true"
         stored="true"/>
which contain multi-line text, not just single strings. What does "stored
values" mean?
I am relatively new to Solr.

I solved my issue by copying and extending the SnowballPorterFilterFactory
class to create SnowballPorterWithUnstemLowerCaseFilterFactory.
I moved lowercasing inside the factory because I need to capture the original
terms, store them in a side file, and only then lowercase and stem.

    <fieldType name="body_text" class="solr.TextField"
               positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<!--        <filter class="solr.LowerCaseFilterFactory"/> -->
<!--        <filter class="solr.SnowballPorterFilterFactory"
                    language="English" protected="protwords.txt"/> -->
        <filter class="org.bogdan.solr.analysis.SnowballPorterWithUnstemLowerCaseFilterFactory"
                language="English" protected="protwords.txt" unstemmed="unstemmed.txt"/>
      </analyzer>

I was wondering if there is an easier way (without doing this custom filter
that I did).

Best regards,
Bogdan

On Wed, Jan 20, 2010 at 12:38 AM, Otis Gospodnetic <
otis_gospodnetic@yahoo.com> wrote:

> Bogdan,
>
> You can get them from stored values of your fields, if you are storing
> them.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch

Re: Unstemming after solr.PorterStemFilterFactory

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Bogdan,

You can get them from stored values of your fields, if you are storing them.

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch


