Posted to solr-user@lucene.apache.org by Parsa Ghaffari <pa...@gmail.com> on 2010/11/14 13:39:07 UTC
Solr TermsComponent: space in term
Hi folks,
I'm using Solr 1.4.1 and would like to use TermsComponent for autocomplete.
The problem is that I can't get it to match strings containing spaces. For
example,
terms.fl=name&terms.lower=david&terms.prefix=david&terms.lower.incl=false&indent=true&wt=json
matches all strings starting with "david" but if I change it to:
terms.fl=name&terms.lower=david%20&terms.prefix=david%20&terms.lower.incl=false&indent=true&wt=json
it doesn't match strings starting with "david " (with a trailing space). Is it
meant to be that way? If so, are n-grams the way to go? And does anybody know
whether TermsComponent is implemented with tries, DAWGs or radix trees, and
whether it's efficient?
Cheers,
Parsa
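A simplified Python model of why the second query finds nothing. It assumes, as in the "textgen" field type Parsa posts in this thread, that the name field is split on whitespace and lowercased at index time. TermsComponent walks the indexed terms, so when no indexed term contains a space, a prefix ending in a space cannot match. The sample names are hypothetical, and this is not how Solr stores terms internally.

```python
# Simplified model (not Solr code) of an indexed term dictionary for a field
# analyzed with a whitespace tokenizer plus lowercasing, and a
# TermsComponent-style prefix lookup over it.

def index_terms(values):
    """Split each value on whitespace, lowercase it, return sorted unique terms."""
    terms = set()
    for value in values:
        for token in value.split():
            terms.add(token.lower())
    return sorted(terms)


def terms_with_prefix(terms, prefix):
    """Return every indexed term that starts with prefix (like terms.prefix)."""
    return [t for t in terms if t.startswith(prefix)]


names = ["David Smith", "David Jones", "Davidson Miller"]  # hypothetical values
terms = index_terms(names)

print(terms_with_prefix(terms, "david"))   # ['david', 'davidson']
print(terms_with_prefix(terms, "david "))  # []; no indexed term contains a space
```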
Re: Solr TermsComponent: space in term
Posted by aniljayanti <an...@gmail.com>.
Hi
I'm working on autocomplete functionality in Solr. Can you suggest the
required configuration in schema.xml and solrconfig.xml for doing
autocomplete in Solr?
Thanks in advance,
Anil
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-TermsComponent-space-in-term-tp1898889p3998755.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr TermsComponent: space in term
Posted by Ahmet Arslan <io...@yahoo.com>.
You need to remove EdgeNGramFilterFactory from your analyzer chain.
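shrinath's schema did not survive in this archive, so the sketch below assumes, based on this reply, that the field applied ShingleFilterFactory followed by EdgeNGramFilterFactory at index time. It is a simplified Python model rather than Solr's filters: shingling produces "compliance with", and edge n-gramming that shingle also indexes every prefix of it, including "compliance w". The document text is hypothetical.

```python
# Simplified model (not Solr's filters) of a shingle step followed by an
# edge n-gram step, showing where partial terms like "compliance w" come from.

def shingles(tokens, max_size=2):
    """Return the single tokens plus space-joined word n-grams up to max_size."""
    out = list(tokens)
    for size in range(2, max_size + 1):
        for i in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[i:i + size]))
    return out


def edge_ngrams(term, min_gram=1):
    """Return every prefix of term that is at least min_gram characters long."""
    return [term[:n] for n in range(min_gram, len(term) + 1)]


tokens = "compliance with policy".split()  # hypothetical document text

# With edge n-grams: every prefix of every shingle becomes an indexed term.
with_ngrams = set()
for shingle in shingles(tokens):
    with_ngrams.update(edge_ngrams(shingle))

# Without edge n-grams: only whole words and whole shingles are indexed.
without_ngrams = set(shingles(tokens))

print("compliance w" in with_ngrams)     # True
print("compliance w" in without_ngrams)  # False
print(sorted(t for t in without_ngrams if t.startswith("com")))
# ['compliance', 'compliance with']
```

With the edge n-gram step removed, a facet.prefix or terms.prefix lookup still finds "compliance with" by prefix, because prefix matching over the term dictionary already does the work the n-grams were meant to do.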
--- On Thu, 3/3/11, shrinath.m <sh...@webyog.com> wrote:
> How do I restrict it to only those words present in the documents, and
> not something like "compliance w"?
Re: Solr TermsComponent: space in term
Posted by "shrinath.m" <sh...@webyog.com>.
Markus Jelsma-2 wrote:
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
>
Well, thank you, Markus.
My schema now has the following:
If I run a query like this:
http://localhost:8983/solr/select?rows=0&q=c&facet=true&facet.field=text&facet.mincount=1&facet.prefix=com
I get this output:
....
(19 facet values, each with a count of 1)
....
How do I restrict it to only those words present in the documents, and not
something like "compliance w"?
Re: Solr TermsComponent: space in term
Posted by Markus Jelsma <ma...@openindex.io>.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
On Thursday 03 March 2011 12:15:07 shrinath.m wrote:
> iorixxx wrote:
> > TermsComponent operates on indexed terms. One way to achieve multi-word
> > suggestions is to use ShingleFilterFactory at index time.
>
> Thank you @iorixxx.
> Could you point me to good documentation on how to do this?
--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Re: Solr TermsComponent: space in term
Posted by "shrinath.m" <sh...@webyog.com>.
iorixxx wrote:
>
> TermsComponent operates on indexed terms. One way to achieve multi-word
> suggestions is to use ShingleFilterFactory at index time.
>
Thank you, @iorixxx.
Could you point me to good documentation on how to do this?
Re: Solr TermsComponent: space in term
Posted by Ahmet Arslan <io...@yahoo.com>.
> Is there no way to achieve what the OP
> asked for?
>
TermsComponent operates on indexed terms. One way to achieve multi-word suggestions is to use ShingleFilterFactory at index time.
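A minimal Python model of this suggestion, assuming two-word shingles are indexed alongside the single words and joined with a space. It only illustrates what ends up in the term dictionary; it is not Solr's ShingleFilter, and the sample names are hypothetical.

```python
# Simplified model (not Solr's ShingleFilter): index 2-word shingles next to
# single words, then do a TermsComponent-style prefix lookup.

def shingle_terms(value, max_size=2):
    """Lowercase, split on whitespace, return unigrams plus space-joined shingles."""
    tokens = value.lower().split()
    terms = set(tokens)
    for size in range(2, max_size + 1):
        for i in range(len(tokens) - size + 1):
            terms.add(" ".join(tokens[i:i + size]))
    return terms


def build_term_dictionary(values, max_size=2):
    """Union the terms of all values and return them sorted, like a term dictionary."""
    terms = set()
    for value in values:
        terms |= shingle_terms(value, max_size)
    return sorted(terms)


names = ["David Smith", "David Jones", "Davidson Miller"]  # hypothetical values
terms = build_term_dictionary(names)

suggestions = [t for t in terms if t.startswith("david ")]
print(suggestions)  # ['david jones', 'david smith']
```

The trade-off is a larger term dictionary, since every adjacent word pair in the field becomes an extra term.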
Re: Solr TermsComponent: space in term
Posted by "shrinath.m" <sh...@webyog.com>.
Why was this thread left unanswered? Is there no way to achieve what the OP
asked for?
Re: Solr TermsComponent: space in term
Posted by Parsa Ghaffari <pa...@gmail.com>.
The values are alphanumeric, plus "_", "%" and ".". For example, "John_Smith",
"John Smith", "John_B._Smith" and "John 44 Smith" are all possible values.
On Sun, Nov 14, 2010 at 11:46 PM, Ahmet Arslan <io...@yahoo.com> wrote:
> Yes, you need to re-index and restart the servlet container. What kind of
> values does the name field take? Does it contain punctuation? Can you give
> some examples of that field's values?
--
Parsa B. Ghaffari
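A rough Python approximation of how the index analyzer of the "textgen" field type posted in this thread would treat these sample values. Assumptions: the StopFilter is skipped (the contents of stopwords.txt are unknown), and WordDelimiterFilterFactory is reduced to splitting on non-alphanumeric characters plus joining all-letter parts (approximating catenateWords="1"). Either way, no emitted term contains a space, which is why terms.prefix=david%20 finds nothing.

```python
import re

# Rough approximation (not Solr's WordDelimiterFilter) of the "textgen" index
# analyzer: whitespace tokenizer; split each token on non-alphanumeric
# characters and emit the parts; also emit the joined parts when a token splits
# into several all-letter parts (approximating catenateWords="1"); lowercase.
# The StopFilter is skipped because stopwords.txt is unknown.

def analyze(value):
    terms = []
    for token in value.split():
        parts = [p for p in re.split(r"[^0-9A-Za-z]+", token) if p]
        terms.extend(parts)
        if len(parts) > 1 and all(p.isalpha() for p in parts):
            terms.append("".join(parts))
    return [t.lower() for t in terms]


samples = ["John_Smith", "John Smith", "John_B._Smith", "John 44 Smith"]
for value in samples:
    print(value, "->", analyze(value))
```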
Re: Solr TermsComponent: space in term
Posted by Ahmet Arslan <io...@yahoo.com>.
--- On Sun, 11/14/10, Parsa Ghaffari <pa...@gmail.com> wrote:
> There's no ShingleFilterFactory. Also, after changing parameters in the
> schema, should one re-index the table?
Yes, you need to re-index and restart the servlet container. What kind of values does the name field take? Does it contain punctuation? Can you give some examples of that field's values?
Re: Solr TermsComponent: space in term
Posted by Parsa Ghaffari <pa...@gmail.com>.
Hi Ahmet,
This is the fieldType for "name":
<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
and:
<field name="name" type="textgen" indexed="true" stored="true"/>
There's no ShingleFilterFactory. Also, after changing parameters in the
schema, should one re-index the table?
On Sun, Nov 14, 2010 at 10:32 PM, Ahmet Arslan <io...@yahoo.com> wrote:
> This is about the fieldType of the name field. What is it? If it does not
> have ShingleFilterFactory in it, then this is expected.
--
Parsa B. Ghaffari
Re: Solr TermsComponent: space in term
Posted by Ahmet Arslan <io...@yahoo.com>.
> terms.fl=name&terms.lower=david%20&terms.prefix=david%20&terms.lower.incl=false&indent=true&wt=json
>
> it doesn't match all strings starting with "david ". Is it
> meant to be that
> way?
This is about the fieldType of the name field. What is it? If it does not have ShingleFilterFactory in it, then this is expected.
Re: Solr TermsComponent: space in term
Posted by Ahmet Arslan <io...@yahoo.com>.
> I'm using Solr 1.4.1 and I'm willing to use TermsComponent for AutoComplete.
> The problem is, I can't get it to match strings with spaces in them. So to
> say,
>
> terms.fl=name&terms.lower=david&terms.prefix=david&terms.lower.incl=false&indent=true&wt=json
>
> matches all strings starting with "david" but if I change it to:
>
> terms.fl=name&terms.lower=david%20&terms.prefix=david%20&terms.lower.incl=false&indent=true&wt=json
>
> it doesn't match all strings starting with "david ". Is it meant to be that
> way?
This is about the fieldType of the name field. What is it? If it does not have ShingleFilterFactory in it, then this is expected.