Posted to solr-user@lucene.apache.org by Parsa Ghaffari <pa...@gmail.com> on 2010/11/14 13:39:07 UTC

Solr TermsComponent: space in term

Hi folks,

I'm using Solr 1.4.1 and I want to use TermsComponent for autocomplete.
The problem is that I can't get it to match strings with spaces in them. For
example,

terms.fl=name&terms.lower=david&terms.prefix=david&terms.lower.incl=false&indent=true&wt=json

matches all strings starting with "david" but if I change it to:

terms.fl=name&terms.lower=david%20&terms.prefix=david%20&terms.lower.incl=false&indent=true&wt=json

it doesn't match all strings starting with "david ". Is it meant to be that
way? If so, are n-grams the way to go? And does anybody know whether
TermsComponent is implemented with tries, DAWGs, or radix trees, and whether
it's efficient?

Cheers,
Parsa

Re: Solr TermsComponent: space in term

Posted by aniljayanti <an...@gmail.com>.
Hi,

I'm working on autocomplete functionality in Solr. Can you suggest the
required configuration in schema.xml and solrconfig.xml for doing
autocomplete in Solr?

Thanks in advance,

Anil




--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-TermsComponent-space-in-term-tp1898889p3998755.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr TermsComponent: space in term

Posted by Ahmet Arslan <io...@yahoo.com>.

You need to remove EdgeNGramFilterFactory from your analyzer chain.



--- On Thu, 3/3/11, shrinath.m <sh...@webyog.com> wrote:

> From: shrinath.m <sh...@webyog.com>
> Subject: Re: Solr TermsComponent: space in term
> To: solr-user@lucene.apache.org
> Date: Thursday, March 3, 2011, 1:41 PM
> 
> how do I restrict it to only those words present in the
> documents and not
> something like "compliance w" ?

Re: Solr TermsComponent: space in term

Posted by "shrinath.m" <sh...@webyog.com>.
Markus Jelsma-2 wrote:
> 
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
> 
Well, thank you Markus.

Now my schema has the following:

[fieldType XML stripped by the archive]

If I run a query like this:

http://localhost:8983/solr/select?rows=0&q=c&facet=true&facet.field=text&facet.mincount=1&facet.prefix=com

I get output saying:
....

[19 facet entries, each with a count of 1; the term names were stripped by the archive]

....

How do I restrict it to only those words present in the documents, and not
something like "compliance w"?



Re: Solr TermsComponent: space in term

Posted by Markus Jelsma <ma...@openindex.io>.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory

On Thursday 03 March 2011 12:15:07 shrinath.m wrote:
> iorixxx wrote:
> > TermsComponent operates on indexed terms. One way to achieve multi-word
> > suggestions is to use ShingleFilterFactory at index time.
> 
> Thank you @iorixxx.
> Could you point me where I can find a good docs on how to do this ?
> 

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: Solr TermsComponent: space in term

Posted by "shrinath.m" <sh...@webyog.com>.
iorixxx wrote:
> 
> TermsComponent operates on indexed terms. One way to achieve multi-word
> suggestions is to use ShingleFilterFactory at index time.
> 

Thank you @iorixxx.
Could you point me to some good docs on how to do this?


Re: Solr TermsComponent: space in term

Posted by Ahmet Arslan <io...@yahoo.com>.
> Is there no way to achieve what the Op
> had to say ?
> 

TermsComponent operates on indexed terms. One way to achieve multi-word suggestions is to use ShingleFilterFactory at index time.
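
A minimal sketch of what that could look like in schema.xml, assuming a separate field type used only for suggestions. The names text_suggest and name_suggest are made up for illustration and do not come from this thread; maxShingleSize and outputUnigrams are standard ShingleFilterFactory attributes. With shingles in the index, "david smith" exists as a single indexed term, so a prefix of "david " (with the trailing space) has something to match.

```xml
<!-- Hypothetical suggestion field type: single words plus two-word shingles -->
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- For the input "David Smith" this emits "david", "david smith" and "smith" -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <!-- No shingles at query time; TermsComponent reads indexed terms directly -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field populated from the existing "name" field -->
<field name="name_suggest" type="text_suggest" indexed="true" stored="false"/>
<copyField source="name" dest="name_suggest"/>
```

The request would then use terms.fl=name_suggest with a lowercased prefix, since TermsComponent does not run the prefix through an analyzer. The index has to be rebuilt after the schema change.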


      

Re: Solr TermsComponent: space in term

Posted by "shrinath.m" <sh...@webyog.com>.
Why was this thread left unanswered? Is there no way to achieve what the OP
asked about?


Re: Solr TermsComponent: space in term

Posted by Parsa Ghaffari <pa...@gmail.com>.
Alphanumeric characters plus "_", "%" and ".".

For example, "John_Smith", "John Smith", "John_B._Smith" and "John 44 Smith"
are all possible values.
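
For values like these, where the whole field value is the suggestion, one alternative to shingles is to index each value as a single lowercased term. The following is a sketch of that approach, not something proposed in this thread; text_exact and name_exact are hypothetical names. KeywordTokenizerFactory keeps the entire input as one token, so "John Smith" would be indexed as the single term "john smith", and terms.prefix=john%20 could then match it.

```xml
<!-- Hypothetical field type: the whole value becomes one lowercased term -->
<fieldType name="text_exact" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field populated from the existing "name" field -->
<field name="name_exact" type="text_exact" indexed="true" stored="false"/>
<copyField source="name" dest="name_exact"/>
```

The trade-off is that only prefixes of the whole value match: "smith" would not suggest "john smith" with this field.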

On Sun, Nov 14, 2010 at 11:46 PM, Ahmet Arslan <io...@yahoo.com> wrote:

> Yes yes, re-index and restart servlet container is required. What kind of
> values does name field take? Does it contains punctuations? Can you give
> some examples of that field's values?


-- 
Parsa B. Ghaffari

Re: Solr TermsComponent: space in term

Posted by Ahmet Arslan <io...@yahoo.com>.
--- On Sun, 11/14/10, Parsa Ghaffari <pa...@gmail.com> wrote:

> From: Parsa Ghaffari <pa...@gmail.com>
> Subject: Re: Solr TermsComponent: space in term
> To: solr-user@lucene.apache.org
> Date: Sunday, November 14, 2010, 5:06 PM
> Hi Ahmet,
> 
> there's no ShingleFilterFactory. And also after changing
> parameters in the
> schema, should one re-index the table?

Yes, re-indexing and restarting the servlet container are required. What kind of values does the name field take? Does it contain punctuation? Can you give some examples of that field's values?


      

Re: Solr TermsComponent: space in term

Posted by Parsa Ghaffari <pa...@gmail.com>.
Hi Ahmet,

This is the fieldType for "name":

    <fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

and:

<field name="name" type="textgen" indexed="true" stored="true"/>

There's no ShingleFilterFactory. Also, after changing parameters in the
schema, should one re-index the table?


On Sun, Nov 14, 2010 at 10:32 PM, Ahmet Arslan <io...@yahoo.com> wrote:

> This is about the fieldType of name. What is it? If it does have
> ShingleFilterFactory in it, then this is expected.


-- 
Parsa B. Ghaffari

Re: Solr TermsComponent: space in term

Posted by Ahmet Arslan <io...@yahoo.com>.
> terms.fl=name&terms.lower=david%20&terms.prefix=david%20&terms.lower.incl=false&indent=true&wt=json
> 
> it doesn't match all strings starting with "david ". Is it
> meant to be that
> way? 

This is about the fieldType of the name field. What is it? If it does have ShingleFilterFactory in it, then this is expected.


      

Re: Solr TermsComponent: space in term

Posted by Ahmet Arslan <io...@yahoo.com>.
> I'm using Solr 1.4.1 and I'm willing to use TermsComponent
> for AutoComplete.
> The problem is, I can't get it to match strings with spaces
> in them. So to
> say,
> 
> terms.fl=name&terms.lower=david&terms.prefix=david&terms.lower.incl=false&indent=true&wt=json
> 
> matches all strings starting with "david" but if I change
> it to:
> 
> terms.fl=name&terms.lower=david%20&terms.prefix=david%20&terms.lower.incl=false&indent=true&wt=json
> 
> it doesn't match all strings starting with "david ". Is it
> meant to be that
> way? 

This is about the fieldType of name. What is it? If it does have ShingleFilterFactory in it, then this is expected.