Posted to solr-user@lucene.apache.org by Brian Lamb <br...@journalexperts.com> on 2011/03/30 19:21:37 UTC

Matching the beginning of a word within a term

Hi all,

I have a field set up like this:

<field name="common_names" multiValued="true" type="text" indexed="true"
stored="true" required="false" />

And I have some records:

RECORD1
<arr name="common_names">
<str>companion to mankind</str>
<str>pooch</str>
</arr>

RECORD2
<arr name="common_names">
<str>companion to womankind</str>
<str>man's worst enemy</str>
</arr>

I would like to write a query that will match the beginning of a word within
the term. Here is the query I would use as it exists now:

http://localhost:8983/solr/search/?q=*:*&fq={!q.op=AND%20df=common_names}"companion man"~10

In the above example, I would want to return only RECORD1.

The query as it exists right now is designed to match only records where
both words are present in the same term. So if I changed man to mankind in
the query, RECORD1 would be returned.

Even though the phrases companion and man exist in the same term in RECORD2,
I do not want RECORD2 to be returned because 'man' is not at the beginning
of the word.

How can I achieve this?

Thanks,

Brian Lamb
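
[Editorial sketch] The matching rule asked for above can be written down in a few lines of Python. This is only a model of the desired behaviour, not of how Solr evaluates the query: the function names are made up for illustration, the word splitting is a simple assumption, and the proximity slop (~10) is ignored.

```python
import re

def words(value):
    # Lowercase and split on anything that is not a letter or digit,
    # so "man's" yields the words "man" and "s".
    return [w for w in re.split(r"[^a-z0-9]+", value.lower()) if w]

def value_matches(value, query_words):
    # Every query word must be a prefix of some word in this one value.
    ws = words(value)
    return all(any(w.startswith(q.lower()) for w in ws) for q in query_words)

def record_matches(values, query_words):
    # A record matches when a single value satisfies all query words.
    return any(value_matches(v, query_words) for v in values)

RECORD1 = ["companion to mankind", "pooch"]
RECORD2 = ["companion to womankind", "man's worst enemy"]

if __name__ == "__main__":
    print(record_matches(RECORD1, ["companion", "man"]))
    print(record_matches(RECORD2, ["companion", "man"]))
```

Under this rule RECORD2 is rejected because no single value contains both a word starting with "companion" and a word starting with "man".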

Re: Matching the beginning of a word within a term

Posted by Brian Lamb <br...@journalexperts.com>.

Thank you both for your replies. It looks like EdgeNGramFilter will do the
job nicely. Time to reindex...again.

On Fri, Apr 1, 2011 at 8:31 AM, Jan Høydahl <ja...@cominvent.com> wrote:

> Check out
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory
> Don't know if it works with phrases though

Re: Matching the beginning of a word within a term

Posted by Jan Høydahl <ja...@cominvent.com>.

Check out http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory
Don't know if it works with phrases though

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
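
[Editorial sketch] To see why front edge n-grams fit the requirement, here is a small Python model of what such a filter emits for one token at index time. It is not the Solr implementation; the function name is hypothetical, and the min_gram/max_gram parameters only loosely mirror the filter's minGramSize and maxGramSize attributes. It assumes the query term itself is not n-grammed, and it says nothing about token positions, so the open question about phrases remains open.

```python
def front_edge_ngrams(token, min_gram=1, max_gram=15):
    # Emit every prefix of the token whose length lies between
    # min_gram and max_gram (capped at the token length).
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]

if __name__ == "__main__":
    for token in ("mankind", "womankind"):
        grams = front_edge_ngrams(token)
        # "man" is a prefix of "mankind" only, and "kind" is a prefix of neither.
        print(token, "man" in grams, "kind" in grams)
```

Because only prefixes are indexed, a query for "man" can hit "mankind" but not "womankind", and "kind" hits neither.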

On 31 March 2011, at 16:49, Brian Lamb wrote:

> No, I don't really want to break down the words into subwords. In the
> example I provided, I would not want "kind" to match either record because
> it is not at the beginning of the word even though "kind" appears in both
> records as part of a word.

Re: Matching the beginning of a word within a term

Posted by lboutros <bo...@gmail.com>.

So if I understand correctly, in these examples:

http://localhost:8983/solr/search/?q=*:*&fq={!q.op=AND%20df=common_names}"companion mank"~10

http://localhost:8983/solr/search/?q=*:*&fq={!q.op=AND%20df=common_names}"companion manki"~10

http://localhost:8983/solr/search/?q=*:*&fq={!q.op=AND%20df=common_names}"companion mankin"~10

you want to retrieve the same record (RECORD1)? So you would like something like:

http://localhost:8983/solr/search/?q=*:*&fq={!q.op=AND%20df=common_names}"companion man*"~10

Did you take a look at the ComplexPhraseQueryParser?

http://lucene.apache.org/java/3_1_0/api/all/org/apache/lucene/queryParser/complexPhrase/ComplexPhraseQueryParser.html

Ludovic
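
[Editorial sketch] The example URLs in this thread contain raw spaces, quotes and braces, which need encoding before they are sent. Below is a minimal Python sketch of building the query string with the standard library. The helper name and its defaults are assumptions for illustration. Whether the wildcard inside the quoted phrase is actually expanded depends on the query parser in use; as far as I know the default parser does not do this, which is the reason for pointing at ComplexPhraseQueryParser.

```python
from urllib.parse import urlencode, parse_qs

def build_query_string(phrase, field="common_names", slop=10):
    # Filter query in the same shape as the examples in this thread:
    # local params, then a quoted phrase with a proximity slop.
    fq = '{!q.op=AND df=%s}"%s"~%d' % (field, phrase, slop)
    # urlencode percent-encodes quotes and braces and turns spaces into "+".
    return urlencode({"q": "*:*", "fq": fq})

if __name__ == "__main__":
    qs = build_query_string("companion man*")
    print("http://localhost:8983/solr/search/?" + qs)
    # Decoding the string again recovers the original parameters.
    print(parse_qs(qs)["fq"][0])
```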


-----
Jouve
France.
--
View this message in context: http://lucene.472066.n3.nabble.com/Matching-the-beginning-of-a-word-within-a-term-tp2754668p2760486.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Matching the beginning of a word within a term

Posted by Brian Lamb <br...@journalexperts.com>.

No, I don't really want to break down the words into subwords. In the
example I provided, I would not want "kind" to match either record because
it is not at the beginning of the word even though "kind" appears in both
records as part of a word.

On Wed, Mar 30, 2011 at 4:42 PM, lboutros <bo...@gmail.com> wrote:

> Do you want to tokenize subwords based on dictionaries ? A bit like
> disagglutination of german words ?
>
> If so, something like this could help : DictionaryCompoundWordTokenFilter
>
> http://search.lucidimagination.com/search/document/CDRG_ch05_5.8.8
>
> Ludovic
>
>
> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilter.html

Re: Matching the beginning of a word within a term

Posted by lboutros <bo...@gmail.com>.

Do you want to tokenize subwords based on dictionaries? A bit like
disagglutination of German words?

If so, something like this could help: DictionaryCompoundWordTokenFilter

http://search.lucidimagination.com/search/document/CDRG_ch05_5.8.8

Ludovic

http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilter.html


-----
Jouve
France.
--
View this message in context: http://lucene.472066.n3.nabble.com/Matching-the-beginning-of-a-word-within-a-term-tp2754668p2755561.html
Sent from the Solr - User mailing list archive at Nabble.com.