Posted to solr-user@lucene.apache.org by Daniel Bradley <da...@adfero.co.uk> on 2011/10/25 18:27:16 UTC
Search for the single hash "#" character never returns results
When running a search such as:
field_name:#
field_name:"#"
field_name:"\#"
where there is a record with the value of exactly "#", Solr returns 0 rows.
The workaround we are currently using is a range query on the
field, such as:
field_name:[# TO #]
and this returns the correct documents.
Use case details:
We have a field that indexes a text field and calculates a "letter
group". It keeps only the first significant character of a value
(number or letter), and if that character is a number it simply stores
"#", as we want all numbered items grouped together.
I'm aware that we could also fix this by using a specific number
instead of the hash character; however, I thought I'd raise this to see
if there is a wider issue. I've listed some specific details below.
Thanks for your time,
Daniel Bradley
Field definition:
<fieldType name="letterGrouping" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="^([a-zA-Z0-9]).*" group="1"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^a-z0-9])" replacement="" replace="all"
    />
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([0-9])" replacement="#" replace="all"
    />
  </analyzer>
</fieldType>
Server information:
Solr Specification Version: 3.2.0
Solr Implementation Version: 3.2.0 1129474 - rmuir - 2011-05-30 23:07:15
Lucene Specification Version: 3.2.0
Lucene Implementation Version: 3.2.0 1129474 - 2011-05-30 23:08:57
Re: Search for the single hash "#" character never returns results
Posted by Erick Erickson <er...@gmail.com>.
NP. By the way, kudos for posting enough information to diagnose
the problem first time round!
Erick
On Thu, Oct 27, 2011 at 8:46 AM, Daniel Bradley
<da...@adfero.co.uk> wrote:
> Fantastic, thanks, yes I completely overlooked that case, separating the
> analysers worked a treat.
>
> Had also posted on stack overflow but the mailing list proved to be
> superior!
>
> Many thanks,
>
> Daniel
Re: Search for the single hash "#" character never returns results
Posted by Daniel Bradley <da...@adfero.co.uk>.
Fantastic, thanks, yes I completely overlooked that case, separating the
analysers worked a treat.
Had also posted on stack overflow but the mailing list proved to be
superior!
Many thanks,
Daniel
On 27 October 2011 13:09, Erick Erickson <er...@gmail.com> wrote:
> Take a look at your admin/analysis page and put your tokens in for both
> index and query times. What I think you'll see is that the # is being
> stripped at query time due to the first PatternReplaceFilterFactory.
>
> You probably want to split your analyzers into an index-time and query-time
> pair and do the appropriate replacements to keep # at query time.
>
>
> Best
> Erick
Re: Search for the single hash "#" character never returns results
Posted by Erick Erickson <er...@gmail.com>.
Take a look at your admin/analysis page and put your tokens in for both
index and query times. What I think you'll see is that the # is being
stripped at query time due to the first PatternReplaceFilterFactory.
You probably want to split your analyzers into an index-time and query-time
pair and do the appropriate replacements to keep # at query time.
Best
Erick
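For readers following along, the split suggested above might look roughly like the sketch below. The thread does not show the configuration Daniel ended up with, so this is an illustration under stated assumptions rather than his actual fix. The index-time chain is copied unchanged from the original post. The query-time chain is an assumption: it avoids both the tokenizer pattern and the first PatternReplaceFilterFactory, since both only admit letters and digits, and instead passes the query text through as a single lowercased token.

```xml
<!-- Sketch only: one possible index/query split for the letterGrouping type. -->
<fieldType name="letterGrouping" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <!-- Index time: the chain from the original post, unchanged. -->
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="^([a-zA-Z0-9]).*" group="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <!-- Drops anything that is not a lowercase letter or digit. -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^a-z0-9])" replacement="" replace="all"/>
    <!-- Collapses every digit into the "#" group marker. -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([0-9])" replacement="#" replace="all"/>
  </analyzer>
  <!-- Query time (assumed): keep the query text as one token so "#" survives. -->
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>
```

With a query analyzer like this, callers are expected to pass the group key itself (a single letter or "#") rather than a raw value, because the query side no longer reduces input to its first character. The admin/analysis page mentioned above is a good way to confirm what each side produces for a given input.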