You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Weiwei Wang <ww...@gmail.com> on 2009/12/14 06:01:15 UTC

Offset Problem

The offset is incorrect for PatternReplaceCharFilter so the hilighting
result is wrong.

How to fix it?

On Mon, Dec 14, 2009 at 11:43 AM, Weiwei Wang <ww...@gmail.com> wrote:

> All solr souce downloaded, and I found PatternReplaceCharFilter is very
> useful for my project.
>
> Thanks
>
>
> On Mon, Dec 14, 2009 at 11:14 AM, Weiwei Wang <ww...@gmail.com>wrote:
>
>> I need the source file not the patch file, where can i download it?
>>
>>
>> On Mon, Dec 14, 2009 at 1:15 AM, Koji Sekiguchi <ko...@r.email.ne.jp>wrote:
>>
>>> Koji Sekiguchi wrote:
>>>
>>>> Paul Taylor wrote:
>>>>
>>>>> I want my search to treat 'No. 1' and 'No.1' the same, because in our
>>>>> context its one token I want 'No. 1' to become 'No.1',  I need to do this
>>>>> before tokenizing because the tokenizer would split one value into two terms
>>>>> and one into just one term. I already use a NormalizeMapFilter to map &' to
>>>>> 'and' but I think it only takes literal text and I need to
>>>>>
>>>>> 1. be case insensitive (but lowercasefilter is only applied after
>>>>> tokenizing)
>>>>>
>>>>> 2. cope with all numbers e.g no. 109
>>>>>
>>>>> So I was going to subclass BaseCharFilter and do my matches with a
>>>>> regular expression like ([Nn]+[Oo]+\\.) ([0-9]+) but I'm struggling to
>>>>> understand the offset methods you have to do once you get a match. Has
>>>>> anyone already got a regular expression Charfilter OR am I approaching this
>>>>> all wrong
>>>>>
>>>>> thanks Paul
>>>>>
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>>
>>>>>  Hi Paul,
>>>>
>>>> I've written a patch for this kind of purpose. See:
>>>>
>>>> https://issues.apache.org/jira/browse/SOLR-1653
>>>>
>>>> Koji
>>>>
>>>>  Oops. I thought this is solr-user list, but it was java-user. :-D
>>>
>>>
>>> Koji
>>>
>>> --
>>> http://www.rondhuit.com/en/
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>>
>> --
>> Weiwei Wang
>> Alex Wang
>> 王巍巍
>> Room 403, Mengmin Wei Building
>> Computer Science Department
>> Gulou Campus of Nanjing University
>> Nanjing, P.R.China, 210093
>>
>> Homepage: http://cs.nju.edu.cn/rl/weiweiwang
>>
>
>
>
> --
> Weiwei Wang
> Alex Wang
> 王巍巍
> Room 403, Mengmin Wei Building
> Computer Science Department
> Gulou Campus of Nanjing University
> Nanjing, P.R.China, 210093
>
> Homepage: http://cs.nju.edu.cn/rl/weiweiwang
>



-- 
Weiwei Wang
Alex Wang
王巍巍
Room 403, Mengmin Wei Building
Computer Science Department
Gulou Campus of Nanjing University
Nanjing, P.R.China, 210093

Homepage: http://cs.nju.edu.cn/rl/weiweiwang

Re: Offset Problem

Posted by Weiwei Wang <ww...@gmail.com>.
got it, thanks, Koji

On Mon, Dec 14, 2009 at 9:19 PM, Koji Sekiguchi <ko...@r.email.ne.jp> wrote:

> Weiwei Wang wrote:
>
>> The offset is incorrect for PatternReplaceCharFilter so the hilighting
>> result is wrong.
>>
>> How to fix it?
>>
>>
>>
> As I noted in the comment of the source, if you produce a phrase from a
> term
> and try to highlight a term in the produced phrase, the highlighted snippet
> will be undesirable. This is the feature, unfortunately. But if you try to
> highlight whole the produced phrase, the snippet will be desirable.
>
>
> Koji
>
> --
> http://www.rondhuit.com/en/
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
Weiwei Wang
Alex Wang
王巍巍
Room 403, Mengmin Wei Building
Computer Science Department
Gulou Campus of Nanjing University
Nanjing, P.R.China, 210093

Homepage: http://cs.nju.edu.cn/rl/weiweiwang

Re: Offset Problem

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
Weiwei Wang wrote:
> The offset is incorrect for PatternReplaceCharFilter so the hilighting
> result is wrong.
>
> How to fix it?
>
>   
As I noted in the comment of the source, if you produce a phrase from a term
and try to highlight a term in the produced phrase, the highlighted snippet
will be undesirable. This is the feature, unfortunately. But if you try to
highlight whole the produced phrase, the snippet will be desirable.

Koji

-- 
http://www.rondhuit.com/en/


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org