Posted to solr-user@lucene.apache.org by Peter Spam <ps...@mac.com> on 2010/07/08 02:33:09 UTC
Using hl.regex.pattern to print complete lines
Hi,
I have a text file broken apart by carriage returns, and I'd like to only return entire lines. So, I'm trying to use this:
&hl.fragmenter=regex
&hl.regex.pattern=^.*$
... but I still get fragments, even if I crank up the hl.regex.slop to 3 or so. I also tried a pattern of "\n.*\n" which seems to work better, but still isn't right. Any ideas?
-Pete
Re: Using hl.regex.pattern to print complete lines
Posted by Lance Norskog <go...@gmail.com>.
Java's regex flavor can differ from other regex flavors, so writing a
test program and experimenting is the only reliable way to check a
pattern. Once you are sure that this expression really is what you want,
and it still does not produce what you expect, you might have found a
bug in highlighting.
Lucene/Solr highlighting has always been a difficult area, and might
not do everything right.
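A test program of the kind suggested here could look like the sketch below. It runs the three patterns tried in this thread through java.util.regex on its own, so it shows what each pattern matches in isolation and says nothing about how Solr's regex fragmenter applies them. The class name RegexLineProbe, the helper findAll, and the sample text are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical probe class; it exercises java.util.regex only, not Solr.
class RegexLineProbe {
    // Collect every non-overlapping match of regex in text, in order.
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Five short lines separated by newline characters (assumed sample data).
        String text = "a\nb\nc\nd\ne";
        String[] patterns = { "^.*$", "(?m)^.*$", "\n.*\n" };
        for (String p : patterns) {
            // Print newlines as a visible "\n" so each result stays on one line.
            System.out.println(p.replace("\n", "\\n") + " -> "
                    + findAll(p, text).toString().replace("\n", "\\n"));
        }
    }
}
```

On this sample, "^.*$" without flags finds nothing, "(?m)^.*$" finds all five lines, and "\n.*\n" finds only b and d, because each match consumes the newline that the following line would need as its own start. That may be part of why the "\n.*\n" pattern looked better but still not right, though the fragmenter's own behaviour (slop, fragment size) is not modelled here.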
On Wed, Jul 21, 2010 at 4:20 PM, Peter Spam <ps...@mac.com> wrote:
> Still not working ... any ideas?
>
>
> -Pete
>
> On Jul 14, 2010, at 11:56 AM, Peter Spam wrote:
>
>> Any other thoughts, Chris? I've been messing with this a bit, and can't seem to get (?m)^.*$ to do what I want.
>>
>> 1) I don't care how many characters it returns, I'd like entire lines all the time
>> 2) I just want it to always return 3 lines: the line before, the actual line, and the line after.
>> 3) This should be like "grep -C1"
>>
>> Thanks for your time!
>>
>>
>> -Pete
>>
>> On Jul 9, 2010, at 12:08 AM, Peter Spam wrote:
>>
>>> Ah, this makes sense. I've changed my regex to "(?m)^.*$", and it works better, but I still get fragments before and after some returns.
>>> Thanks for the hint!
>>>
>>>
>>> -Pete
>>>
>>> On Jul 8, 2010, at 6:27 PM, Chris Hostetter wrote:
>>>
>>>>
>>>> : If you can use the latest branch_3x or trunk, hl.fragListBuilder=single
>>>> : is available that is for getting entire field contents with search terms
>>>> : highlighted. To use it, set hl.useFastVectorHighlighter to true.
>>>>
>>>> He doesn't want the entire field -- his stored field values contain
>>>> multi-line strings (using newline characters) and he wants to make
>>>> fragments per "line" (ie: bounded by newline characters, or the start/end
>>>> of the entire field value)
>>>>
>>>> Peter: i haven't looked at the code, but i expect that the problem is that
>>>> the java regex engine isn't being used in a way that makes ^ and $ match
>>>> any line boundary -- they are probably only matching the start/end of the
>>>> field (and . is probably only matching non-newline characters)
>>>>
>>>> java regexes support embedded flags (ie: "(?xyz)your regex") so you might
>>>> try that (i don't remember what the correct modifier flag is for the
>>>> multiline mode off the top of my head)
>>>>
>>>> -Hoss
>>>>
>>>
>>
>
>
--
Lance Norskog
goksron@gmail.com
Re: Using hl.regex.pattern to print complete lines
Posted by Peter Spam <ps...@mac.com>.
Still not working ... any ideas?
-Pete
On Jul 14, 2010, at 11:56 AM, Peter Spam wrote:
> Any other thoughts, Chris? I've been messing with this a bit, and can't seem to get (?m)^.*$ to do what I want.
>
> 1) I don't care how many characters it returns, I'd like entire lines all the time
> 2) I just want it to always return 3 lines: the line before, the actual line, and the line after.
> 3) This should be like "grep -C1"
>
> Thanks for your time!
>
>
> -Pete
>
> On Jul 9, 2010, at 12:08 AM, Peter Spam wrote:
>
>> Ah, this makes sense. I've changed my regex to "(?m)^.*$", and it works better, but I still get fragments before and after some returns.
>> Thanks for the hint!
>>
>>
>> -Pete
>>
>> On Jul 8, 2010, at 6:27 PM, Chris Hostetter wrote:
>>
>>>
>>> : If you can use the latest branch_3x or trunk, hl.fragListBuilder=single
>>> : is available that is for getting entire field contents with search terms
>>> : highlighted. To use it, set hl.useFastVectorHighlighter to true.
>>>
>>> He doesn't want the entire field -- his stored field values contain
>>> multi-line strings (using newline characters) and he wants to make
>>> fragments per "line" (ie: bounded by newline characters, or the start/end
>>> of the entire field value)
>>>
>>> Peter: i haven't looked at the code, but i expect that the problem is that
>>> the java regex engine isn't being used in a way that makes ^ and $ match
>>> any line boundary -- they are probably only matching the start/end of the
>>> field (and . is probably only matching non-newline characters)
>>>
>>> java regexes support embedded flags (ie: "(?xyz)your regex") so you might
>>> try that (i don't remember what the correct modifier flag is for the
>>> multiline mode off the top of my head)
>>>
>>> -Hoss
>>>
>>
>
Re: Using hl.regex.pattern to print complete lines
Posted by Peter Spam <ps...@mac.com>.
Any other thoughts, Chris? I've been messing with this a bit, and can't seem to get (?m)^.*$ to do what I want.
1) I don't care how many characters it returns, I'd like entire lines all the time
2) I just want it to always return 3 lines: the line before, the actual line, and the line after.
3) This should be like "grep -C1"
Thanks for your time!
-Pete
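For reference, the "grep -C1" behaviour described in the list above can be sketched outside Solr as a post-processing step on the stored field value. This is not a highlighter feature and not something proposed in the thread; the class ContextLines, the method grepContext, and the plain substring test (instead of Solr's analyzed term matching) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side helper, independent of Solr's highlighter.
class ContextLines {
    // Return every line containing term, plus `context` lines before and after,
    // in document order and without duplicates (similar to grep -C<context>).
    static List<String> grepContext(String text, String term, int context) {
        String[] lines = text.split("\\r?\\n");
        boolean[] keep = new boolean[lines.length];
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].contains(term)) {
                int from = Math.max(0, i - context);
                int to = Math.min(lines.length - 1, i + context);
                for (int j = from; j <= to; j++) {
                    keep[j] = true;
                }
            }
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < lines.length; i++) {
            if (keep[i]) {
                out.add(lines[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample field value with newline-separated lines.
        String field = "line one\nline two\nneedle here\nline four\nline five";
        System.out.println(grepContext(field, "needle", 1));
    }
}
```

With context set to 1 this keeps the line before, the matching line, and the line after, which is the three-line result asked for in point 2.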
On Jul 9, 2010, at 12:08 AM, Peter Spam wrote:
> Ah, this makes sense. I've changed my regex to "(?m)^.*$", and it works better, but I still get fragments before and after some returns.
> Thanks for the hint!
>
>
> -Pete
>
> On Jul 8, 2010, at 6:27 PM, Chris Hostetter wrote:
>
>>
>> : If you can use the latest branch_3x or trunk, hl.fragListBuilder=single
>> : is available that is for getting entire field contents with search terms
>> : highlighted. To use it, set hl.useFastVectorHighlighter to true.
>>
>> He doesn't want the entire field -- his stored field values contain
>> multi-line strings (using newline characters) and he wants to make
>> fragments per "line" (ie: bounded by newline characters, or the start/end
>> of the entire field value)
>>
>> Peter: i haven't looked at the code, but i expect that the problem is that
>> the java regex engine isn't being used in a way that makes ^ and $ match
>> any line boundary -- they are probably only matching the start/end of the
>> field (and . is probably only matching non-newline characters)
>>
>> java regexes support embedded flags (ie: "(?xyz)your regex") so you might
>> try that (i don't remember what the correct modifier flag is for the
>> multiline mode off the top of my head)
>>
>> -Hoss
>>
>
Re: Using hl.regex.pattern to print complete lines
Posted by Peter Spam <ps...@mac.com>.
Ah, this makes sense. I've changed my regex to "(?m)^.*$", and it works better, but I still get fragments before and after some returns.
Thanks for the hint!
-Pete
On Jul 8, 2010, at 6:27 PM, Chris Hostetter wrote:
>
> : If you can use the latest branch_3x or trunk, hl.fragListBuilder=single
> : is available that is for getting entire field contents with search terms
> : highlighted. To use it, set hl.useFastVectorHighlighter to true.
>
> He doesn't want the entire field -- his stored field values contain
> multi-line strings (using newline characters) and he wants to make
> fragments per "line" (ie: bounded by newline characters, or the start/end
> of the entire field value)
>
> Peter: i haven't looked at the code, but i expect that the problem is that
> the java regex engine isn't being used in a way that makes ^ and $ match
> any line boundary -- they are probably only matching the start/end of the
> field (and . is probably only matching non-newline characters)
>
> java regexes support embedded flags (ie: "(?xyz)your regex") so you might
> try that (i don't remember what the correct modifier flag is for the
> multiline mode off the top of my head)
>
> -Hoss
>
Re: Using hl.regex.pattern to print complete lines
Posted by Chris Hostetter <ho...@fucit.org>.
: If you can use the latest branch_3x or trunk, hl.fragListBuilder=single
: is available that is for getting entire field contents with search terms
: highlighted. To use it, set hl.useFastVectorHighlighter to true.
He doesn't want the entire field -- his stored field values contain
multi-line strings (using newline characters) and he wants to make
fragments per "line" (ie: bounded by newline characters, or the start/end
of the entire field value)
Peter: i haven't looked at the code, but i expect that the problem is that
the java regex engine isn't being used in a way that makes ^ and $ match
any line boundary -- they are probably only matching the start/end of the
field (and . is probably only matching non-newline characters)
java regexes support embedded flags (ie: "(?xyz)your regex") so you might
try that (i don't remember what the correct modifier flag is for the
multiline mode off the top of my head)
-Hoss
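The embedded flag for multiline mode in java.util.regex is (?m), equivalent to passing Pattern.MULTILINE to Pattern.compile; (?s), or Pattern.DOTALL, is the separate flag that lets "." match line terminators. The sketch below illustrates both points on a standalone string. The class name MultilineFlagDemo and the sample value are assumptions, and nothing here shows how Solr itself compiles hl.regex.pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, independent of Solr's highlighter.
class MultilineFlagDemo {
    // Collect all matches of an already compiled pattern.
    static List<String> matches(Pattern pattern, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed multi-line field value.
        String field = "one\ntwo\nthree";

        // No flags: ^ and $ anchor to the whole input and "." stops at newlines.
        Pattern plain = Pattern.compile("^.*$");
        // Embedded flag and compile-time flag are two spellings of the same mode.
        Pattern embedded = Pattern.compile("(?m)^.*$");
        Pattern flagged = Pattern.compile("^.*$", Pattern.MULTILINE);
        // DOTALL: "." also matches newlines, so the whole value is one match.
        Pattern dotall = Pattern.compile("(?s)^.*$");

        System.out.println("plain:    " + matches(plain, field).size() + " match(es)");
        System.out.println("(?m):     " + matches(embedded, field).size() + " match(es)");
        System.out.println("MULTILINE:" + matches(flagged, field).size() + " match(es)");
        System.out.println("(?s):     " + matches(dotall, field).size() + " match(es)");
    }
}
```

The unflagged pattern matches nothing on a multi-line value, which is consistent with the guess above that ^ and $ only anchor to the start and end of the field unless multiline mode is switched on.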
Re: Using hl.regex.pattern to print complete lines
Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
(10/07/09 9:30), Peter Spam wrote:
> Thanks for the note, Koji. However, hl.fragsize=0 seems to return the entire document, rather than just one single line.
>
> Here's what I tried (what I previously had was commented out):
>
> regexv = "^.*$"
> thequery = '/solr/select?facet=true&facet.limit=10&fl=id,score,filename&tv=true&timeAllowed=3000&facet.field=filename&qt=tvrh&wt=ruby' + (p['fq'].empty? ? '' : ('&fq='+p['fq'].to_s) ) + '&q=' + CGI::escape(p['q'].to_s) + '&rows=' + p['rows'].to_s + "&hl=true&hl.snippets=1&hl.fragsize=0" #&hl.regex.slop=.8&hl.fragsize=200&hl.fragmenter=regex&hl.regex.pattern=" + CGI::escape(regexv)
>
> Thanks for your help.
>
>
> -Peter
>
>
Peter,
Are you sure you are using GapFragmenter when you set fragsize to 0?
I've never tried the regex fragmenter...
If you can use the latest branch_3x or trunk, hl.fragListBuilder=single
is available; it returns the entire field contents with search terms
highlighted. To use it, set hl.useFastVectorHighlighter to true.
Koji
--
http://www.rondhuit.com/en/
Re: Using hl.regex.pattern to print complete lines
Posted by Peter Spam <ps...@mac.com>.
Thanks for the note, Koji. However, hl.fragsize=0 seems to return the entire document, rather than just one single line.
Here's what I tried (what I previously had was commented out):
regexv = "^.*$"
thequery = '/solr/select?facet=true&facet.limit=10&fl=id,score,filename' +
  '&tv=true&timeAllowed=3000&facet.field=filename&qt=tvrh&wt=ruby' +
  (p['fq'].empty? ? '' : ('&fq=' + p['fq'].to_s)) +
  '&q=' + CGI::escape(p['q'].to_s) +
  '&rows=' + p['rows'].to_s +
  "&hl=true&hl.snippets=1&hl.fragsize=0"
  #&hl.regex.slop=.8&hl.fragsize=200&hl.fragmenter=regex&hl.regex.pattern=" + CGI::escape(regexv)
Thanks for your help.
-Peter
On Jul 8, 2010, at 3:47 PM, Koji Sekiguchi wrote:
> (10/07/09 2:44), Peter Spam wrote:
>> To clarify, I never want a snippet, I always want a whole line returned. Is this possible? Thanks!
>>
>>
>> -Pete
>>
>>
> Hello Pete,
>
> Use NullFragmenter. It can be used via GapFragmenter with
> hl.fragsize=0.
>
> Koji
>
> --
> http://www.rondhuit.com/en/
>
Re: Using hl.regex.pattern to print complete lines
Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
(10/07/09 2:44), Peter Spam wrote:
> To clarify, I never want a snippet, I always want a whole line returned. Is this possible? Thanks!
>
>
> -Pete
>
>
Hello Pete,
Use NullFragmenter. It can be used via GapFragmenter with
hl.fragsize=0.
Koji
--
http://www.rondhuit.com/en/
Re: Using hl.regex.pattern to print complete lines
Posted by Peter Spam <ps...@mac.com>.
To clarify, I never want a snippet, I always want a whole line returned. Is this possible? Thanks!
-Pete
On Jul 7, 2010, at 5:33 PM, Peter Spam wrote:
> Hi,
>
> I have a text file broken apart by carriage returns, and I'd like to only return entire lines. So, I'm trying to use this:
>
> &hl.fragmenter=regex
> &hl.regex.pattern=^.*$
>
> ... but I still get fragments, even if I crank up the hl.regex.slop to 3 or so. I also tried a pattern of "\n.*\n" which seems to work better, but still isn't right. Any ideas?
>
>
> -Pete