Posted to solr-user@lucene.apache.org by khirb7 <kh...@gmail.com> on 2008/04/07 12:22:37 UTC

Snipets Solr/nutch

Hello everybody,

I am using Solr in my project, and I want to use the snippets generated by
its highlighting.
The problem is that these snippets are not displayed well: they are
truncated and not really meaningful.
I have heard that Nutch produces good snippets. Is it possible to integrate
them into my Solr setup, and if so, how?

Thank you in advance.
-- 
View this message in context: http://www.nabble.com/Snipets-Solr-nutch-tp16537216p16537216.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Snipets Solr/nutch(maxFragSize?)

Posted by khirb7 <kh...@gmail.com>.


khirb7 wrote:
> 
> Hello everybody,
>  
> Just one other question. To analyse and modify Solr's snippets, I want to
> know whether org.apache.solr.util.HighlightingUtils
> is the class that generates the snippets and which method generates them.
> Could you please explain how they are generated in that class and where
> exactly to modify it? All this is so that Solr does not return the first
> highlighted word it encounters but another one, because of the problem
> I explained in my previous messages.
> 
> Cheers
> 
I searched further and found that Lucene provides this method,
getBestFragments:

highlighter.getBestFragments(tokenStream, text, maxNumFragment, "...");

With this method we can tell Lucene to return up to maxNumFragment
fragments (each containing highlighted words) of fragsize characters, but
there is no maxFragSize parameter in Solr. This would be useful in my case,
since I want to highlight not only the first occurrence of a searched word
but further occurrences of the same word.

cheers
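
For readers following along: Solr's highlighting request parameters include hl.snippets (maximum number of fragments per field) and hl.fragsize (approximate fragment size in characters), which appear to correspond to the maxNumFragment and fragsize values discussed above. The sketch below only assembles a query string with those parameters. The class name HighlightQuery, the host, and the /solr/select path are illustrative assumptions, and the field name myText is taken from elsewhere in this thread; check the parameter names against the documentation for your Solr version.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not part of Solr): builds the query string for a
// search request with highlighting enabled.
final class HighlightQuery {

    // URL-encode one parameter value as UTF-8.
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // snippets -> hl.snippets (max fragments returned per field)
    // fragsize -> hl.fragsize (approximate fragment size in characters)
    static String build(String query, String field, int snippets, int fragsize) {
        return "q=" + enc(query)
                + "&hl=true"
                + "&hl.fl=" + enc(field)
                + "&hl.snippets=" + snippets
                + "&hl.fragsize=" + fragsize;
    }

    public static void main(String[] args) {
        // Placeholder host and path; adjust to your own installation.
        System.out.println("http://localhost:8983/solr/select?"
                + build("bush", "myText", 3, 150));
    }
}
```

Asking for more than one snippet lets later occurrences of the search term show up in the response, not only the first one.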




-- 
View this message in context: http://www.nabble.com/Snipets-Solr-nutch-tp16537216p16608806.html


Re: Snipets Solr/nutch

Posted by Mike Klaas <mi...@gmail.com>.
On 15-Apr-08, at 1:37 PM, khirb7 wrote:
>
> Thank you very much, you are helpful. Concerning my Solr, I am using the
> 1.2.0 version; I downloaded it from the Apache download mirror
> http://www.apache.org/dyn/closer.cgi/lucene/solr/ . I did not fully
> understand what you meant when you said:
>
> you're trying to apply a patch that has long since been
> applied to Solr.

Hi khirb,

You could try looking at "trunk" (the development version of Solr that
hasn't yet been released).  It contains all the features you were
trying to add manually to your version.

You can download a "nightly" build of Solr here:

http://people.apache.org/builds/lucene/solr/nightly/

regards,
-Mike

Re: Snipets Solr/nutch

Posted by khirb7 <kh...@gmail.com>.


Mike Klaas wrote:
> 
> On 13-Apr-08, at 3:25 AM, khirb7 wrote:
>>
>> It doesn't work: Solr still uses the default value fragsize=100. Also,
>> I am not able to specify the regex fragmenter, either because of this
>> version problem, I suppose, or because of the way I am declaring
>> <highlighting> ......</highlighting>,
>> because both of:
> 
> Hi khirb,
> 
> It might be easier for people to help you if you keep things in one  
> thread.
> 
> I notice that you're trying to apply a patch that has long since been  
> applied to Solr (another thread).  What version of Solr are you  
> using?  How did you acquire it?
> 
> -Mike
> 
Hi Mike,

Thank you very much, you are helpful. Concerning my Solr, I am using the
1.2.0 version; I downloaded it from the Apache download mirror
http://www.apache.org/dyn/closer.cgi/lucene/solr/ . I did not fully
understand what you meant when you said:

you're trying to apply a patch that has long since been
applied to Solr.

Thank you, Mike.


-- 
View this message in context: http://www.nabble.com/Snipets-Solr-nutch-tp16537216p16708645.html


Re: Snipets Solr/nutch

Posted by Mike Klaas <mi...@gmail.com>.
On 13-Apr-08, at 3:25 AM, khirb7 wrote:
>
> It doesn't work: Solr still uses the default value fragsize=100. Also,
> I am not able to specify the regex fragmenter, either because of this
> version problem, I suppose, or because of the way I am declaring
> <highlighting> ......</highlighting>,
> because both of:

Hi khirb,

It might be easier for people to help you if you keep things in one  
thread.

I notice that you're trying to apply a patch that has long since been  
applied to Solr (another thread).  What version of Solr are you  
using?  How did you acquire it?

-Mike

Re: Snipets Solr/nutch

Posted by khirb7 <kh...@gmail.com>.
Hello,
Mike advised me last time to use:

>This is done by the fragmenting stage of highlighting.  Solr (trunk)  
>ships with a fragmenter that looks for sentence-like snippets using  
>regular expressions: try hl.fragmenter=regex (see config in  
>solrconfig.xml).
The problem is that I was not able either to do that or to specify the
fragsize from solrconfig.xml. I think it is due to the version of Solr I use
and to which class and package I specify, i.e.:
I put this in solrconfig.xml:

<highlighting>
  <!-- Configure the standard fragmenter -->
  <!-- This could most likely be commented out in the "default" case -->
  <fragmenter name="gap" class="org.apache.solr.util.GapFragmenter" default="true">
    <lst name="defaults">
      <int name="hl.fragsize">400</int>
    </lst>
  </fragmenter>

  <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
  <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <!-- slightly smaller fragsizes work better because of slop -->
      <int name="hl.fragsize">70</int>
      <!-- allow 50% slop on fragment sizes -->
      <float name="hl.regex.slop">0.5</float>
      <!-- a basic sentence pattern -->
      <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
    </lst>
  </fragmenter>

  <!-- Configure the standard formatter -->
  <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
    <lst name="defaults">
      <str name="hl.simple.pre"></str>
      <str name="hl.simple.post"></str>
    </lst>
  </formatter>
</highlighting>

So whether I use

<fragmenter name="gap" class="org.apache.solr.util.GapFragmenter" default="true">

(org.apache.solr.util.GapFragmenter being specific to Solr 1.2)

or

<fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true">

it doesn't work: Solr still uses the default value fragsize=100. Also, I am
not able to specify the regex fragmenter, either because of this version
problem, I suppose, or because of the way I am declaring
<highlighting> ......</highlighting>, because both of:

<fragmenter name="gap" class="org.apache.solr.util.GapFragmenter" default="true">
and
<fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true">

still use fragsize=100, even though I am using
<int name="hl.fragsize">400</int> as shown above.

Thank you.
-- 
View this message in context: http://www.nabble.com/Snipets-Solr-nutch-tp16537216p16656960.html


Re: Snipets Solr/nutch

Posted by Mike Klaas <mi...@gmail.com>.
On 10-Apr-08, at 12:26 AM, khirb7 wrote:
>
> Hello everybody,
>
> Just one other question. To analyse and modify Solr's snippets, I want to
> know whether org.apache.solr.util.HighlightingUtils
> is the class that generates the snippets and which method generates them.
> Could you please explain how they are generated in that class and where
> exactly to modify it? All this is so that Solr does not return the first
> highlighted word it encounters but another one, because of the problem
> I explained in my previous messages.

Unfortunately I am not familiar with nutch's snippet generation.

Solr's highlighting is located in
org.apache.solr.util.HighlightingUtils in version 1.2; in the current
(trunk) version, it is located in the
org.apache.solr.highlight.* package.

Your use case is a little tricky.  The best way to deal with it in my  
opinion is to strip out the header before sending the data to Solr.   
This will improve your highlighting _and_ your search relevance.

-Mike
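
One way the "strip out the header before sending the data to Solr" suggestion could look is sketched below. It assumes each page has already been converted to plain-text lines and that the shared header is repeated verbatim at the top of every page; HeaderStripper and its methods are hypothetical names, not part of Solr or Nutch, and real pages may need a looser comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-indexing helper: drops the leading lines that every page
// shares, so that repeated header text is never indexed or highlighted.
final class HeaderStripper {

    // Counts the leading lines that are identical in all pages.
    // With fewer than two pages there is nothing to compare, so returns 0.
    static int commonHeaderLength(List<List<String>> pages) {
        if (pages.size() < 2) {
            return 0;
        }
        int shortest = Integer.MAX_VALUE;
        for (List<String> page : pages) {
            shortest = Math.min(shortest, page.size());
        }
        int n = 0;
        while (n < shortest) {
            String line = pages.get(0).get(n);
            for (List<String> page : pages) {
                if (!page.get(n).equals(line)) {
                    return n; // first line that differs ends the header
                }
            }
            n++;
        }
        return n;
    }

    // Returns a copy of the page without its first headerLength lines.
    static List<String> strip(List<String> page, int headerLength) {
        return new ArrayList<>(page.subList(headerLength, page.size()));
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("Site about President Bush", "Home | News", "Article one text");
        List<String> b = Arrays.asList("Site about President Bush", "Home | News", "Another article");
        int header = commonHeaderLength(Arrays.asList(a, b));
        System.out.println(strip(a, header));
    }
}
```

The stripped text is what would then be placed in the indexed field, so a search for a word that only occurs in the header no longer matches every page.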

Re: Snipets Solr/nutch

Posted by khirb7 <kh...@gmail.com>.
Hello everybody,

Just one other question. To analyse and modify Solr's snippets, I want to
know whether org.apache.solr.util.HighlightingUtils
is the class that generates the snippets and which method generates them.
Could you please explain how they are generated in that class and where
exactly to modify it? All this is so that Solr does not return the first
highlighted word it encounters but another one, because of the problem I
explained in my previous messages.

Cheers
-- 
View this message in context: http://www.nabble.com/Snipets-Solr-nutch-tp16537216p16603642.html


Re: Snipets Solr/nutch

Posted by khirb7 <kh...@gmail.com>.
Thank you for your response.

I have another problem with snippets. Here it is:
I transform the HTML code into text, then I index all of this generated text
into one field called myText. Many pages have a common header with common
information (example: a web site about President Bush), and the word bush
appears in this header. If I highlight the field myText while searching for
the word bush, I always get the same sentence containing bush highlighted
(the sentence of the common header that contains the word bush), because I
have set fragsize to 150 and Solr returns, highlighted, the first occurrence
of the word (bush) that it encounters in the whole text. How can I deal with
that? I was told that NutchWAX handles this problem; is that true? If so,
how can I integrate the Nutch classes into Solr?

Thank you in advance.
-- 
View this message in context: http://www.nabble.com/Snipets-Solr-nutch-tp16537216p16585594.html


Re: Snipets Solr/nutch

Posted by Mike Klaas <mi...@gmail.com>.
On 7-Apr-08, at 7:12 AM, khirb7 wrote:
> khirb7 wrote:
>>
>> Hello everybody,
>>
>> I am using Solr in my project, and I want to use the snippets
>> generated by its highlighting.
>> The problem is that these snippets are not displayed well: they are
>> truncated and not really meaningful.
>> I have heard that Nutch produces good snippets. Is it possible to
>> integrate them into my Solr setup, and if so, how?
>>
>> Thank you in advance.
>>
> Hi everybody,
> I am digging into the Solr classes, looking for a solution to the
> generated snippets. First of all, I want to know in which class and
> where these snippets are generated.
> My snippets look like this:
> " project, and I want to use solr snipets generated by the
> highlighting"
> As you can see, starting with "project" makes no sense. I think the best
> way is to show the whole sentence, like this:
> "I am using solr in my project, and I want to use solr snipets
> generated by the highlighting".
> and not to truncate it, maybe by paying attention to the punctuation
> (the comma or the capital letter).

This is done by the fragmenting stage of highlighting.  Solr (trunk)  
ships with a fragmenter that looks for sentence-like snippets using  
regular expressions: try hl.fragmenter=regex (see config in  
solrconfig.xml).

regards,
-Mike
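
To give a feel for what a "sentence-like" regular expression picks out, here is a rough, stdlib-only sketch that applies the hl.regex.pattern value quoted elsewhere in this thread directly to a piece of text. It is not Solr's RegexFragmenter, which also takes hl.fragsize and hl.regex.slop into account; the class name SentenceFragments is an illustrative assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: shows which spans of text the basic sentence
// pattern matches. Not the actual Solr fragmenter.
final class SentenceFragments {

    // Runs of 20 to 200 word characters, spaces, commas, slashes, newlines,
    // quotes and hyphens. A full stop is not in the class, so it ends a run.
    static final Pattern SENTENCE = Pattern.compile("[-\\w ,/\\n\"']{20,200}");

    // Returns every matching run, trimmed of surrounding whitespace.
    static List<String> split(String text) {
        List<String> fragments = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            fragments.add(m.group().trim());
        }
        return fragments;
    }

    public static void main(String[] args) {
        String text = "I am using solr in my project, and I want to use solr "
                + "snipets generated by the highlighting. It is short.";
        for (String fragment : split(text)) {
            System.out.println(fragment);
        }
    }
}
```

Because the match starts at the beginning of the run and stops at the full stop, the fragment covers the whole sentence and does not begin mid-sentence at " project, and ...". Runs shorter than 20 characters, such as "It is short", are skipped by this pattern.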

Re: Snipets Solr/nutch

Posted by khirb7 <kh...@gmail.com>.


khirb7 wrote:
> 
> Hello everybody,
> 
> I am using Solr in my project, and I want to use the snippets generated by
> its highlighting.
> The problem is that these snippets are not displayed well: they are
> truncated and not really meaningful.
> I have heard that Nutch produces good snippets. Is it possible to integrate
> them into my Solr setup, and if so, how?
> 
> Thank you in advance.
> 
Hi everybody,
I am digging into the Solr classes, looking for a solution to the generated
snippets. First of all, I want to know in which class and where these
snippets are generated.
My snippets look like this:
" project, and I want to use solr snipets generated by the highlighting"
As you can see, starting with "project" makes no sense. I think the best way
is to show the whole sentence, like this:
"I am using solr in my project, and I want to use solr snipets generated by
the highlighting".
and not to truncate it, maybe by paying attention to the punctuation (the
comma or the capital letter).

Thank you in advance.




-- 
View this message in context: http://www.nabble.com/Snipets-Solr-nutch-tp16537216p16537460.html