Posted to solr-user@lucene.apache.org by Shyam Bhaskaran <Sh...@synopsys.com> on 2012/02/07 16:50:01 UTC

Display of highlighted search result should start with the beginning of the sentence that contains the search string.

Hi,

We are using Solr 4.0 with the FastVectorHighlighter (FVH) and are facing an issue with highlighting.
Our requirement is that each highlighted snippet should start at the beginning of the sentence that contains the match, and we need help getting this done.

Currently this is not happening: in most cases the snippet starts at, or just before, the highlighted term.

I have tried using the boundaryScanner setting, but I am still not getting the desired result.

Below is the configuration we are using.

   <boundaryScanner name="simple" class="solr.highlight.SimpleBoundaryScanner" default="true">
     <lst name="defaults">
       <str name="hl.bs.maxScan">10</str>
       <str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
     </lst>
   </boundaryScanner>
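For experimenting, the same boundary-scanner settings can also be sent as request parameters, which should override the <lst name="defaults"> values without editing solrconfig.xml. The Java sketch below only builds the query string and does not contact Solr; the field name "content" and the query term are hypothetical placeholders, and the values simply mirror the configuration above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: build a /select query string carrying the highlighting parameters.
// "content" is a hypothetical field name; nothing here talks to a real Solr server.
final class HighlightRequestSketch {

    // URL-encode each key/value pair and join the pairs with '&'.
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    static Map<String, String> highlightParams(String field, String query) {
        Map<String, String> p = new LinkedHashMap<>(); // preserves insertion order
        p.put("q", field + ":" + query);
        p.put("hl", "true");
        p.put("hl.fl", field);
        p.put("hl.useFastVectorHighlighter", "true");
        p.put("hl.boundaryScanner", "simple");   // name from the <boundaryScanner> config
        p.put("hl.bs.chars", ".,!? \t\n\r");     // same characters as the config above
        p.put("hl.bs.maxScan", "10");            // same value as the config above
        return p;
    }

    public static void main(String[] args) {
        System.out.println("/select?" + toQueryString(highlightParams("content", "clock")));
    }
}
```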

I need help getting the highlighted snippet to start at the beginning of the sentence that contains the search string.

-Shyam

Re: Display of highlighted search result should start with the beginning of the sentence that contains the search string.

Posted by Koorosh Vakhshoori <kv...@gmail.com>.
Hi Koji,
  I am Shyam's coworker. After looking into this issue, I believe the
chopped-word problem is caused by the 'margin' field of the
org.apache.lucene.search.vectorhighlight.SimpleFragListBuilder class,
which is set to 6 by default. My understanding is that a margin value
greater than zero results in a truncated word when the highlighted term is
too close to the beginning of a document. I was able to reset the 'margin'
field by creating my own version of org.apache.solr.highlight.SimpleFragListBuilder
that passes zero for 'margin' when calling Lucene's SimpleFragListBuilder
constructor. My testing shows the problem has been fixed. Do you concur?
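To make the margin effect concrete, here is a simplified Java model of how the snippet start is chosen. It is not Lucene's source: it assumes, as described above, that the fragment starts 'margin' characters before the term, and that a SimpleBoundaryScanner-style backward scan keeps that start when it finds no boundary character within hl.bs.maxScan, even if it reaches the beginning of the text. FragmentStartSketch and its methods are hypothetical names.

```java
import java.util.Set;

// Hypothetical, simplified model of snippet-start selection; not Lucene's actual code.
final class FragmentStartSketch {

    // Step 1 (assumed): the fragment starts 'margin' chars before the matched term.
    static int marginStart(int termStart, int margin) {
        return Math.max(0, termStart - margin);
    }

    // Step 2 (assumed): scan left at most maxScan chars for a boundary char and start
    // right after it. If none is found, keep the original start, even when the scan
    // reached the beginning of the text (the behaviour discussed in this thread).
    static int findStartOffset(String text, int start, Set<Character> boundaryChars, int maxScan) {
        if (start > text.length() || start < 1) return start;
        int offset = start;
        for (int count = maxScan; offset > 0 && count > 0; count--) {
            if (boundaryChars.contains(text.charAt(offset - 1))) return offset;
            offset--;
        }
        return start;
    }

    // Text of the snippet from its computed start to the end of the field value.
    static String snippetStart(String text, String term, int margin,
                               Set<Character> boundaryChars, int maxScan) {
        int start = marginStart(text.indexOf(term), margin);
        return text.substring(findStartOffset(text, start, boundaryChars, maxScan));
    }

    public static void main(String[] args) {
        String text = " How Are Clock Gating Checks Inferred";
        Set<Character> sentenceEnds = Set.of('.', '!', '?'); // hl.bs.chars=".!?"
        System.out.println(snippetStart(text, "Clock", 6, sentenceEnds, 200));
        System.out.println(snippetStart(text, "Clock", 0, sentenceEnds, 200));
    }
}
```

Under these assumptions, margin 6 with ".!?" reproduces the "w Are Clock ..." snippet reported in this thread, margin 0 starts the snippet at the term itself, and adding whitespace to the boundary characters makes the scan stop at the nearest space instead.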

  Now a couple of questions. I am not sure what the purpose of this field is;
could you describe a use case for it? Also, could it be exposed as a parameter
in Solr so it could be set to some other value?

Thanks,

Koorosh



Re: Display of highlighted search result should start with the beginning of the sentence that contains the search string.

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
It seems like a bug to me.
Can you open a ticket? Thank you.

Koji Sekiguchi from iPhone

On 2012/02/08, at 13:32, Shyam Bhaskaran <Sh...@synopsys.com> wrote:

> Hi Koji,
> 
> Thanks for the response. When I use hl.bs.chars=".!?" and hl.bs.maxScan=200, I see improvements; below is the highlighted value:
> 
> "The synthesis tool only supports the resolution functions for <em>std_logic</em> and std_logic_vector."
> 
> 
> But in other cases I see words being cut in the middle, as shown below:
> 
> Original text: " How Are Clock Gating Checks Inferred"
> 
> When searching for the term "clock", the highlighted text is displayed as shown below:
> 
> "w Are <em>Clock</em> Gating Checks Inferred"
> 
> As you can see, only "w" is displayed from the word "How".
> 
> This issue goes away when I use hl.bs.chars=".!? &#9;&#10;&#13;", but then the highlighting no longer starts from the beginning of the sentence.
> 
> Is there a way to make highlighting work correctly in all cases?
> 
> 
> -Shyam
> 

RE: Display of highlighted search result should start with the beginning of the sentence that contains the search string.

Posted by Shyam Bhaskaran <Sh...@synopsys.com>.
Hi Koji,

Thanks for the response. When I use hl.bs.chars=".!?" and hl.bs.maxScan=200, I see improvements; below is the highlighted value:

"The synthesis tool only supports the resolution functions for <em>std_logic</em> and std_logic_vector."


But in other cases I see words being cut in the middle, as shown below:

Original text: " How Are Clock Gating Checks Inferred"

When searching for the term "clock", the highlighted text is displayed as shown below:

"w Are <em>Clock</em> Gating Checks Inferred"

As you can see, only "w" is displayed from the word "How".

This issue goes away when I use hl.bs.chars=".!? &#9;&#10;&#13;", but then the highlighting no longer starts from the beginning of the sentence.

Is there a way to make highlighting work correctly in all cases?


-Shyam
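One hypothetical way to get both behaviours is a backward scan that prefers a sentence terminator, treats the beginning of the text as a boundary, and otherwise falls back to the nearest whitespace so that words are never cut. The Java sketch below only models the start-offset logic; FallbackStartScanner is an illustrative name, not a Solr or Lucene class, and using something like it in Solr would presumably require a custom boundary scanner plugin. It assumes, as elsewhere in this thread, that the scan starts 6 characters before the term.

```java
// Hypothetical start-offset scan; illustrates the idea only, not a Lucene/Solr class.
final class FallbackStartScanner {

    static boolean isSentenceEnd(char c) {
        return c == '.' || c == '!' || c == '?';
    }

    // Skip whitespace so the snippet does not begin with a blank.
    static int skipSpaces(String text, int i) {
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
        return i;
    }

    // Scan left from 'start' (at most maxScan chars) for a sentence boundary.
    static int findStart(String text, int start, int maxScan) {
        if (start <= 0) return 0;
        if (start > text.length()) return text.length();
        int wordStart = -1; // nearest whitespace boundary seen during the scan
        int offset = start;
        for (int count = maxScan; offset > 0 && count > 0; count--, offset--) {
            char prev = text.charAt(offset - 1);
            if (isSentenceEnd(prev)) return skipSpaces(text, offset); // sentence boundary
            if (wordStart < 0 && Character.isWhitespace(prev)) wordStart = offset;
        }
        if (offset == 0) return skipSpaces(text, 0);   // beginning of text counts as a boundary
        return wordStart >= 0 ? wordStart : start;      // budget exhausted: avoid a cut word
    }

    public static void main(String[] args) {
        String t = " How Are Clock Gating Checks Inferred";
        System.out.println(t.substring(findStart(t, t.indexOf("Clock") - 6, 200)));
    }
}
```

With a large scan budget, the " How Are Clock" case starts at "How" and the std_logic example starts at "The synthesis"; if the budget runs out first, the snippet starts at the nearest whole word rather than mid-word.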


Re: Display of highlighted search result should start with the beginning of the sentence that contains the search string.

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
(12/02/08 1:54), Shyam Bhaskaran wrote:
> Hi Koji,
>
> I have tried using hl.bs.type=SENTENCE, but there is still no improvement.
>
> We are storing content extracted from PDFs in a field that has termVectors enabled.
>
> For example, the field contains the following data extracted from a PDF:
>
> "User-defined resolution functions. The synthesis tool only supports the
> resolution functions for std_logic and std_logic_vector.
>
> Slices with range indices that do not evaluate to constants "
>
> When I search for the term "std_logic", the following highlighted snippet is displayed:
>
> "functions for <em>std_logic</em> and std_logic_vector. * Slices with range indices that do not evaluate to constants"
>
>
> As you can see, the snippet does not start at the beginning of the sentence. Why is this, and how can I achieve it?

Hi Shyam,

Can you try setting hl.bs.chars=".!?" and hl.bs.maxScan=100 or a larger number?
SimpleBoundaryScanner scans the stored data backward and forward from the
highlighted terms until it finds one of those characters or has scanned hl.bs.maxScan characters.
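A simplified Java model of the backward half of that scan shows why the scan budget matters. It approximates SimpleBoundaryScanner's start-offset search rather than reproducing its source, omits the forward scan for the fragment end, and assumes the fragment starts 6 characters before the term (the default margin mentioned elsewhere in this thread); ScanSketch is a hypothetical name.

```java
import java.util.Set;

// Simplified model of a SimpleBoundaryScanner-style backward scan (not the real source).
final class ScanSketch {

    static final Set<Character> SENTENCE_ENDS = Set.of('.', '!', '?'); // hl.bs.chars=".!?"

    // Walk left from 'start' for at most maxScan chars; stop just after a boundary char.
    // If no boundary char is found, the original start offset is returned unchanged.
    static int findStartOffset(String text, int start, int maxScan) {
        if (start > text.length() || start < 1) return start;
        int offset = start;
        for (int count = maxScan; offset > 0 && count > 0; count--) {
            if (SENTENCE_ENDS.contains(text.charAt(offset - 1))) return offset;
            offset--;
        }
        return start;
    }

    // Snippet text for the first occurrence of 'term', assuming a margin of 6 characters.
    static String snippetFrom(String text, String term, int maxScan) {
        int fragStart = Math.max(0, text.indexOf(term) - 6);
        return text.substring(findStartOffset(text, fragStart, maxScan));
    }

    public static void main(String[] args) {
        String text = "User-defined resolution functions. The synthesis tool only supports the\n"
                + "resolution functions for std_logic and std_logic_vector.";
        System.out.println(snippetFrom(text, "std_logic", 10));  // budget too small
        System.out.println(snippetFrom(text, "std_logic", 100)); // reaches the previous '.'
    }
}
```

With a budget of 10 the model never reaches the period and keeps the mid-word start ("s for std_logic ..."); with 100 it stops right after "functions." and the snippet begins with " The synthesis tool".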

http://wiki.apache.org/solr/HighlightingParameters#hl.bs.maxScan

koji
-- 
http://www.rondhuit.com/en/

RE: Display of highlighted search result should start with the beginning of the sentence that contains the search string.

Posted by Shyam Bhaskaran <Sh...@synopsys.com>.
Hi Koji,

I have tried using hl.bs.type=SENTENCE, but there is still no improvement.

We are storing content extracted from PDFs in a field that has termVectors enabled.

For example, the field contains the following data extracted from a PDF:

"User-defined resolution functions. The synthesis tool only supports the
resolution functions for std_logic and std_logic_vector.

Slices with range indices that do not evaluate to constants "

When I search for the term "std_logic", the following highlighted snippet is displayed:

"functions for <em>std_logic</em> and std_logic_vector. * Slices with range indices that do not evaluate to constants"


As you can see, the snippet does not start at the beginning of the sentence. Why is this, and how can I achieve it?


-Shyam 

Re: Display of highlighted search result should start with the beginning of the sentence that contains the search string.

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
(12/02/08 0:50), Shyam Bhaskaran wrote:
> Hi,
>
> We are using Solr 4.0 with the FastVectorHighlighter (FVH) and are facing an issue with highlighting.
> Our requirement is that each highlighted snippet should start at the beginning of the sentence that contains the match, and we need help getting this done.
>
> Currently this is not happening: in most cases the snippet starts at, or just before, the highlighted term.
>
> I have tried using the boundaryScanner setting, but I am still not getting the desired result.
>
> Below is the configuration we are using.
>
>     <boundaryScanner name="simple" class="solr.highlight.SimpleBoundaryScanner" default="true">
>       <lst name="defaults">
>         <str name="hl.bs.maxScan">10</str>
>         <str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
>       </lst>
>     </boundaryScanner>
>
> I need help getting the highlighted snippet to start at the beginning of the sentence that contains the search string.

Please provide more detailed information, e.g. the field data you indexed and the undesirable snippet
you are currently getting.

And have you tried BreakIteratorBoundaryScanner with hl.bs.type=SENTENCE?
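BreakIteratorBoundaryScanner is, to our understanding, built around the JDK's java.text.BreakIterator. The Java sketch below (SentenceStartSketch is a hypothetical helper, not Solr's class) uses a sentence BreakIterator directly to find the start of the sentence containing a term offset, which is the behaviour hl.bs.type=SENTENCE is meant to provide.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper: start offset of the sentence containing 'offset',
// computed with the JDK sentence BreakIterator.
final class SentenceStartSketch {

    static int sentenceStart(String text, int offset, Locale locale) {
        BreakIterator sentences = BreakIterator.getSentenceInstance(locale);
        sentences.setText(text);
        if (sentences.isBoundary(offset)) return offset; // offset already starts a sentence
        return sentences.preceding(offset);              // last sentence boundary before offset
    }

    public static void main(String[] args) {
        String text = "User-defined resolution functions. The synthesis tool only supports the "
                + "resolution functions for std_logic and std_logic_vector.";
        int start = sentenceStart(text, text.indexOf("std_logic"), Locale.ENGLISH);
        System.out.println(text.substring(start));
    }
}
```

Note that hl.bs.type is read by the break-iterator scanner, so that scanner presumably has to be declared in solrconfig.xml and selected for the request; the configuration quoted above only declares the "simple" scanner as the default.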

koji