Posted to solr-user@lucene.apache.org by "Martin Frank Hansen (MHQ)" <MH...@kmd.dk> on 2019/07/01 10:54:40 UTC

RE: highlighting not working as expected

Hi Edwin,

Thanks for your explanation, makes sense now.

Best regards

Martin


Internal - KMD A/S

-----Original Message-----
From: Zheng Lin Edwin Yeo <ed...@gmail.com>
Sent: 30 June 2019 01:57
To: solr-user@lucene.apache.org
Subject: Re: highlighting not working as expected

Hi,

If the field uses the type "string", the whole value is indexed as a single untokenized term, so it requires an exact match on the entire value, including spaces and upper/lower case.

You can use the type "text" for a start, but further down the road it will be good to define your own custom fieldType with your own tokenizer and filters.
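
As a rough sketch of what such a custom type could look like in the schema: the name "text_custom" and the particular tokenizer and filter below are only assumptions for illustration, not taken from your setup. The field name "Sagstitel" is the one from your request.

```xml
<!-- Hypothetical custom type: splits the value into words and lowercases them,
     so a query for "rotte" can match (and be highlighted in) a longer title. -->
<fieldType name="text_custom" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- The highlighted field must be stored; the type is switched from "string"
     to the tokenized type sketched above. -->
<field name="Sagstitel" type="text_custom" indexed="true" stored="true"/>
```

After changing the type of a field, the documents have to be reindexed before searching and highlighting reflect the new analysis.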

Regards,
Edwin

On Tue, 25 Jun 2019 at 14:52, Martin Frank Hansen (MHQ) <MH...@kmd.dk> wrote:

> Hi again,
>
> I have tested a bit and I was wondering if the highlighter requires a
> field to be of type "text"? Whenever I try highlighting on fields
> which are of type "string" nothing gets returned.
>
> Best regards
>
> Martin
>
>
> -----Original Message-----
> From: Jörn Franke <jo...@gmail.com>
> Sent: 11 June 2019 08:45
> To: solr-user@lucene.apache.org
> Subject: Re: highlighting not working as expected
>
> Could it be a stop word? What is the exact type definition of those
> fields? Could the word have been dropped or wrongly encoded while the
> documents were loaded?
>
> > On 03.06.2019 at 10:06, Martin Frank Hansen (MHQ) <MH...@kmd.dk> wrote:
> >
> > Hi,
> >
> > I am having some difficulties making highlighting work. For some
> > reason the highlighting feature only works on some fields but not on
> > other fields, even though these fields are stored.
> >
> > An example of a request looks like this:
> > http://localhost/solr/mytest/select?fl=id,doc.Type,Journalnummer,Sagstitel&hl.fl=Sagstitel&hl.simple.post=%3C/b%3E&hl.simple.pre=%3Cb%3E&hl=on&q=rotte
> >
> > It simply returns an empty set for all documents, even though I can
> > see several documents which have “Sagstitel” containing the word
> > “rotte” (rotte=rat). What am I missing here?
> >
> > I am using the standard highlighter as below.
> >
> >
> > <searchComponent class="solr.HighlightComponent" name="highlight">
> >    <highlighting>
> >      <!-- Configure the standard fragmenter -->
> >      <!-- This could most likely be commented out in the "default" case -->
> >      <fragmenter name="gap"
> >                  default="true"
> >                  class="solr.highlight.GapFragmenter">
> >        <lst name="defaults">
> >          <int name="hl.fragsize">100</int>
> >        </lst>
> >      </fragmenter>
> >
> >      <!-- A regular-expression-based fragmenter
> >           (for sentence extraction)
> >        -->
> >      <fragmenter name="regex"
> >                  class="solr.highlight.RegexFragmenter">
> >        <lst name="defaults">
> >          <!-- slightly smaller fragsizes work better because of slop -->
> >          <int name="hl.fragsize">70</int>
> >          <!-- allow 50% slop on fragment sizes -->
> >          <float name="hl.regex.slop">0.5</float>
> >          <!-- a basic sentence pattern -->
> >          <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
> >        </lst>
> >      </fragmenter>
> >
> >      <!-- Configure the standard formatter -->
> >      <formatter name="html"
> >                 default="true"
> >                 class="solr.highlight.HtmlFormatter">
> >        <lst name="defaults">
> >          <str name="hl.simple.pre">&lt;b&gt;</str>
> >          <str name="hl.simple.post">&lt;/b&gt;</str>
> >        </lst>
> >      </formatter>
> >
> >      <!-- Configure the standard encoder -->
> >      <encoder name="html"
> >               class="solr.highlight.HtmlEncoder" />
> >
> >      <!-- Configure the standard fragListBuilder -->
> >      <fragListBuilder name="simple"
> >                       class="solr.highlight.SimpleFragListBuilder"/>
> >
> >      <!-- Configure the single fragListBuilder -->
> >      <fragListBuilder name="single"
> >                       class="solr.highlight.SingleFragListBuilder"/>
> >
> >      <!-- Configure the weighted fragListBuilder -->
> >      <fragListBuilder name="weighted"
> >                       default="true"
> >                       class="solr.highlight.WeightedFragListBuilder"/>
> >
> >      <!-- default tag FragmentsBuilder -->
> >      <fragmentsBuilder name="default"
> >                        default="true"
> >                        class="solr.highlight.ScoreOrderFragmentsBuilder">
> >        <!--
> >        <lst name="defaults">
> >          <str name="hl.multiValuedSeparatorChar">/</str>
> >        </lst>
> >        -->
> >      </fragmentsBuilder>
> >
> >      <!-- multi-colored tag FragmentsBuilder -->
> >      <fragmentsBuilder name="colored"
> >                        class="solr.highlight.ScoreOrderFragmentsBuilder">
> >        <lst name="defaults">
> >          <str name="hl.tag.pre"><![CDATA[
> >               <b style="background:yellow">,<b style="background:lawgreen">,
> >               <b style="background:aquamarine">,<b style="background:magenta">,
> >               <b style="background:palegreen">,<b style="background:coral">,
> >               <b style="background:wheat">,<b style="background:khaki">,
> >               <b style="background:lime">,<b style="background:deepskyblue">]]></str>
> >          <str name="hl.tag.post"><![CDATA[</b>]]></str>
> >        </lst>
> >      </fragmentsBuilder>
> >
> >      <boundaryScanner name="default"
> >                       default="true"
> >                       class="solr.highlight.SimpleBoundaryScanner">
> >        <lst name="defaults">
> >          <str name="hl.bs.maxScan">10</str>
> >          <str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
> >        </lst>
> >      </boundaryScanner>
> >
> >      <boundaryScanner name="breakIterator"
> >                       class="solr.highlight.BreakIteratorBoundaryScanner">
> >        <lst name="defaults">
> >          <!-- type should be one of CHARACTER, WORD(default), LINE and SENTENCE -->
> >          <str name="hl.bs.type">WORD</str>
> >          <!-- language and country are used when constructing Locale object.  -->
> >          <!-- And the Locale object will be used when getting instance of BreakIterator -->
> >          <str name="hl.bs.language">da</str>
> >        </lst>
> >      </boundaryScanner>
> >    </highlighting>
> >  </searchComponent>
> >
> > Hope that someone can help, thanks in advance.
> >
> > Best regards
> > Martin