Posted to solr-user@lucene.apache.org by Peter Hickman <pe...@semantico.com> on 2008/04/28 11:53:24 UTC

Truncated highlighted results

I'm using Solr 1.2 and things are fine for the most part, except for an
occasional problem.

When Solr returns a highlighted version of the results, the output is
sometimes truncated. In most cases the whole of the field (called "content")
is returned highlighted correctly, but not always. For example, this query
from the Solr admin page, "(content:pain) AND (id:27-ss3)", with "Enable
Highlighting" on and content as the "Fields to Highlight", returns:

<response>

<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">63</int>
 <lst name="params">
  <str name="wt">standard</str>
  <str name="rows">10</str>
  <str name="explainOther"/>
  <str name="start">0</str>
  <str name="hl.fl">content</str>
  <str name="indent">on</str>
  <str name="fl">*,score</str>
  <str name="hl">on</str>
  <str name="q">(content:pain) AND (id:27-ss3)</str>
  <str name="debugQuery">on</str>
  <str name="qt">standard</str>
  <str name="version">2.2</str>
 </lst>
</lst>
<result name="response" numFound="1" start="0" maxScore="6.8991137">
 <doc>
  <float name="score">6.8991137</float>
  <str name="chapterNumber">27</str>
  <str name="chapterTitle">Pain management and assessment</str>
  <str name="content">

... This is the full document and has all the data

  </str>
  <str name="displayType">chapter</str>
  <str name="documentId">ss3</str>
  <str name="documentTitle">Reference material</str>
  <arr name="documentType"><str>chapter</str></arr>
  <str name="filename">9781405169998_chapter_27.xml</str>
  <bool name="firstSection">false</bool>
  <str name="id">27-ss3</str>
  <str name="sectionId">ss3</str>
  <bool name="userGenerated">false</bool>
 </doc>
</result>
<lst name="debug">
 <str name="rawquerystring">(content:pain) AND (id:27-ss3)</str>
 <str name="querystring">(content:pain) AND (id:27-ss3)</str>
 <str name="parsedquery">+content:pain +id:27-ss3</str>
 <str name="parsedquery_toString">+content:pain +id:27-ss3</str>
 <lst name="explain">
  <str name="id=27-ss3,internal_docid=1895">
6.8991137 = (MATCH) sum of:
  0.14742894 = (MATCH) weight(content:pain in 1895), product of:
    0.30801868 = queryWeight(content:pain), product of:
      2.2976341 = idf(docFreq=606)
      0.13405907 = queryNorm
    0.47863635 = (MATCH) fieldWeight(content:pain in 1895), product of:
      17.776388 = tf(termFreq(content:pain)=316)
      2.2976341 = idf(docFreq=606)
      0.01171875 = fieldNorm(field=content, doc=1895)
  6.7516847 = (MATCH) weight(id:27-ss3 in 1895), product of:
    0.9513804 = queryWeight(id:27-ss3), product of:
      7.096725 = idf(docFreq=4)
      0.13405907 = queryNorm
    7.096725 = (MATCH) fieldWeight(id:27-ss3 in 1895), product of:
      1.0 = tf(termFreq(id:27-ss3)=1)
      7.096725 = idf(docFreq=4)
      1.0 = fieldNorm(field=id, doc=1895)
</str>
 </lst>
</lst>
<lst name="highlighting">
 <lst name="27-ss3">
  <arr name="content">

... This is the highlighted version of /response/result/content and starts
out OK, but after some output it just stops.

  </arr>
 </lst>
</lst>
</response>
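
The admin form settings above correspond to a plain HTTP GET against the select handler, which makes it easier to rerun the query with different highlighting parameters. A minimal Python sketch, assuming a default local install at the hypothetical base URL http://localhost:8983/solr/select; the parameter names and values are copied from the responseHeader above, and `build_select_url` is a made-up helper name.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical base URL for a default local Solr 1.x install; adjust as needed.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_select_url(query: str, hl_field: str,
                     fragsize: Optional[int] = None,
                     base: str = SOLR_SELECT) -> str:
    """Build a select URL equivalent to the admin form used above."""
    params = [
        ("q", query),
        ("fl", "*,score"),
        ("start", "0"),
        ("rows", "10"),
        ("indent", "on"),
        ("version", "2.2"),
        ("hl", "on"),
        ("hl.fl", hl_field),
    ]
    # Only send hl.fragsize when requested; 0 is a meaningful value, so test
    # against None rather than truthiness.
    if fragsize is not None:
        params.append(("hl.fragsize", str(fragsize)))
    # urlencode percent-escapes the query (parentheses, colons, spaces).
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_select_url("(content:pain) AND (id:27-ss3)", "content", fragsize=0))
```

Passing the resulting URL to a browser or any HTTP client should return the same XML as the admin page, with hl.fragsize added when given.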

I've played with the hl.fragsize parameter: the output responds to the
value when it is low, but when it is set to 0 or to some large number (say
80000) it always truncates at the same point, after roughly 64500 bytes of
output. I've tried removing data from the front of the document, reindexing
and restarting, and the truncation point just moves along the document by
the same amount.
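
One way to make the hl.fragsize experiment repeatable is to record the highlighted length for a range of values and check whether every request that was not limited by hl.fragsize stops at the same length. A rough Python sketch; `probe_fragsize`, `common_ceiling` and the caller-supplied `fetch_len` function are hypothetical names, and the fake fetcher below only simulates the behaviour described above rather than talking to Solr.

```python
from typing import Callable, Dict, Iterable, Optional


def probe_fragsize(fetch_len: Callable[[int], int],
                   sizes: Iterable[int] = (100, 1000, 10000, 80000, 0)) -> Dict[int, int]:
    """Map each hl.fragsize value to the length of the highlighted text.

    fetch_len is supplied by the caller (hypothetical): it should run the
    query with the given hl.fragsize and return the highlighted text length.
    """
    return {size: fetch_len(size) for size in sizes}


def common_ceiling(results: Dict[int, int]) -> Optional[int]:
    """Return the shared length if every 'unbounded' request stopped at the
    same place, else None.

    A request counts as unbounded when hl.fragsize is 0 (whole field) or is
    larger than the length actually returned, i.e. fragsize was not what
    limited the output.
    """
    unbounded = {n for size, n in results.items() if size == 0 or size > n}
    return unbounded.pop() if len(unbounded) == 1 else None


def fake_fetch(size: int) -> int:
    # Simulates a fixed cap near 64500 regardless of hl.fragsize.
    return 64500 if size == 0 else min(size, 64500)


if __name__ == "__main__":
    print(common_ceiling(probe_fragsize(fake_fetch)))
```

If the shared length is smaller than the stored field's length, the cap is coming from somewhere other than hl.fragsize itself.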

I've checked the data and there are no odd characters at the point of
truncation.
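
To find the truncation point programmatically and look at the characters around it, the stored field can be compared with the highlighted fragment from the same response. A sketch in Python, assuming the wt=standard XML format shown above, hl.fragsize=0 so the field comes back as a single fragment, and Solr's default `<em>`/`</em>` highlight markers; `truncation_point` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

# Solr's default highlight markers (hl.simple.pre / hl.simple.post); assumed here.
HL_TAGS = re.compile(r"</?em>")


def truncation_point(response_xml: str, doc_id: str,
                     field: str = "content", context: int = 20):
    """Compare a stored field with its highlighted version.

    Returns (stored_len, highlighted_len, context_repr). Lengths are in
    characters after stripping the highlight tags; context_repr shows the
    stored text on either side of the point where highlighting stopped.
    """
    root = ET.fromstring(response_xml)

    # Stored value from <result><doc>, matched on the "id" field.
    stored = ""
    for doc in root.iter("doc"):
        id_el = doc.find("str[@name='id']")
        if id_el is not None and id_el.text == doc_id:
            field_el = doc.find(f"str[@name='{field}']")
            if field_el is not None and field_el.text:
                stored = field_el.text

    # First highlighted fragment; with hl.fragsize=0 this should be the whole field.
    path = (f"lst[@name='highlighting']/lst[@name='{doc_id}']"
            f"/arr[@name='{field}']/str")
    frag_el = root.find(path)
    fragment = ""
    if frag_el is not None and frag_el.text:
        fragment = frag_el.text

    highlighted = HL_TAGS.sub("", fragment)
    cut = len(highlighted)
    # repr() makes control characters or stray entities visible.
    around = repr(stored[max(0, cut - context):cut + context])
    return len(stored), cut, around
```

The lengths are in characters; use len(text.encode("utf-8")) when comparing against byte counts such as the ~64500 bytes mentioned above.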

Is there some upper limit for hl.fragsize, say around 64k, that kicks in
when it is set to 0 or to an overly large value?

-- 
View this message in context: http://www.nabble.com/Truncated-highlighted-results-tp16935665p16935665.html
Sent from the Solr - User mailing list archive at Nabble.com.