Posted to solr-user@lucene.apache.org by Duncan McIntyre <du...@calligram.co.uk> on 2012/06/26 15:31:16 UTC

FastVectorHighlighter failure with multiValued fields

I think I may have identified a bug with FVH. So I have two questions:

1) Does anyone know how to make FVH return a highlighted snippet when the
query matches all of one string in a multivalued field?
2) If not, does anyone know how to make DIH concatenate all the values in a
multivalued field into one single field?

Imagine a document which looks like this:

<doc>
  <str name="department_name">Obstetrics and Gynaecology</str>
  <arr name="node_names">
    <str>Refer to specialist</str>
    <str>Identify adverse psycho social factors</str>
  </arr>
</doc>

If I search the document and ask for matches to be highlighted with the
original highlighter, I get 'node_names' in the highlighting results:

q=node_names:("Refer to specialist")&hl=true&hl.fl=*

But if I repeat the search using the FVH, 'node_names' does not appear in
the highlighting results:

q=node_names:("Refer to specialist")&hl=true&hl.fl=*&hl.useFastVectorHighlighter=true

A search for something less than the full string (e.g. "Refer to") works in
both cases.

I have tried every combination of hl.requireFieldMatch and
hl.usePhraseHighlighter, with no effect.

node_names is defined as either:

<field name="node_names" type="text_en_splitting" indexed="true"
       stored="true" multiValued="true" termVectors="true"
       termPositions="true" termOffsets="true"/>

OR:

<field name="node_names" type="text_en" indexed="true"
       stored="true" multiValued="true" termVectors="true"
       termPositions="true" termOffsets="true"/>

And I have tried setting preserveOriginal="1" on the
WordDelimiterFilterFactory.

Now FVH seems to work fine with single-valued fields, so doing a query
q=department_name:("Obstetrics and Gynaecology") works as expected. Given
that, I have tried unsuccessfully to use either a Javascript or native Java
transformer to merge the contents of node_names into a single
node_names_flat field during data import. This fails because child entities
have no access to their parent entity.

<entity name="pathway">
  <entity name="pages">
    <entity name="nodes">
      <!-- produces multiple node_names, and there seems to be no way to
           push them up into 'pages' or 'pathway' -->
    </entity>
  </entity>
</entity>

Duncan.

Re: FastVectorHighlighter failure with multiValued fields

Posted by Lance Norskog <go...@gmail.com>.
1) I think text fields are not exactly multi-valued. Instead, there is a
setting called 'positionIncrementGap' that inserts a run of empty
positions (usually 100) between one value and the next, so the values
can be told apart. If you set it to zero or one, the values should
behave like one long field.
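
As a rough sketch of that suggestion (not tested against this index):
positionIncrementGap is an attribute of the fieldType in schema.xml. The
type name text_en_nogap and the analyzer chain below are placeholders made
up for illustration, not the stock text_en definition; the field attributes
are the ones from the original post. The documents would need to be
reindexed after a change like this.

```xml
<!-- Goes in the types section of schema.xml.
     "text_en_nogap" is a hypothetical name; the analyzer is a minimal
     stand-in, so substitute the real text_en chain if you try this. -->
<fieldType name="text_en_nogap" class="solr.TextField"
           positionIncrementGap="0">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Goes in the fields section; same attributes as before, new type. -->
<field name="node_names" type="text_en_nogap" indexed="true"
       stored="true" multiValued="true" termVectors="true"
       termPositions="true" termOffsets="true"/>
```

One thing to keep in mind: with a gap of 0, a phrase query can also match
across the boundary between two adjacent values, which may or may not be
acceptable here.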

2) You can do anything with javascript in DIH.
http://lucidworks.lucidimagination.com/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-TheScriptTransformer




-- 
Lance Norskog
goksron@gmail.com