Posted to solr-user@lucene.apache.org by Duncan McIntyre <du...@calligram.co.uk> on 2012/06/26 15:31:16 UTC
FastVectorHighlighter failure with multiValued fields
I think I may have identified a bug with FVH. So I have two questions:
1) Does anyone know how to make FVH return a highlighted snippet when the
query matches all of one string in a multivalued field?
2) If not, does anyone know how to make DIH concatenate all the values in a
multivalued field into one single field?
Imagine a document which looks like this:
<doc>
<str name="department_name">Obstetrics and Gynaecology</str>
<arr name="node_names">
<str>Refer to specialist</str>
<str>Identify adverse psycho social factors</str>
</arr>
</doc>
If I search the document and ask for matches to be highlighted with the
original highlighter I get 'node_names' in the highlighting results
q=node_names:("Refer to specialist")&hl=true&hl.fl=*
But if I repeat the search using the FVH, 'node_names' does not appear in
the highlighting results
q=node_names:("Refer to specialist")&hl=true&hl.fl=*&hl.useFastVectorHighlighter=true
A search for something less than the full string (e.g. "Refer to") works in
both cases.
I have tried every combination of hl.requireFieldMatch and
hl.usePhraseHighlighter, with no effect.
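To reproduce that test matrix, here is a minimal sketch that builds the FVH query string from the parameters in the queries above and enumerates the four combinations of hl.requireFieldMatch and hl.usePhraseHighlighter. The helper name `highlight_queries` is hypothetical, and the sketch only produces query strings; it does not send them to a Solr server.

```python
from itertools import product
from urllib.parse import urlencode

# Base parameters taken from the FVH query in the message.
BASE_PARAMS = [
    ("q", 'node_names:("Refer to specialist")'),
    ("hl", "true"),
    ("hl.fl", "*"),
    ("hl.useFastVectorHighlighter", "true"),
]

def highlight_queries():
    """Return one encoded query string per combination of the two flags."""
    queries = []
    for require_match, use_phrase in product(["true", "false"], repeat=2):
        params = BASE_PARAMS + [
            ("hl.requireFieldMatch", require_match),
            ("hl.usePhraseHighlighter", use_phrase),
        ]
        # urlencode joins the pairs with '&' and percent-encodes
        # quotes, parentheses, '*' and spaces.
        queries.append(urlencode(params))
    return queries

for qs in highlight_queries():
    print(qs)
```

Letting `urlencode` do the joining avoids hand-typed separators and unescaped quotes in the `q` parameter.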
node_names is defined as either:
<field name="node_names" type="text_en_splitting" indexed="true"
stored="true" multiValued="true" termVectors="true" termPositions="true"
termOffsets="true"/>
OR:
<field name="node_names" type="text_en" indexed="true"
stored="true" multiValued="true" termVectors="true" termPositions="true"
termOffsets="true"/>
And I have tried setting preserveOriginal="1" on the
WordDelimiterFilterFactory.
FVH does seem to work fine with single-valued fields: the query
q=department_name:("Obstetrics and Gynaecology") works as expected. Given
that, I have tried unsuccessfully to use either a JavaScript or a native Java
transformer to merge the contents of node_names into a single
node_names_flat field during data import. This fails because child entities
have no access to their parent entity.
<entity name="pathway">
<entity name="pages">
<entity name="nodes">
<!-- produces multiple node_names, and there seems to be no way to push
them up into 'pages' or 'pathway' -->
</entity>
</entity>
</entity>
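Outside DIH, the merge being attempted here is a group-by on the parent key followed by a string join. The sketch below shows only that transformation; the `(page_id, node_name)` row shape, the sample rows, and the `flatten_node_names` helper are hypothetical. If the source database is MySQL, a comparable result could come from `GROUP_CONCAT` in the parent entity's SQL, though that is a different technique from a DIH transformer.

```python
from collections import defaultdict

# Hypothetical rows as the 'nodes' child entity might return them:
# (page_id, node_name). Both the key and the page ids are assumptions.
node_rows = [
    (1, "Refer to specialist"),
    (1, "Identify adverse psycho social factors"),
    (2, "Refer to specialist"),
]

def flatten_node_names(rows, separator=" "):
    """Group child rows by parent id and join their names into one string."""
    grouped = defaultdict(list)
    for page_id, node_name in rows:
        grouped[page_id].append(node_name)
    return {page_id: separator.join(names) for page_id, names in grouped.items()}

print(flatten_node_names(node_rows))
```

With a plain space as the separator, a phrase query can match across two adjacent node names in the flattened field.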
Duncan.
Re: FastVectorHighlighter failure with multiValued fields
Posted by Lance Norskog <go...@gmail.com>.
1) I think text fields are not exactly multi-valued. Instead, there is
a setting called 'positionIncrementGap' that inserts a run of empty
positions (usually 100) between one value and the next. If you set
this to zero or one, that should give you what is effectively one
long field.
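positionIncrementGap is set as an attribute on the fieldType in schema.xml. The sketch below is a simplified model of its effect: it assumes plain lowercase whitespace tokenization with no stopword removal, and Lucene's exact position arithmetic may differ by one. It shows that with a gap of 100 a phrase cannot span two values, while with a gap of 0 the values behave like one long field. The helper names are hypothetical.

```python
def token_positions(values, position_increment_gap):
    """Simplified model of how token positions run across field values."""
    positions = []
    next_pos = 0
    for i, value in enumerate(values):
        if i > 0:
            # Leave 'position_increment_gap' empty positions between values.
            next_pos += position_increment_gap
        # Assumption: lowercase whitespace tokenization, no stopwords.
        for token in value.lower().split():
            positions.append((token, next_pos))
            next_pos += 1
    return positions

def phrase_matches(positions, phrase):
    """True if the phrase's tokens occur at consecutive positions."""
    words = phrase.lower().split()
    index = {}
    for token, pos in positions:
        index.setdefault(token, set()).add(pos)
    for start in index.get(words[0], ()):
        if all(start + k in index.get(w, ()) for k, w in enumerate(words)):
            return True
    return False

node_names = ["Refer to specialist", "Identify adverse psycho social factors"]
for gap in (100, 0):
    pos = token_positions(node_names, gap)
    # gap 100 prints: 100 ('identify', 103) False
    # gap 0 prints:   0 ('identify', 3) True
    print(gap, pos[3], phrase_matches(pos, "specialist identify"))
```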
2) You can do anything with JavaScript in DIH using the ScriptTransformer:
http://lucidworks.lucidimagination.com/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-TheScriptTransformer
--
Lance Norskog
goksron@gmail.com