
Posted to dev@lucene.apache.org by "Marc Drolet (JIRA)" <ji...@apache.org> on 2011/05/25 19:10:47 UTC

[jira] [Created] (SOLR-2546) Using hl.useFastVectorHighlighter with copyfield multivalued "boosted" we get too much informations

Using hl.useFastVectorHighlighter with copyfield multivalued "boosted" we get too much informations
---------------------------------------------------------------------------------------------------

                 Key: SOLR-2546
                 URL: https://issues.apache.org/jira/browse/SOLR-2546
             Project: Solr
          Issue Type: Bug
          Components: highlighter
    Affects Versions: 4.0
         Environment: Linux (CentOS distribution) with a Tomcat 5 server
            Reporter: Marc Drolet


I search on a copyField named "Publisher_text", into which I have copied several fields, e.g. id, Name, url, email.
I copied the Name field into that copyField 8 times to boost Name when I search on it.

When I search on that copyField and highlight it with hl.useFastVectorHighlighter enabled, the result is truncated and the returned string is repeated until hl.fragsize (default 100) is reached.

Here is my query for this example:
?q=Publisher_text%3Aedi&start=0&rows=10&fl=Publisher_text&hl=on&hl.fl=Publisher_text&hl.useFastVectorHighlighter=on

Here are the results I get:
<result name="response" numFound="322" start="0">
<doc>
<arr name="Publisher_text">
<str>  </str>
<str/>
<str>Neil Houston neil.houston@rci.rogers.com</str>
<str>jyounes</str>
<str>Rogers Digital Media [New]</str>
<str>Rogers Digital Media [New]</str>
<str>Rogers Digital Media [New]</str>
<str>Rogers Digital Media [New]</str>
<str>Rogers Digital Media [New]</str>
<str>Rogers Digital Media [New]</str>
<str>Rogers Digital Media [New]</str>
<str>Rogers Digital Media [New]</str>
<str>Rogers Digital Media</str>
<str>1 Mount Pleasant Toronto Canada M4Y 2Y5 Ontario</str>
<str>Corby Fine corby.fine@rci.rogers.com</str>
<str>2262</str>
....

Here is the highlighting result I get:
<lst name="highlighting">
<lst name="Publisher_2262">
<arr name="Publisher_text">
<str>
igital<span class="match"> Me</span>dia [New] Rogers Digital<span class="match"> Me</span>dia [New] Rogers Digital<span class="match"> Me</span>dia [New] Rogers Digital<span class="match"> Me</span>dia [New] 
</str>
</arr>
</lst>

You can see that the start of the string is truncated: it is supposed to start with "Rogers" but it starts at "igital".
You can also see that the string is returned 4 times when it should be returned only once: "Rogers Digital<span class="match"> Me</span>dia [New]"
You can also see that hl.tag.pre and hl.tag.post are not in the right place: <span class="match"> Me</span>dia should be M<span class="match">edi</span>a


Here is my schema definition for the Publisher_text field:
 <field name="Publisher_text"    type="text_wild"        indexed="true" stored="true"    multiValued="true"      omitNorms="true" termVectors="true" termPositions="true" termOffsets="true"/>

Here is my text_wild field type definition:
    <fieldType name="text_wild" class="solr.TextField" >
      <analyzer type="index">
        <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>


When I remove the hl.useFastVectorHighlighter parameter, the query is slower but I get the right result:
<lst name="highlighting">
<lst name="Publisher_2262">
<arr name="Publisher_text">
<str>Rogers Digital<em> Me</em>dia [New]</str>
</arr>
</lst>

I'm running the nightly build: apache-solr-4.0-2011-05-16_08-24-17-src.tgz

If you need more information, let me know.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] [Resolved] (SOLR-2546) Using hl.useFastVectorHighlighter with copyfield multivalued "boosted" we get too much informations

Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi resolved SOLR-2546.
----------------------------------

    Resolution: Invalid


[jira] [Commented] (SOLR-2546) Using hl.useFastVectorHighlighter with copyfield multivalued "boosted" we get too much informations

Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040243#comment-13040243 ] 

Koji Sekiguchi commented on SOLR-2546:
--------------------------------------

Please use the solr-user mailing list before opening an issue next time.

{code}
<tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15" />
{code}

Unfortunately, FVH (FastVectorHighlighter) cannot support variable gram size terms, so you should set minGramSize == maxGramSize.
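
As a minimal sketch of that suggestion, the index-time tokenizer from the reported schema could use a fixed gram size. The type name text_wild_fixed and the size 3 are illustrative assumptions, not taken from the issue; 3 is chosen only because the example query term "edi" is three characters long.

{code}
<!-- Hypothetical variant of the reporter's text_wild type with minGramSize == maxGramSize -->
<fieldType name="text_wild_fixed" class="solr.TextField">
  <analyzer type="index">
    <!-- Fixed gram size: every indexed term is exactly 3 characters long -->
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="3"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
{code}

With this setup, a whitespace-tokenized query term longer than three characters would presumably no longer match a single indexed gram, so the query analyzer may also need n-gram tokenization; that is not covered in this thread.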

