You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Benjamin Keil (JIRA)" <ji...@apache.org> on 2009/10/28 22:08:59 UTC

[jira] Created: (LUCENE-2013) QueryScorer and SpanRegexQuery are incompatible.

QueryScorer and SpanRegexQuery are incompatible.
------------------------------------------------

                 Key: LUCENE-2013
                 URL: https://issues.apache.org/jira/browse/LUCENE-2013
             Project: Lucene - Java
          Issue Type: Bug
          Components: contrib/highlighter
    Affects Versions: 2.9
         Environment: Lucene-Java 2.9
            Reporter: Benjamin Keil


Since the resolution of #LUCENE-1685, users are not supposed to rewrite their queries before submitting them to QueryScorer:

bq.{{------------------------------------------------------------------------
r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line

LUCENE-1685: The position aware SpanScorer has become the default scorer for Highlighting. The SpanScorer implementation has replaced QueryScorer and the old term highlighting QueryScorer has been renamed to QueryTermScorer. Multi-term queries are also now expanded by default. If you were previously rewritting the query for multi-term query highlighting, you should no longer do that (unless you switch to using QueryTermScorer). The SpanScorer API (now QueryScorer) has also been improved to more closely match the API of the previous QueryScorer implementation.
------------------------------------------------------------------------}}

This is a great convenience for the most part, but it's causing me difficulties with {{SpanRegexQuery}}s, as the {{WeightedSpanTermExtractor}} uses {{Query.extractTerms()}} to collect the fields used in the query, but {{SpanRegexQuery}} does not implement this method, so highlighting any query with a {{SpanRegexQuery}} throws an UnsupportedOpertationException.  If this issue is circumvented, there is still the issue of {{SpanRegexQuery}} throwing an exception when someone calls its {{getSpans()}} method.

I can provide the patch that I am currently using, but I'm not sure that my solution is optimal.  It adds two methods to {{SpanQuery}}: {{extractFields(Set<String> fields)}} which is {{fields.add(getField())}} for everything except {{MaskedFieldQuery}}, and {{mustBeRewrittenToGetSpans()}} which returns {{true}} for {{SpanQuery}}, {{false}} for {{SpanTermQuery}}, and is overridden in each composite {{SpanQuery}} to return a value depending on its components.  In this way {{SpanRegexQuery}} (and any other custom {{SpanQuery}}s) do not need to be adjusted.

Currently the collection of fields and non-weighted terms are done in a single step.  In the proposed patch the {{WeightedSpanTerm}} extraction from a {{SpanQuery}} proceeds in two steps.  First, if the {{QueryScorer}}'s field is {{null}}, then the fields are collected from the {{SpanQuery}} using the {{extractFields()}} method.  Second the terms are collected using {{extractTerms()}}, rewriting the query for each field if {{mustBeRewrittenToGetSpans()}} returns {{true}}.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-2013) QueryScorer and SpanRegexQuery are incompatible.

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772763#action_12772763 ] 

Mark Miller edited comment on LUCENE-2013 at 11/2/09 11:54 PM:
---------------------------------------------------------------

bug alert - I screwed up the backport

      was (Author: markrmiller@gmail.com):
    bug alert
  
> QueryScorer and SpanRegexQuery are incompatible.
> ------------------------------------------------
>
>                 Key: LUCENE-2013
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2013
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.9
>         Environment: Lucene-Java 2.9
>            Reporter: Benjamin Keil
>            Assignee: Mark Miller
>             Fix For: 2.9.1, 3.0
>
>         Attachments: lucene-2013-2009-10-28-2135.patch, lucene-2013-2009-10-28.patch, lucene-2013-2009-10-29-0136.patch, LUCENE-2013.patch
>
>
> Since the resolution of #LUCENE-1685, users are not supposed to rewrite their queries before submitting them to QueryScorer:
> bq.------------------------------------------------------------------------
> bq.r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line
> bq.
> bq.LUCENE-1685: The position aware SpanScorer has become the default scorer for Highlighting. The SpanScorer implementation has replaced QueryScorer and the old term highlighting QueryScorer has been renamed to QueryTermScorer. Multi-term queries are also now expanded by default. If you were previously rewritting the query for multi-term query highlighting, you should no longer do that (unless you switch to using QueryTermScorer). The SpanScorer API (now QueryScorer) has also been improved to more closely match the API of the previous QueryScorer implementation.
> bq.------------------------------------------------------------------------
> This is a great convenience for the most part, but it's causing me difficulties with SpanRegexQuerys, as the WeightedSpanTermExtractor uses Query.extractTerms() to collect the fields used in the query, but SpanRegexQuery does not implement this method, so highlighting any query with a SpanRegexQuery throws an UnsupportedOpertationException.  If this issue is circumvented, there is still the issue of SpanRegexQuery throwing an exception when someone calls its getSpans() method.
> I can provide the patch that I am currently using, but I'm not sure that my solution is optimal.  It adds two methods to SpanQuery: extractFields(Set<String> fields) which is equivalent to fields.add(getField()) except when MaskedFieldQuerys get involved, and mustBeRewrittenToGetSpans() which returns true for SpanQuery, false for SpanTermQuery, and is overridden in each composite SpanQuery to return a value depending on its components.  In this way SpanRegexQuery (and any other custom SpanQuerys) do not need to be adjusted.
> Currently the collection of fields and non-weighted terms are done in a single step.  In the proposed patch the WeightedSpanTerm extraction from a SpanQuery proceeds in two steps.  First, if the QueryScorer's field is null, then the fields are collected from the SpanQuery using the extractFields() method.  Second the terms are collected using extractTerms(), rewriting the query for each field if mustBeRewrittenToGetSpans() returns true.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-2013) QueryScorer and SpanRegexQuery are incompatible.

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772782#action_12772782 ] 

Mark Miller edited comment on LUCENE-2013 at 11/3/09 12:18 AM:
---------------------------------------------------------------

Umm - yes. Unless we want to draw a line. Its worse than the bug we fixed. Very sorry - entirely my fault.

      was (Author: markrmiller@gmail.com):
    Umm - yes. Unless we want to draw a line. Its worse than the bug we fixed.
  
> QueryScorer and SpanRegexQuery are incompatible.
> ------------------------------------------------
>
>                 Key: LUCENE-2013
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2013
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.9
>         Environment: Lucene-Java 2.9
>            Reporter: Benjamin Keil
>            Assignee: Mark Miller
>             Fix For: 2.9.1, 3.0
>
>         Attachments: lucene-2013-2009-10-28-2135.patch, lucene-2013-2009-10-28.patch, lucene-2013-2009-10-29-0136.patch, LUCENE-2013.patch
>
>
> Since the resolution of #LUCENE-1685, users are not supposed to rewrite their queries before submitting them to QueryScorer:
> bq.------------------------------------------------------------------------
> bq.r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line
> bq.
> bq.LUCENE-1685: The position aware SpanScorer has become the default scorer for Highlighting. The SpanScorer implementation has replaced QueryScorer and the old term highlighting QueryScorer has been renamed to QueryTermScorer. Multi-term queries are also now expanded by default. If you were previously rewritting the query for multi-term query highlighting, you should no longer do that (unless you switch to using QueryTermScorer). The SpanScorer API (now QueryScorer) has also been improved to more closely match the API of the previous QueryScorer implementation.
> bq.------------------------------------------------------------------------
> This is a great convenience for the most part, but it's causing me difficulties with SpanRegexQuerys, as the WeightedSpanTermExtractor uses Query.extractTerms() to collect the fields used in the query, but SpanRegexQuery does not implement this method, so highlighting any query with a SpanRegexQuery throws an UnsupportedOpertationException.  If this issue is circumvented, there is still the issue of SpanRegexQuery throwing an exception when someone calls its getSpans() method.
> I can provide the patch that I am currently using, but I'm not sure that my solution is optimal.  It adds two methods to SpanQuery: extractFields(Set<String> fields) which is equivalent to fields.add(getField()) except when MaskedFieldQuerys get involved, and mustBeRewrittenToGetSpans() which returns true for SpanQuery, false for SpanTermQuery, and is overridden in each composite SpanQuery to return a value depending on its components.  In this way SpanRegexQuery (and any other custom SpanQuerys) do not need to be adjusted.
> Currently the collection of fields and non-weighted terms are done in a single step.  In the proposed patch the WeightedSpanTerm extraction from a SpanQuery proceeds in two steps.  First, if the QueryScorer's field is null, then the fields are collected from the SpanQuery using the extractFields() method.  Second the terms are collected using extractTerms(), rewriting the query for each field if mustBeRewrittenToGetSpans() returns true.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-2013) QueryScorer and SpanRegexQuery are incompatible.

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller resolved LUCENE-2013.
---------------------------------

    Resolution: Fixed

> QueryScorer and SpanRegexQuery are incompatible.
> ------------------------------------------------
>
>                 Key: LUCENE-2013
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2013
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.9
>         Environment: Lucene-Java 2.9
>            Reporter: Benjamin Keil
>            Assignee: Mark Miller
>             Fix For: 2.9.1, 3.0
>
>         Attachments: lucene-2013-2009-10-28-2135.patch, lucene-2013-2009-10-28.patch, lucene-2013-2009-10-29-0136.patch, LUCENE-2013.patch
>
>
> Since the resolution of #LUCENE-1685, users are not supposed to rewrite their queries before submitting them to QueryScorer:
> bq.------------------------------------------------------------------------
> bq.r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line
> bq.
> bq.LUCENE-1685: The position aware SpanScorer has become the default scorer for Highlighting. The SpanScorer implementation has replaced QueryScorer and the old term highlighting QueryScorer has been renamed to QueryTermScorer. Multi-term queries are also now expanded by default. If you were previously rewritting the query for multi-term query highlighting, you should no longer do that (unless you switch to using QueryTermScorer). The SpanScorer API (now QueryScorer) has also been improved to more closely match the API of the previous QueryScorer implementation.
> bq.------------------------------------------------------------------------
> This is a great convenience for the most part, but it's causing me difficulties with SpanRegexQuerys, as the WeightedSpanTermExtractor uses Query.extractTerms() to collect the fields used in the query, but SpanRegexQuery does not implement this method, so highlighting any query with a SpanRegexQuery throws an UnsupportedOpertationException.  If this issue is circumvented, there is still the issue of SpanRegexQuery throwing an exception when someone calls its getSpans() method.
> I can provide the patch that I am currently using, but I'm not sure that my solution is optimal.  It adds two methods to SpanQuery: extractFields(Set<String> fields) which is equivalent to fields.add(getField()) except when MaskedFieldQuerys get involved, and mustBeRewrittenToGetSpans() which returns true for SpanQuery, false for SpanTermQuery, and is overridden in each composite SpanQuery to return a value depending on its components.  In this way SpanRegexQuery (and any other custom SpanQuerys) do not need to be adjusted.
> Currently the collection of fields and non-weighted terms are done in a single step.  In the proposed patch the WeightedSpanTerm extraction from a SpanQuery proceeds in two steps.  First, if the QueryScorer's field is null, then the fields are collected from the SpanQuery using the extractFields() method.  Second the terms are collected using extractTerms(), rewriting the query for each field if mustBeRewrittenToGetSpans() returns true.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2013) QueryScorer and SpanRegexQuery are incompatible.

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771532#action_12771532 ] 

Mark Miller commented on LUCENE-2013:
-------------------------------------

No problem - and we can refine if we need to for the next release - I plopped it in now to make sure at least this fix gets into 2.9.1

> QueryScorer and SpanRegexQuery are incompatible.
> ------------------------------------------------
>
>                 Key: LUCENE-2013
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2013
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.9
>         Environment: Lucene-Java 2.9
>            Reporter: Benjamin Keil
>            Assignee: Mark Miller
>             Fix For: 2.9.1, 3.0
>
>         Attachments: lucene-2013-2009-10-28-2135.patch, lucene-2013-2009-10-28.patch, lucene-2013-2009-10-29-0136.patch, LUCENE-2013.patch
>
>
> Since the resolution of #LUCENE-1685, users are not supposed to rewrite their queries before submitting them to QueryScorer:
> bq.------------------------------------------------------------------------
> bq.r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line
> bq.
> bq.LUCENE-1685: The position aware SpanScorer has become the default scorer for Highlighting. The SpanScorer implementation has replaced QueryScorer and the old term highlighting QueryScorer has been renamed to QueryTermScorer. Multi-term queries are also now expanded by default. If you were previously rewritting the query for multi-term query highlighting, you should no longer do that (unless you switch to using QueryTermScorer). The SpanScorer API (now QueryScorer) has also been improved to more closely match the API of the previous QueryScorer implementation.
> bq.------------------------------------------------------------------------
> This is a great convenience for the most part, but it's causing me difficulties with SpanRegexQuerys, as the WeightedSpanTermExtractor uses Query.extractTerms() to collect the fields used in the query, but SpanRegexQuery does not implement this method, so highlighting any query with a SpanRegexQuery throws an UnsupportedOpertationException.  If this issue is circumvented, there is still the issue of SpanRegexQuery throwing an exception when someone calls its getSpans() method.
> I can provide the patch that I am currently using, but I'm not sure that my solution is optimal.  It adds two methods to SpanQuery: extractFields(Set<String> fields) which is equivalent to fields.add(getField()) except when MaskedFieldQuerys get involved, and mustBeRewrittenToGetSpans() which returns true for SpanQuery, false for SpanTermQuery, and is overridden in each composite SpanQuery to return a value depending on its components.  In this way SpanRegexQuery (and any other custom SpanQuerys) do not need to be adjusted.
> Currently the collection of fields and non-weighted terms are done in a single step.  In the proposed patch the WeightedSpanTerm extraction from a SpanQuery proceeds in two steps.  First, if the QueryScorer's field is null, then the fields are collected from the SpanQuery using the extractFields() method.  Second the terms are collected using extractTerms(), rewriting the query for each field if mustBeRewrittenToGetSpans() returns true.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2013) QueryScorer and SpanRegexQuery are incompatible.

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-2013:
--------------------------------

    Attachment: LUCENE-2013.patch

Here is what I would propose - with additional handling of regular regex queries if needed (havn't check yet).

> QueryScorer and SpanRegexQuery are incompatible.
> ------------------------------------------------
>
>                 Key: LUCENE-2013
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2013
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.9
>         Environment: Lucene-Java 2.9
>            Reporter: Benjamin Keil
>         Attachments: lucene-2013-2009-10-28-2135.patch, lucene-2013-2009-10-28.patch, LUCENE-2013.patch
>
>
> Since the resolution of #LUCENE-1685, users are not supposed to rewrite their queries before submitting them to QueryScorer:
> bq.------------------------------------------------------------------------
> bq.r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line
> bq.
> bq.LUCENE-1685: The position aware SpanScorer has become the default scorer for Highlighting. The SpanScorer implementation has replaced QueryScorer and the old term highlighting QueryScorer has been renamed to QueryTermScorer. Multi-term queries are also now expanded by default. If you were previously rewritting the query for multi-term query highlighting, you should no longer do that (unless you switch to using QueryTermScorer). The SpanScorer API (now QueryScorer) has also been improved to more closely match the API of the previous QueryScorer implementation.
> bq.------------------------------------------------------------------------
> This is a great convenience for the most part, but it's causing me difficulties with SpanRegexQuerys, as the WeightedSpanTermExtractor uses Query.extractTerms() to collect the fields used in the query, but SpanRegexQuery does not implement this method, so highlighting any query with a SpanRegexQuery throws an UnsupportedOpertationException.  If this issue is circumvented, there is still the issue of SpanRegexQuery throwing an exception when someone calls its getSpans() method.
> I can provide the patch that I am currently using, but I'm not sure that my solution is optimal.  It adds two methods to SpanQuery: extractFields(Set<String> fields) which is equivalent to fields.add(getField()) except when MaskedFieldQuerys get involved, and mustBeRewrittenToGetSpans() which returns true for SpanQuery, false for SpanTermQuery, and is overridden in each composite SpanQuery to return a value depending on its components.  In this way SpanRegexQuery (and any other custom SpanQuerys) do not need to be adjusted.
> Currently the collection of fields and non-weighted terms are done in a single step.  In the proposed patch the WeightedSpanTerm extraction from a SpanQuery proceeds in two steps.  First, if the QueryScorer's field is null, then the fields are collected from the SpanQuery using the extractFields() method.  Second the terms are collected using extractTerms(), rewriting the query for each field if mustBeRewrittenToGetSpans() returns true.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2013) QueryScorer and SpanRegexQuery are incompatible.

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-2013:
--------------------------------

    Fix Version/s: 3.0

> QueryScorer and SpanRegexQuery are incompatible.
> ------------------------------------------------
>
>                 Key: LUCENE-2013
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2013
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.9
>         Environment: Lucene-Java 2.9
>            Reporter: Benjamin Keil
>             Fix For: 3.0
>
>         Attachments: lucene-2013-2009-10-28-2135.patch, lucene-2013-2009-10-28.patch, LUCENE-2013.patch
>
>
> Since the resolution of #LUCENE-1685, users are not supposed to rewrite their queries before submitting them to QueryScorer:
> bq.------------------------------------------------------------------------
> bq.r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line
> bq.
> bq.LUCENE-1685: The position aware SpanScorer has become the default scorer for Highlighting. The SpanScorer implementation has replaced QueryScorer and the old term highlighting QueryScorer has been renamed to QueryTermScorer. Multi-term queries are also now expanded by default. If you were previously rewritting the query for multi-term query highlighting, you should no longer do that (unless you switch to using QueryTermScorer). The SpanScorer API (now QueryScorer) has also been improved to more closely match the API of the previous QueryScorer implementation.
> bq.------------------------------------------------------------------------
> This is a great convenience for the most part, but it's causing me difficulties with SpanRegexQuerys, as the WeightedSpanTermExtractor uses Query.extractTerms() to collect the fields used in the query, but SpanRegexQuery does not implement this method, so highlighting any query with a SpanRegexQuery throws an UnsupportedOpertationException.  If this issue is circumvented, there is still the issue of SpanRegexQuery throwing an exception when someone calls its getSpans() method.
> I can provide the patch that I am currently using, but I'm not sure that my solution is optimal.  It adds two methods to SpanQuery: extractFields(Set<String> fields) which is equivalent to fields.add(getField()) except when MaskedFieldQuerys get involved, and mustBeRewrittenToGetSpans() which returns true for SpanQuery, false for SpanTermQuery, and is overridden in each composite SpanQuery to return a value depending on its components.  In this way SpanRegexQuery (and any other custom SpanQuerys) do not need to be adjusted.
> Currently the collection of fields and non-weighted terms are done in a single step.  In the proposed patch the WeightedSpanTerm extraction from a SpanQuery proceeds in two steps.  First, if the QueryScorer's field is null, then the fields are collected from the SpanQuery using the extractFields() method.  Second the terms are collected using extractTerms(), rewriting the query for each field if mustBeRewrittenToGetSpans() returns true.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2013) QueryScorer and SpanRegexQuery are incompatible.

Posted by "Benjamin Keil (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Keil updated LUCENE-2013:
----------------------------------

    Attachment: lucene-2013-2009-10-28.patch

Patch for LUCENE-2013

> QueryScorer and SpanRegexQuery are incompatible.
> ------------------------------------------------
>
>                 Key: LUCENE-2013
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2013
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.9
>         Environment: Lucene-Java 2.9
>            Reporter: Benjamin Keil
>         Attachments: lucene-2013-2009-10-28.patch
>
>
> Since the resolution of #LUCENE-1685, users are not supposed to rewrite their queries before submitting them to QueryScorer:
> bq.{{------------------------------------------------------------------------
> r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line
> LUCENE-1685: The position aware SpanScorer has become the default scorer for Highlighting. The SpanScorer implementation has replaced QueryScorer and the old term highlighting QueryScorer has been renamed to QueryTermScorer. Multi-term queries are also now expanded by default. If you were previously rewritting the query for multi-term query highlighting, you should no longer do that (unless you switch to using QueryTermScorer). The SpanScorer API (now QueryScorer) has also been improved to more closely match the API of the previous QueryScorer implementation.
> ------------------------------------------------------------------------}}
> This is a great convenience for the most part, but it's causing me difficulties with {{SpanRegexQuery}}s, as the {{WeightedSpanTermExtractor}} uses {{Query.extractTerms()}} to collect the fields used in the query, but {{SpanRegexQuery}} does not implement this method, so highlighting any query with a {{SpanRegexQuery}} throws an UnsupportedOpertationException.  If this issue is circumvented, there is still the issue of {{SpanRegexQuery}} throwing an exception when someone calls its {{getSpans()}} method.
> I can provide the patch that I am currently using, but I'm not sure that my solution is optimal.  It adds two methods to {{SpanQuery}}: {{extractFields(Set<String> fields)}} which is {{fields.add(getField())}} for everything except {{MaskedFieldQuery}}, and {{mustBeRewrittenToGetSpans()}} which returns {{true}} for {{SpanQuery}}, {{false}} for {{SpanTermQuery}}, and is overridden in each composite {{SpanQuery}} to return a value depending on its components.  In this way {{SpanRegexQuery}} (and any other custom {{SpanQuery}}s) do not need to be adjusted.
> Currently the collection of fields and non-weighted terms are done in a single step.  In the proposed patch the {{WeightedSpanTerm}} extraction from a {{SpanQuery}} proceeds in two steps.  First, if the {{QueryScorer}}'s field is {{null}}, then the fields are collected from the {{SpanQuery}} using the {{extractFields()}} method.  Second the terms are collected using {{extractTerms()}}, rewriting the query for each field if {{mustBeRewrittenToGetSpans()}} returns {{true}}.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2013) QueryScorer and SpanRegexQuery are incompatible.

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771196#action_12771196 ] 

Mark Miller commented on LUCENE-2013:
-------------------------------------

Thanks for the report Benjamin -

Not sure I like adding the methods to SpanQuerys though - 

how about putting a check for regex query before the check for spanquery, and rewriting if we see it? It means adding the contrib with regex as a dependency of the highlighter, but it lets us avoid modifying any core classes.

> QueryScorer and SpanRegexQuery are incompatible.
> ------------------------------------------------
>
>                 Key: LUCENE-2013
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2013
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.9
>         Environment: Lucene-Java 2.9
>            Reporter: Benjamin Keil
>         Attachments: lucene-2013-2009-10-28-2135.patch, lucene-2013-2009-10-28.patch
>
>
> Since the resolution of #LUCENE-1685, users are not supposed to rewrite their queries before submitting them to QueryScorer:
> bq.------------------------------------------------------------------------
> bq.r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line
> bq.
> bq.LUCENE-1685: The position aware SpanScorer has become the default scorer for Highlighting. The SpanScorer implementation has replaced QueryScorer and the old term highlighting QueryScorer has been renamed to QueryTermScorer. Multi-term queries are also now expanded by default. If you were previously rewritting the query for multi-term query highlighting, you should no longer do that (unless you switch to using QueryTermScorer). The SpanScorer API (now QueryScorer) has also been improved to more closely match the API of the previous QueryScorer implementation.
> bq.------------------------------------------------------------------------
> This is a great convenience for the most part, but it's causing me difficulties with SpanRegexQuerys, as the WeightedSpanTermExtractor uses Query.extractTerms() to collect the fields used in the query, but SpanRegexQuery does not implement this method, so highlighting any query with a SpanRegexQuery throws an UnsupportedOpertationException.  If this issue is circumvented, there is still the issue of SpanRegexQuery throwing an exception when someone calls its getSpans() method.
> I can provide the patch that I am currently using, but I'm not sure that my solution is optimal.  It adds two methods to SpanQuery: extractFields(Set<String> fields) which is equivalent to fields.add(getField()) except when MaskedFieldQuerys get involved, and mustBeRewrittenToGetSpans() which returns true for SpanQuery, false for SpanTermQuery, and is overridden in each composite SpanQuery to return a value depending on its components.  In this way SpanRegexQuery (and any other custom SpanQuerys) do not need to be adjusted.
> Currently the collection of fields and non-weighted terms are done in a single step.  In the proposed patch the WeightedSpanTerm extraction from a SpanQuery proceeds in two steps.  First, if the QueryScorer's field is null, then the fields are collected from the SpanQuery using the extractFields() method.  Second the terms are collected using extractTerms(), rewriting the query for each field if mustBeRewrittenToGetSpans() returns true.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2013) QueryScorer and SpanRegexQuery are incompatible.

Posted by "Benjamin Keil (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771512#action_12771512 ] 

Benjamin Keil commented on LUCENE-2013:
---------------------------------------

Awesome, Mark.  Thank you!

> QueryScorer and SpanRegexQuery are incompatible.
> ------------------------------------------------
>
>                 Key: LUCENE-2013
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2013
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.9
>         Environment: Lucene-Java 2.9
>            Reporter: Benjamin Keil
>            Assignee: Mark Miller
>             Fix For: 2.9.1, 3.0
>
>         Attachments: lucene-2013-2009-10-28-2135.patch, lucene-2013-2009-10-28.patch, lucene-2013-2009-10-29-0136.patch, LUCENE-2013.patch
>
>
> Since the resolution of #LUCENE-1685, users are not supposed to rewrite their queries before submitting them to QueryScorer:
> bq.------------------------------------------------------------------------
> bq.r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line
> bq.
> bq.LUCENE-1685: The position aware SpanScorer has become the default scorer for Highlighting. The SpanScorer implementation has replaced QueryScorer and the old term highlighting QueryScorer has been renamed to QueryTermScorer. Multi-term queries are also now expanded by default. If you were previously rewritting the query for multi-term query highlighting, you should no longer do that (unless you switch to using QueryTermScorer). The SpanScorer API (now QueryScorer) has also been improved to more closely match the API of the previous QueryScorer implementation.
> bq.------------------------------------------------------------------------
> This is a great convenience for the most part, but it's causing me difficulties with SpanRegexQuerys, as the WeightedSpanTermExtractor uses Query.extractTerms() to collect the fields used in the query, but SpanRegexQuery does not implement this method, so highlighting any query with a SpanRegexQuery throws an UnsupportedOpertationException.  If this issue is circumvented, there is still the issue of SpanRegexQuery throwing an exception when someone calls its getSpans() method.
> I can provide the patch that I am currently using, but I'm not sure that my solution is optimal.  It adds two methods to SpanQuery: extractFields(Set<String> fields) which is equivalent to fields.add(getField()) except when MaskedFieldQuerys get involved, and mustBeRewrittenToGetSpans() which returns true for SpanQuery, false for SpanTermQuery, and is overridden in each composite SpanQuery to return a value depending on its components.  In this way SpanRegexQuery (and any other custom SpanQuerys) do not need to be adjusted.
> Currently the collection of fields and non-weighted terms are done in a single step.  In the proposed patch the WeightedSpanTerm extraction from a SpanQuery proceeds in two steps.  First, if the QueryScorer's field is null, then the fields are collected from the SpanQuery using the extractFields() method.  Second the terms are collected using extractTerms(), rewriting the query for each field if mustBeRewrittenToGetSpans() returns true.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2013) QueryScorer and SpanRegexQuery are incompatible.

Posted by "Benjamin Keil (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Keil updated LUCENE-2013:
----------------------------------

    Attachment: lucene-2013-2009-10-29-0136.patch

The problem with that solution is that it doesn't handle any kind of nested span queries.  Let's update the test case to use this query:

public void testSpanRegexQuery() throws Exception {
    query = new SpanOrQuery(new SpanQuery [] {
        new SpanRegexQuery(new Term(FIELD_NAME, "ken.*")) });

I've attached a counter-counter-proposal :) that does not add SpanRegexQuery as a dependency of WeightedSpanTermExtractor (although the test-case still needs it ... does the build have a concept of test-only dependencies?).

It doesn't add any methods to the core classes, but this also means that no third-party queries have any way to influence their highlighting.

> QueryScorer and SpanRegexQuery are incompatible.
> ------------------------------------------------
>
>                 Key: LUCENE-2013
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2013
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.9
>         Environment: Lucene-Java 2.9
>            Reporter: Benjamin Keil
>             Fix For: 3.0
>
>         Attachments: lucene-2013-2009-10-28-2135.patch, lucene-2013-2009-10-28.patch, lucene-2013-2009-10-29-0136.patch, LUCENE-2013.patch
>
>
> Since the resolution of #LUCENE-1685, users are not supposed to rewrite their queries before submitting them to QueryScorer:
> bq.------------------------------------------------------------------------
> bq.r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line
> bq.
> bq.LUCENE-1685: The position aware SpanScorer has become the default scorer for Highlighting. The SpanScorer implementation has replaced QueryScorer and the old term highlighting QueryScorer has been renamed to QueryTermScorer. Multi-term queries are also now expanded by default. If you were previously rewritting the query for multi-term query highlighting, you should no longer do that (unless you switch to using QueryTermScorer). The SpanScorer API (now QueryScorer) has also been improved to more closely match the API of the previous QueryScorer implementation.
> bq.------------------------------------------------------------------------
> This is a great convenience for the most part, but it's causing me difficulties with SpanRegexQuerys, as the WeightedSpanTermExtractor uses Query.extractTerms() to collect the fields used in the query, but SpanRegexQuery does not implement this method, so highlighting any query with a SpanRegexQuery throws an UnsupportedOpertationException.  If this issue is circumvented, there is still the issue of SpanRegexQuery throwing an exception when someone calls its getSpans() method.
> I can provide the patch that I am currently using, but I'm not sure that my solution is optimal.  It adds two methods to SpanQuery: extractFields(Set<String> fields) which is equivalent to fields.add(getField()) except when MaskedFieldQuerys get involved, and mustBeRewrittenToGetSpans() which returns true for SpanQuery, false for SpanTermQuery, and is overridden in each composite SpanQuery to return a value depending on its components.  In this way SpanRegexQuery (and any other custom SpanQuerys) do not need to be adjusted.
> Currently the collection of fields and non-weighted terms are done in a single step.  In the proposed patch the WeightedSpanTerm extraction from a SpanQuery proceeds in two steps.  First, if the QueryScorer's field is null, then the fields are collected from the SpanQuery using the extractFields() method.  Second the terms are collected using extractTerms(), rewriting the query for each field if mustBeRewrittenToGetSpans() returns true.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2013) QueryScorer and SpanRegexQuery are incompatible.

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772785#action_12772785 ] 

Michael McCandless commented on LUCENE-2013:
--------------------------------------------

OK, no problem, I'll respin!

> QueryScorer and SpanRegexQuery are incompatible.
> ------------------------------------------------
>
>                 Key: LUCENE-2013
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2013
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.9
>         Environment: Lucene-Java 2.9
>            Reporter: Benjamin Keil
>            Assignee: Mark Miller
>             Fix For: 2.9.1, 3.0
>
>         Attachments: lucene-2013-2009-10-28-2135.patch, lucene-2013-2009-10-28.patch, lucene-2013-2009-10-29-0136.patch, LUCENE-2013.patch
>
>
> Since the resolution of #LUCENE-1685, users are not supposed to rewrite their queries before submitting them to QueryScorer:
> bq.------------------------------------------------------------------------
> bq.r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line
> bq.
> bq.LUCENE-1685: The position aware SpanScorer has become the default scorer for Highlighting. The SpanScorer implementation has replaced QueryScorer and the old term highlighting QueryScorer has been renamed to QueryTermScorer. Multi-term queries are also now expanded by default. If you were previously rewritting the query for multi-term query highlighting, you should no longer do that (unless you switch to using QueryTermScorer). The SpanScorer API (now QueryScorer) has also been improved to more closely match the API of the previous QueryScorer implementation.
> bq.------------------------------------------------------------------------
> This is a great convenience for the most part, but it's causing me difficulties with SpanRegexQuerys, as the WeightedSpanTermExtractor uses Query.extractTerms() to collect the fields used in the query, but SpanRegexQuery does not implement this method, so highlighting any query with a SpanRegexQuery throws an UnsupportedOpertationException.  If this issue is circumvented, there is still the issue of SpanRegexQuery throwing an exception when someone calls its getSpans() method.
> I can provide the patch that I am currently using, but I'm not sure that my solution is optimal.  It adds two methods to SpanQuery: extractFields(Set<String> fields) which is equivalent to fields.add(getField()) except when MaskedFieldQuerys get involved, and mustBeRewrittenToGetSpans() which returns true for SpanQuery, false for SpanTermQuery, and is overridden in each composite SpanQuery to return a value depending on its components.  In this way SpanRegexQuery (and any other custom SpanQuerys) do not need to be adjusted.
> Currently the collection of fields and non-weighted terms are done in a single step.  In the proposed patch the WeightedSpanTerm extraction from a SpanQuery proceeds in two steps.  First, if the QueryScorer's field is null, then the fields are collected from the SpanQuery using the extractFields() method.  Second the terms are collected using extractTerms(), rewriting the query for each field if mustBeRewrittenToGetSpans() returns true.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2013) QueryScorer and SpanRegexQuery are incompatible.

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772782#action_12772782 ] 

Mark Miller commented on LUCENE-2013:
-------------------------------------

Umm - yes. Unless we want to draw a line. Its worse than the bug we fixed.

> QueryScorer and SpanRegexQuery are incompatible.
> ------------------------------------------------
>
>                 Key: LUCENE-2013
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2013
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.9
>         Environment: Lucene-Java 2.9
>            Reporter: Benjamin Keil
>            Assignee: Mark Miller
>             Fix For: 2.9.1, 3.0
>
>         Attachments: lucene-2013-2009-10-28-2135.patch, lucene-2013-2009-10-28.patch, lucene-2013-2009-10-29-0136.patch, LUCENE-2013.patch
>
>
> Since the resolution of #LUCENE-1685, users are not supposed to rewrite their queries before submitting them to QueryScorer:
> bq.------------------------------------------------------------------------
> bq.r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line
> bq.
> bq.LUCENE-1685: The position aware SpanScorer has become the default scorer for Highlighting. The SpanScorer implementation has replaced QueryScorer and the old term highlighting QueryScorer has been renamed to QueryTermScorer. Multi-term queries are also now expanded by default. If you were previously rewritting the query for multi-term query highlighting, you should no longer do that (unless you switch to using QueryTermScorer). The SpanScorer API (now QueryScorer) has also been improved to more closely match the API of the previous QueryScorer implementation.
> bq.------------------------------------------------------------------------
> This is a great convenience for the most part, but it's causing me difficulties with SpanRegexQuerys, as the WeightedSpanTermExtractor uses Query.extractTerms() to collect the fields used in the query, but SpanRegexQuery does not implement this method, so highlighting any query with a SpanRegexQuery throws an UnsupportedOpertationException.  If this issue is circumvented, there is still the issue of SpanRegexQuery throwing an exception when someone calls its getSpans() method.
> I can provide the patch that I am currently using, but I'm not sure that my solution is optimal.  It adds two methods to SpanQuery: extractFields(Set<String> fields) which is equivalent to fields.add(getField()) except when MaskedFieldQuerys get involved, and mustBeRewrittenToGetSpans() which returns true for SpanQuery, false for SpanTermQuery, and is overridden in each composite SpanQuery to return a value depending on its components.  In this way SpanRegexQuery (and any other custom SpanQuerys) do not need to be adjusted.
> Currently the collection of fields and non-weighted terms are done in a single step.  In the proposed patch the WeightedSpanTerm extraction from a SpanQuery proceeds in two steps.  First, if the QueryScorer's field is null, then the fields are collected from the SpanQuery using the extractFields() method.  Second the terms are collected using extractTerms(), rewriting the query for each field if mustBeRewrittenToGetSpans() returns true.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2013) QueryScorer and SpanRegexQuery are incompatible.

Posted by "Benjamin Keil (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Keil updated LUCENE-2013:
----------------------------------

    Attachment: lucene-2013-2009-10-28-2135.patch

This patch does a better job of collecting fields from the queries.  The previous patch only had the desired result if there were no FieldMaskingSpanQueries, or if there was just one and it was at the very root of the hierarchy.

> QueryScorer and SpanRegexQuery are incompatible.
> ------------------------------------------------
>
>                 Key: LUCENE-2013
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2013
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.9
>         Environment: Lucene-Java 2.9
>            Reporter: Benjamin Keil
>         Attachments: lucene-2013-2009-10-28-2135.patch, lucene-2013-2009-10-28.patch
>
>
> Since the resolution of #LUCENE-1685, users are not supposed to rewrite their queries before submitting them to QueryScorer:
> bq.{{------------------------------------------------------------------------
> r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line
> LUCENE-1685: The position aware SpanScorer has become the default scorer for Highlighting. The SpanScorer implementation has replaced QueryScorer and the old term highlighting QueryScorer has been renamed to QueryTermScorer. Multi-term queries are also now expanded by default. If you were previously rewritting the query for multi-term query highlighting, you should no longer do that (unless you switch to using QueryTermScorer). The SpanScorer API (now QueryScorer) has also been improved to more closely match the API of the previous QueryScorer implementation.
> ------------------------------------------------------------------------}}
> This is a great convenience for the most part, but it's causing me difficulties with {{SpanRegexQuery}}s, as the {{WeightedSpanTermExtractor}} uses {{Query.extractTerms()}} to collect the fields used in the query, but {{SpanRegexQuery}} does not implement this method, so highlighting any query with a {{SpanRegexQuery}} throws an UnsupportedOpertationException.  If this issue is circumvented, there is still the issue of {{SpanRegexQuery}} throwing an exception when someone calls its {{getSpans()}} method.
> I can provide the patch that I am currently using, but I'm not sure that my solution is optimal.  It adds two methods to {{SpanQuery}}: {{extractFields(Set<String> fields)}} which is {{fields.add(getField())}} for everything except {{MaskedFieldQuery}}, and {{mustBeRewrittenToGetSpans()}} which returns {{true}} for {{SpanQuery}}, {{false}} for {{SpanTermQuery}}, and is overridden in each composite {{SpanQuery}} to return a value depending on its components.  In this way {{SpanRegexQuery}} (and any other custom {{SpanQuery}}s) do not need to be adjusted.
> Currently the collection of fields and non-weighted terms are done in a single step.  In the proposed patch the {{WeightedSpanTerm}} extraction from a {{SpanQuery}} proceeds in two steps.  First, if the {{QueryScorer}}'s field is {{null}}, then the fields are collected from the {{SpanQuery}} using the {{extractFields()}} method.  Second the terms are collected using {{extractTerms()}}, rewriting the query for each field if {{mustBeRewrittenToGetSpans()}} returns {{true}}.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Reopened: (LUCENE-2013) QueryScorer and SpanRegexQuery are incompatible.

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller reopened LUCENE-2013:
---------------------------------


bug alert

> QueryScorer and SpanRegexQuery are incompatible.
> ------------------------------------------------
>
>                 Key: LUCENE-2013
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2013
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.9
>         Environment: Lucene-Java 2.9
>            Reporter: Benjamin Keil
>            Assignee: Mark Miller
>             Fix For: 2.9.1, 3.0
>
>         Attachments: lucene-2013-2009-10-28-2135.patch, lucene-2013-2009-10-28.patch, lucene-2013-2009-10-29-0136.patch, LUCENE-2013.patch
>
>
> Since the resolution of #LUCENE-1685, users are not supposed to rewrite their queries before submitting them to QueryScorer:
> bq.------------------------------------------------------------------------
> bq.r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line
> bq.
> bq.LUCENE-1685: The position aware SpanScorer has become the default scorer for Highlighting. The SpanScorer implementation has replaced QueryScorer and the old term highlighting QueryScorer has been renamed to QueryTermScorer. Multi-term queries are also now expanded by default. If you were previously rewritting the query for multi-term query highlighting, you should no longer do that (unless you switch to using QueryTermScorer). The SpanScorer API (now QueryScorer) has also been improved to more closely match the API of the previous QueryScorer implementation.
> bq.------------------------------------------------------------------------
> This is a great convenience for the most part, but it's causing me difficulties with SpanRegexQuerys, as the WeightedSpanTermExtractor uses Query.extractTerms() to collect the fields used in the query, but SpanRegexQuery does not implement this method, so highlighting any query with a SpanRegexQuery throws an UnsupportedOpertationException.  If this issue is circumvented, there is still the issue of SpanRegexQuery throwing an exception when someone calls its getSpans() method.
> I can provide the patch that I am currently using, but I'm not sure that my solution is optimal.  It adds two methods to SpanQuery: extractFields(Set<String> fields) which is equivalent to fields.add(getField()) except when MaskedFieldQuerys get involved, and mustBeRewrittenToGetSpans() which returns true for SpanQuery, false for SpanTermQuery, and is overridden in each composite SpanQuery to return a value depending on its components.  In this way SpanRegexQuery (and any other custom SpanQuerys) do not need to be adjusted.
> Currently the collection of fields and non-weighted terms are done in a single step.  In the proposed patch the WeightedSpanTerm extraction from a SpanQuery proceeds in two steps.  First, if the QueryScorer's field is null, then the fields are collected from the SpanQuery using the extractFields() method.  Second the terms are collected using extractTerms(), rewriting the query for each field if mustBeRewrittenToGetSpans() returns true.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2013) QueryScorer and SpanRegexQuery are incompatible.

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772778#action_12772778 ] 

Michael McCandless commented on LUCENE-2013:
--------------------------------------------

Mark, does this require a 2.9.1 respin?

> QueryScorer and SpanRegexQuery are incompatible.
> ------------------------------------------------
>
>                 Key: LUCENE-2013
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2013
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.9
>         Environment: Lucene-Java 2.9
>            Reporter: Benjamin Keil
>            Assignee: Mark Miller
>             Fix For: 2.9.1, 3.0
>
>         Attachments: lucene-2013-2009-10-28-2135.patch, lucene-2013-2009-10-28.patch, lucene-2013-2009-10-29-0136.patch, LUCENE-2013.patch
>
>
> Since the resolution of #LUCENE-1685, users are not supposed to rewrite their queries before submitting them to QueryScorer:
> bq.------------------------------------------------------------------------
> bq.r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line
> bq.
> bq.LUCENE-1685: The position aware SpanScorer has become the default scorer for Highlighting. The SpanScorer implementation has replaced QueryScorer and the old term highlighting QueryScorer has been renamed to QueryTermScorer. Multi-term queries are also now expanded by default. If you were previously rewritting the query for multi-term query highlighting, you should no longer do that (unless you switch to using QueryTermScorer). The SpanScorer API (now QueryScorer) has also been improved to more closely match the API of the previous QueryScorer implementation.
> bq.------------------------------------------------------------------------
> This is a great convenience for the most part, but it's causing me difficulties with SpanRegexQuerys, as the WeightedSpanTermExtractor uses Query.extractTerms() to collect the fields used in the query, but SpanRegexQuery does not implement this method, so highlighting any query with a SpanRegexQuery throws an UnsupportedOpertationException.  If this issue is circumvented, there is still the issue of SpanRegexQuery throwing an exception when someone calls its getSpans() method.
> I can provide the patch that I am currently using, but I'm not sure that my solution is optimal.  It adds two methods to SpanQuery: extractFields(Set<String> fields) which is equivalent to fields.add(getField()) except when MaskedFieldQuerys get involved, and mustBeRewrittenToGetSpans() which returns true for SpanQuery, false for SpanTermQuery, and is overridden in each composite SpanQuery to return a value depending on its components.  In this way SpanRegexQuery (and any other custom SpanQuerys) do not need to be adjusted.
> Currently the collection of fields and non-weighted terms are done in a single step.  In the proposed patch the WeightedSpanTerm extraction from a SpanQuery proceeds in two steps.  First, if the QueryScorer's field is null, then the fields are collected from the SpanQuery using the extractFields() method.  Second the terms are collected using extractTerms(), rewriting the query for each field if mustBeRewrittenToGetSpans() returns true.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2013) QueryScorer and SpanRegexQuery are incompatible.

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771458#action_12771458 ] 

Mark Miller commented on LUCENE-2013:
-------------------------------------

Nice catch - I think I like this method better than the core modifications.

bq. but this also means that no third-party queries have any way to influence their highlighting.

Unfortunately, I think thats already the deal in many cases. The Highlighter is very special case - ugly, but the current state of things. We will hopefully get away from that eventually.

> QueryScorer and SpanRegexQuery are incompatible.
> ------------------------------------------------
>
>                 Key: LUCENE-2013
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2013
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.9
>         Environment: Lucene-Java 2.9
>            Reporter: Benjamin Keil
>             Fix For: 3.0
>
>         Attachments: lucene-2013-2009-10-28-2135.patch, lucene-2013-2009-10-28.patch, lucene-2013-2009-10-29-0136.patch, LUCENE-2013.patch
>
>
> Since the resolution of #LUCENE-1685, users are not supposed to rewrite their queries before submitting them to QueryScorer:
> bq.------------------------------------------------------------------------
> bq.r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line
> bq.
> bq.LUCENE-1685: The position aware SpanScorer has become the default scorer for Highlighting. The SpanScorer implementation has replaced QueryScorer and the old term highlighting QueryScorer has been renamed to QueryTermScorer. Multi-term queries are also now expanded by default. If you were previously rewritting the query for multi-term query highlighting, you should no longer do that (unless you switch to using QueryTermScorer). The SpanScorer API (now QueryScorer) has also been improved to more closely match the API of the previous QueryScorer implementation.
> bq.------------------------------------------------------------------------
> This is a great convenience for the most part, but it's causing me difficulties with SpanRegexQuerys, as the WeightedSpanTermExtractor uses Query.extractTerms() to collect the fields used in the query, but SpanRegexQuery does not implement this method, so highlighting any query with a SpanRegexQuery throws an UnsupportedOpertationException.  If this issue is circumvented, there is still the issue of SpanRegexQuery throwing an exception when someone calls its getSpans() method.
> I can provide the patch that I am currently using, but I'm not sure that my solution is optimal.  It adds two methods to SpanQuery: extractFields(Set<String> fields) which is equivalent to fields.add(getField()) except when MaskedFieldQuerys get involved, and mustBeRewrittenToGetSpans() which returns true for SpanQuery, false for SpanTermQuery, and is overridden in each composite SpanQuery to return a value depending on its components.  In this way SpanRegexQuery (and any other custom SpanQuerys) do not need to be adjusted.
> Currently the collection of fields and non-weighted terms are done in a single step.  In the proposed patch the WeightedSpanTerm extraction from a SpanQuery proceeds in two steps.  First, if the QueryScorer's field is null, then the fields are collected from the SpanQuery using the extractFields() method.  Second the terms are collected using extractTerms(), rewriting the query for each field if mustBeRewrittenToGetSpans() returns true.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-2013) QueryScorer and SpanRegexQuery are incompatible.

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller resolved LUCENE-2013.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 2.9.1
         Assignee: Mark Miller

Thanks Benjamin!

> QueryScorer and SpanRegexQuery are incompatible.
> ------------------------------------------------
>
>                 Key: LUCENE-2013
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2013
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.9
>         Environment: Lucene-Java 2.9
>            Reporter: Benjamin Keil
>            Assignee: Mark Miller
>             Fix For: 2.9.1, 3.0
>
>         Attachments: lucene-2013-2009-10-28-2135.patch, lucene-2013-2009-10-28.patch, lucene-2013-2009-10-29-0136.patch, LUCENE-2013.patch
>
>
> Since the resolution of #LUCENE-1685, users are not supposed to rewrite their queries before submitting them to QueryScorer:
> bq.------------------------------------------------------------------------
> bq.r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line
> bq.
> bq.LUCENE-1685: The position aware SpanScorer has become the default scorer for Highlighting. The SpanScorer implementation has replaced QueryScorer and the old term highlighting QueryScorer has been renamed to QueryTermScorer. Multi-term queries are also now expanded by default. If you were previously rewritting the query for multi-term query highlighting, you should no longer do that (unless you switch to using QueryTermScorer). The SpanScorer API (now QueryScorer) has also been improved to more closely match the API of the previous QueryScorer implementation.
> bq.------------------------------------------------------------------------
> This is a great convenience for the most part, but it's causing me difficulties with SpanRegexQuerys, as the WeightedSpanTermExtractor uses Query.extractTerms() to collect the fields used in the query, but SpanRegexQuery does not implement this method, so highlighting any query with a SpanRegexQuery throws an UnsupportedOpertationException.  If this issue is circumvented, there is still the issue of SpanRegexQuery throwing an exception when someone calls its getSpans() method.
> I can provide the patch that I am currently using, but I'm not sure that my solution is optimal.  It adds two methods to SpanQuery: extractFields(Set<String> fields) which is equivalent to fields.add(getField()) except when MaskedFieldQuerys get involved, and mustBeRewrittenToGetSpans() which returns true for SpanQuery, false for SpanTermQuery, and is overridden in each composite SpanQuery to return a value depending on its components.  In this way SpanRegexQuery (and any other custom SpanQuerys) do not need to be adjusted.
> Currently the collection of fields and non-weighted terms are done in a single step.  In the proposed patch the WeightedSpanTerm extraction from a SpanQuery proceeds in two steps.  First, if the QueryScorer's field is null, then the fields are collected from the SpanQuery using the extractFields() method.  Second the terms are collected using extractTerms(), rewriting the query for each field if mustBeRewrittenToGetSpans() returns true.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2013) QueryScorer and SpanRegexQuery are incompatible.

Posted by "Benjamin Keil (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Keil updated LUCENE-2013:
----------------------------------

    Description: 
Since the resolution of #LUCENE-1685, users are not supposed to rewrite their queries before submitting them to QueryScorer:

bq.------------------------------------------------------------------------
bq.r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line
bq.
bq.LUCENE-1685: The position aware SpanScorer has become the default scorer for Highlighting. The SpanScorer implementation has replaced QueryScorer and the old term highlighting QueryScorer has been renamed to QueryTermScorer. Multi-term queries are also now expanded by default. If you were previously rewritting the query for multi-term query highlighting, you should no longer do that (unless you switch to using QueryTermScorer). The SpanScorer API (now QueryScorer) has also been improved to more closely match the API of the previous QueryScorer implementation.
bq.------------------------------------------------------------------------

This is a great convenience for the most part, but it's causing me difficulties with SpanRegexQuerys, as the WeightedSpanTermExtractor uses Query.extractTerms() to collect the fields used in the query, but SpanRegexQuery does not implement this method, so highlighting any query with a SpanRegexQuery throws an UnsupportedOpertationException.  If this issue is circumvented, there is still the issue of SpanRegexQuery throwing an exception when someone calls its getSpans() method.

I can provide the patch that I am currently using, but I'm not sure that my solution is optimal.  It adds two methods to SpanQuery: extractFields(Set<String> fields) which is equivalent to fields.add(getField()) except when MaskedFieldQuerys get involved, and mustBeRewrittenToGetSpans() which returns true for SpanQuery, false for SpanTermQuery, and is overridden in each composite SpanQuery to return a value depending on its components.  In this way SpanRegexQuery (and any other custom SpanQuerys) do not need to be adjusted.

Currently the collection of fields and non-weighted terms are done in a single step.  In the proposed patch the WeightedSpanTerm extraction from a SpanQuery proceeds in two steps.  First, if the QueryScorer's field is null, then the fields are collected from the SpanQuery using the extractFields() method.  Second the terms are collected using extractTerms(), rewriting the query for each field if mustBeRewrittenToGetSpans() returns true.

  was:
Since the resolution of #LUCENE-1685, users are not supposed to rewrite their queries before submitting them to QueryScorer:

bq.{{------------------------------------------------------------------------
r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line

LUCENE-1685: The position aware SpanScorer has become the default scorer for Highlighting. The SpanScorer implementation has replaced QueryScorer and the old term highlighting QueryScorer has been renamed to QueryTermScorer. Multi-term queries are also now expanded by default. If you were previously rewritting the query for multi-term query highlighting, you should no longer do that (unless you switch to using QueryTermScorer). The SpanScorer API (now QueryScorer) has also been improved to more closely match the API of the previous QueryScorer implementation.
------------------------------------------------------------------------}}

This is a great convenience for the most part, but it's causing me difficulties with {{SpanRegexQuery}}s, as the {{WeightedSpanTermExtractor}} uses {{Query.extractTerms()}} to collect the fields used in the query, but {{SpanRegexQuery}} does not implement this method, so highlighting any query with a {{SpanRegexQuery}} throws an UnsupportedOpertationException.  If this issue is circumvented, there is still the issue of {{SpanRegexQuery}} throwing an exception when someone calls its {{getSpans()}} method.

I can provide the patch that I am currently using, but I'm not sure that my solution is optimal.  It adds two methods to {{SpanQuery}}: {{extractFields(Set<String> fields)}} which is {{fields.add(getField())}} for everything except {{MaskedFieldQuery}}, and {{mustBeRewrittenToGetSpans()}} which returns {{true}} for {{SpanQuery}}, {{false}} for {{SpanTermQuery}}, and is overridden in each composite {{SpanQuery}} to return a value depending on its components.  In this way {{SpanRegexQuery}} (and any other custom {{SpanQuery}}s) do not need to be adjusted.

Currently the collection of fields and non-weighted terms are done in a single step.  In the proposed patch the {{WeightedSpanTerm}} extraction from a {{SpanQuery}} proceeds in two steps.  First, if the {{QueryScorer}}'s field is {{null}}, then the fields are collected from the {{SpanQuery}} using the {{extractFields()}} method.  Second the terms are collected using {{extractTerms()}}, rewriting the query for each field if {{mustBeRewrittenToGetSpans()}} returns {{true}}.


Removed failed attempts at {{monospace font}} in description.

> QueryScorer and SpanRegexQuery are incompatible.
> ------------------------------------------------
>
>                 Key: LUCENE-2013
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2013
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.9
>         Environment: Lucene-Java 2.9
>            Reporter: Benjamin Keil
>         Attachments: lucene-2013-2009-10-28-2135.patch, lucene-2013-2009-10-28.patch
>
>
> Since the resolution of #LUCENE-1685, users are not supposed to rewrite their queries before submitting them to QueryScorer:
> bq.------------------------------------------------------------------------
> bq.r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line
> bq.
> bq.LUCENE-1685: The position aware SpanScorer has become the default scorer for Highlighting. The SpanScorer implementation has replaced QueryScorer and the old term highlighting QueryScorer has been renamed to QueryTermScorer. Multi-term queries are also now expanded by default. If you were previously rewritting the query for multi-term query highlighting, you should no longer do that (unless you switch to using QueryTermScorer). The SpanScorer API (now QueryScorer) has also been improved to more closely match the API of the previous QueryScorer implementation.
> bq.------------------------------------------------------------------------
> This is a great convenience for the most part, but it's causing me difficulties with SpanRegexQuerys, as the WeightedSpanTermExtractor uses Query.extractTerms() to collect the fields used in the query, but SpanRegexQuery does not implement this method, so highlighting any query with a SpanRegexQuery throws an UnsupportedOpertationException.  If this issue is circumvented, there is still the issue of SpanRegexQuery throwing an exception when someone calls its getSpans() method.
> I can provide the patch that I am currently using, but I'm not sure that my solution is optimal.  It adds two methods to SpanQuery: extractFields(Set<String> fields) which is equivalent to fields.add(getField()) except when MaskedFieldQuerys get involved, and mustBeRewrittenToGetSpans() which returns true for SpanQuery, false for SpanTermQuery, and is overridden in each composite SpanQuery to return a value depending on its components.  In this way SpanRegexQuery (and any other custom SpanQuerys) do not need to be adjusted.
> Currently the collection of fields and non-weighted terms are done in a single step.  In the proposed patch the WeightedSpanTerm extraction from a SpanQuery proceeds in two steps.  First, if the QueryScorer's field is null, then the fields are collected from the SpanQuery using the extractFields() method.  Second the terms are collected using extractTerms(), rewriting the query for each field if mustBeRewrittenToGetSpans() returns true.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org