Posted to dev@lucene.apache.org by "Mark Miller (JIRA)" <ji...@apache.org> on 2007/02/03 13:49:05 UTC

[jira] Created: (LUCENE-794) Beginnings of a span based highlighter

Beginnings of a span based highlighter
--------------------------------------

                 Key: LUCENE-794
                 URL: https://issues.apache.org/jira/browse/LUCENE-794
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Other
         Environment: There are prob a few Java 1.5 requirements (generics) that could easily be removed.
            Reporter: Mark Miller
            Priority: Minor


This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment:     (was: SpanHighlighterTest.java)

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.
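
The difference the description draws between term-based and hit-based highlighting can be illustrated without any Lucene classes. The following is only a sketch under assumptions, not code from the patch: the class and method names are hypothetical, the query is treated as an exact phrase, and highlighting is reduced to a true/false flag per token. It shows that term-only matching flags every occurrence of a phrase term, while span-style matching flags only the tokens that actually took part in a phrase hit.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of the Lucene highlighter.
final class SpanHitScoringSketch {

    /** Flags every token that equals any phrase term, wherever it occurs. */
    static boolean[] flagByTerm(List<String> tokens, List<String> phrase) {
        boolean[] flags = new boolean[tokens.size()];
        for (int i = 0; i < tokens.size(); i++) {
            flags[i] = phrase.contains(tokens.get(i));
        }
        return flags;
    }

    /** Flags only tokens that lie inside an exact occurrence of the phrase. */
    static boolean[] flagBySpan(List<String> tokens, List<String> phrase) {
        boolean[] flags = new boolean[tokens.size()];
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            if (tokens.subList(start, start + phrase.size()).equals(phrase)) {
                for (int i = start; i < start + phrase.size(); i++) {
                    flags[i] = true;
                }
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "the", "quick", "brown", "fox", "saw", "a", "brown", "bear");
        List<String> phrase = Arrays.asList("quick", "brown");
        // Term-only matching also flags the second, unrelated "brown".
        System.out.println(Arrays.toString(flagByTerm(tokens, phrase)));
        // Span-style matching flags only the tokens of the phrase hit.
        System.out.println(Arrays.toString(flagBySpan(tokens, phrase)));
    }
}
```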

[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: spanhighlighter11.patch

Thanks a lot Andy. As I suspected, the issue is that the conversion from PhraseQuery to SpanQuery is inexact. I have updated the code to handle this case, though: if a PhraseQuery has a slop of 0, the SpanQuery created from it now forces an in-order match. This should be a nice improvement to the PhraseQuery-to-SpanQuery approximation.
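
To make the in-order point concrete, here is a small sketch in plain Java. It is not taken from the patch and uses no Lucene classes; the class and method names are hypothetical. It assumes the inexactness comes from an unordered near-match also accepting the phrase terms in reversed order, which an exact phrase (slop 0) would not match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class; not part of the patch.
final class PhraseAsSpanSketch {

    /** Exact phrase: the terms must be adjacent and in the given order. */
    static boolean matchesInOrder(List<String> tokens, List<String> phrase) {
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            if (tokens.subList(start, start + phrase.size()).equals(phrase)) {
                return true;
            }
        }
        return false;
    }

    /** Unordered near-match with no gaps: the terms must be adjacent, in any order. */
    static boolean matchesUnordered(List<String> tokens, List<String> phrase) {
        List<String> wanted = new ArrayList<String>(phrase);
        Collections.sort(wanted);
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            List<String> window =
                    new ArrayList<String>(tokens.subList(start, start + phrase.size()));
            Collections.sort(window);
            if (window.equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The document has the phrase terms reversed.
        List<String> tokens = Arrays.asList("brown", "quick", "fox");
        List<String> phrase = Arrays.asList("quick", "brown");
        System.out.println(matchesInOrder(tokens, phrase));
        System.out.println(matchesUnordered(tokens, phrase));
    }
}
```

With the reversed document above, only the unordered check reports a match, which is the kind of false highlight that forcing an in-order match avoids.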

Patch with fix and new junit test attached.

patch 11

- Mark

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

[jira] Commented: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474033 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

Howdy Mark H, I have not got into making new SpanQuery tests yet, but at this point I could use some help/guidance. All of the original highlighter tests are passing with the new SpanScorer except for two: 

1. testFieldSpecificHighlighting

This will not pass the second assertion (ignore fields) because when I add the TokenStream to a MemoryIndex I have to add it to a field. I am stumped on getting around this one.

2. testOverlapAnalyzer2

Passes the first bunch but then fails on one. This is because I am looking up terms based on position, since the Spans do not return the term text. The first failing assertion is looking for 'hi-<b>speed</b>' but finds '<b>hi-speed</b>' because both 'speed' and 'hi-speed' are at position 0...consequently both score a 1. Any thoughts? I was thinking about gathering all possible terms in the SpanQueryExtractor and somehow using them...
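
The position clash can be reproduced in a few lines of plain Java. This is only a sketch of one possible reading of the "gather the terms" idea, not the patch's code: the class, the Tok type, and both method names are hypothetical, and the third token ("train") is made up for the example. It compares a lookup by position alone with a lookup that also requires the token text to be one of the query's terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; not part of the patch.
final class PositionScoringSketch {

    /** Token text plus its position in the stream (stand-in for an analyzer token). */
    static final class Tok {
        final String text;
        final int position;

        Tok(String text, int position) {
            this.text = text;
            this.position = position;
        }
    }

    /** Position-only lookup: every token at a matched position counts as a hit. */
    static List<String> hitsByPosition(List<Tok> tokens, Set<Integer> spanPositions) {
        List<String> hits = new ArrayList<String>();
        for (Tok t : tokens) {
            if (spanPositions.contains(t.position)) {
                hits.add(t.text);
            }
        }
        return hits;
    }

    /** Position plus term lookup: the token must also be one of the query's terms. */
    static List<String> hitsByPositionAndTerm(
            List<Tok> tokens, Set<Integer> spanPositions, Set<String> queryTerms) {
        List<String> hits = new ArrayList<String>();
        for (Tok t : tokens) {
            if (spanPositions.contains(t.position) && queryTerms.contains(t.text)) {
                hits.add(t.text);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // 'hi-speed' and 'speed' share position 0, as in the failing assertion.
        List<Tok> tokens = Arrays.asList(
                new Tok("hi-speed", 0), new Tok("speed", 0), new Tok("train", 1));
        Set<Integer> spanPositions = new HashSet<Integer>(Arrays.asList(0));
        Set<String> queryTerms = new HashSet<String>(Arrays.asList("speed"));
        System.out.println(hitsByPosition(tokens, spanPositions));
        System.out.println(hitsByPositionAndTerm(tokens, spanPositions, queryTerms));
    }
}
```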

Beyond that, I am sure you can find plenty of other things to point out. Have at me <g>

Any ideas on scoring would be appreciated as well.

Feel free to run with this on your own if you have time as well...or run with it a bit and pass it back, or just provide some guidance as I go...whatever works out best for you.

- Mark M

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, SpanHighlighterTest.java, SpanScorer.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546236 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

Michael: I would love to take a look. I've got the code you sent me and I will go through it soon.

Mark: That is an issue that should probably be cleaned up. A lot of tests are shared; the new SpanScorer just requires some different, odd setup that made it easier to copy and change the test file. I will spend some time trying to combine them into one test file to avoid the overlap.

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

[jira] Commented: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473578 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

Using an Analyzer that produces multiple tokens at the same position does not yet operate correctly if used at query time. Using such a 'synonym' analyzer for indexing and a non 'synonym' analyzer for searching will work fine.

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

[jira] Updated: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: spanhighlighter2.patch

Patch version 2

Changed to correct build.xml, removed some unneeded code

Has been working well for me personally, could still use some additional span highlighting tests


> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

[jira] Commented: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501682 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

I plan on one more release and then I am finished.

I need to optimize the scoring (stop looking at positions for terms that are not position sensitive)

Make a couple unit tests to check for a bug I suspect

Turn the javadocs into something I am actually proud of.

I would wait for this final patch before taking a look at this Mark H.

I apologize for being so incremental on this issue...lesson learned.

- Mark

> SpanScorer and SimpleSpanFragmenter for Contrib Highlighter
> -----------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment:     (was: HighlighterTest.java)

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

[jira] Updated: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: SpanHighlighterTest.java

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, SpanHighlighterTest.java, SpanScorer.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

[jira] Commented: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

Posted by "Sean O'Connor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487847 ] 

Sean O'Connor commented on LUCENE-794:
--------------------------------------

I was able to apply the spanhighlighter5.patch. I'm inexperienced with ant and svn, so I assume the slight troubles I had were self-inflicted; I mention them in case they are of any help.

I might have missed something, but my MemoryIndex.java seemed to be missing the implementation of the abstract isPayloadAvailable() method from TermPositions. That was causing my build to fail, so I added the method, simply returning false.

After that change, the tests ran, and life was good again. I do get a failed test at org.apache.lucene.search.highlight.HighlighterTest.testGetRangeFragments(HighlighterTest.java:137), but it looks like that might be expected. The search is "[kannedy TO kznnedy]".

I am now looking into getting the total number of hits for a given query (for un-normalized scoring), and the hit positions (saved for larger scale analysis and browsing). I have code that does this, but hope I can improve on my existing approach by using this highlighting patch.
Thanks,

Sean


> SpanScorer and SimpleSpanFragmenter for Contrib Highlighter
> -----------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Steven Rowe (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593032#action_12593032 ] 

Steven Rowe commented on LUCENE-794:
------------------------------------

Hi Maurizio, 

SpanHighlighter-02-10-2008.patch should contain everything - start again with a clean checkout and apply only this patch.

> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Paul Elschot (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571088#action_12571088 ] 

Paul Elschot commented on LUCENE-794:
-------------------------------------

One way to solve the problem of many terms in a range or a prefix query is by indexing terms in a hierarchy of prefixes, for example for a date CCYYMMDD can be indexed as all of C, CC, CCY, CCYY, CCYYMM, CCYYMMD and CCYYMMDD on the same position.
Then for range and prefix queries the query analyzer can construct an OR over as few terms as possible.
Query search and highlighting would be faster and still correct, as they are based on the term positions.
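
A rough sketch of that scheme in plain Java, under stated assumptions: the class and method names are hypothetical, every prefix length is generated, and range bounds are given as 8-digit strings padded to the digit extremes (00 and 99) rather than to real calendar days. The first method produces the prefix terms to index at one position; the second picks a small set of prefix terms that together cover a range, so the query becomes an OR over those terms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not an existing Lucene API.
final class PrefixHierarchySketch {

    static final int WIDTH = 8; // digits in CCYYMMDD

    /** All non-empty prefixes of a term, to be indexed at the same position. */
    static List<String> prefixes(String term) {
        List<String> out = new ArrayList<String>();
        for (int end = 1; end <= term.length(); end++) {
            out.add(term.substring(0, end));
        }
        return out;
    }

    /** Prefix terms that together cover every WIDTH-digit value in [lo, hi]. */
    static List<String> coverRange(String lo, String hi) {
        List<String> out = new ArrayList<String>();
        collect("", Long.parseLong(lo), Long.parseLong(hi), out);
        return out;
    }

    private static void collect(String prefix, long lo, long hi, List<String> out) {
        // Values sharing this prefix form the block [first, last].
        long blockSize = pow10(WIDTH - prefix.length());
        long first = prefix.isEmpty() ? 0 : Long.parseLong(prefix) * blockSize;
        long last = first + blockSize - 1;
        if (last < lo || first > hi) {
            return; // block lies entirely outside the range
        }
        if (!prefix.isEmpty() && lo <= first && last <= hi) {
            out.add(prefix); // block lies entirely inside: one term covers it
            return;
        }
        for (char digit = '0'; digit <= '9'; digit++) {
            collect(prefix + digit, lo, hi, out); // partial overlap: refine
        }
    }

    private static long pow10(int exponent) {
        long result = 1;
        for (int i = 0; i < exponent; i++) {
            result *= 10;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(prefixes("20070203"));
        // February through April 2007, with padded day digits.
        System.out.println(coverRange("20070200", "20070499"));
    }
}
```

Exact calendar bounds such as 20070101 to 20071231 still work with this sketch but break down into more, finer-grained terms, since the edges of the range no longer line up with whole prefix blocks.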


> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

[jira] Updated: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Description: 
This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.

There is a dependency on MemoryIndex.

  was:This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.


> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>         Environment: There are prob a few Java 1.5 requirements (generics) that could easily be removed.
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, HighlighterTest.java, QuerySpansExtractor.java, SimpleFormatter.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

[jira] Updated: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: HighlighterTest.java
                QuerySpansExtractor.java
                Highlighter.java

Updated code to address deficiency in highlighting BooleanQueries.

Use the following latest classes:

CachedTokenStream
DefaultEncoder
Encoder
Formatter
Highlighter
QuerySpansExtractor
SimpleFormatter
HighlighterTest

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475316 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

Yeah, the patch should take care of all of that. I would have started with a patch, but this was literally my first and it took me a bit to figure it all out, especially with Eclipse's Subclipse plugin using absolute paths instead of relative ones in the patch. Then I was trying forever to add a new package before finding out I can't add a folder to a patch. But now that I've got it all worked out, it should make life much easier for anyone trying this <g>. I will use patches from now on.

Thanks for the build.xml info and for taking a look.

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12480175 ] 

Mark Harwood commented on LUCENE-794:
-------------------------------------

>>At a minimum, the Term fields could be set back to their original value after doing the Span search..

Hmm. If the query is being reused in a multi-threaded server environment this wouldn't fly.

>>I really don't see how it is possible to ignore fields in another way though

I can think of one. Your current approach is based on modifying the query to suit the MemoryIndex content. Another approach may be to modify the MemoryIndex content to suit the query. Your code creates a MemoryIndex when presented with the text of a field. If it recognised it was being used in "field-insensitive mode" it could extract the query terms and create a MemoryIndex field for each unique fieldname in the set of query terms - using the same source text (a CachedTokenStreamAnalyzer could be used to avoid excessive tokenization of this text).
This approach would of course use some more memory but avoids the unpleasantness of changing Query objects' contents.
I haven't fully considered the implications of this idea yet - initial thoughts?

Cheers
Mark
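
A rough sketch of the "modify the index content, not the query" idea, for illustration only. It does not use the real MemoryIndex or Query classes: `QueryTerm`, `tokenizeOnce`, `buildFieldInsensitiveIndex` and the map standing in for an index are all hypothetical, and whitespace tokenization is an assumption. What it tries to show is that the query terms are only read (so a shared query stays safe across threads), the text is tokenized once, and the cached tokens are registered under every distinct field name found in the query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class FieldInsensitiveIndexSketch {

    // Hypothetical stand-in for a Lucene Term: a field name plus term text.
    static final class QueryTerm {
        final String field;
        final String text;

        QueryTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    // Tokenize once (lowercase + whitespace split, an assumption) and cache the result.
    static List<String> tokenizeOnce(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(Arrays.asList(trimmed.split("\\s+")));
    }

    // Builds a toy "index": one entry per unique field name used by the query,
    // each pointing at the same cached token list. The query terms are never modified.
    static Map<String, List<String>> buildFieldInsensitiveIndex(
            List<QueryTerm> queryTerms, String sourceText) {
        List<String> cachedTokens = tokenizeOnce(sourceText);
        Set<String> fieldNames = new LinkedHashSet<String>();
        for (QueryTerm term : queryTerms) {
            fieldNames.add(term.field);
        }
        Map<String, List<String>> index = new LinkedHashMap<String, List<String>>();
        for (String field : fieldNames) {
            index.put(field, cachedTokens);
        }
        return index;
    }

    public static void main(String[] args) {
        List<QueryTerm> terms = new ArrayList<QueryTerm>();
        terms.add(new QueryTerm("body", "span"));
        terms.add(new QueryTerm("body", "highlighter"));
        terms.add(new QueryTerm("title", "lucene"));
        Map<String, List<String>> index =
                buildFieldInsensitiveIndex(terms, "Lucene span highlighter");
        System.out.println(index.keySet()); // prints [body, title]
    }
}
```

In this toy version the token list is shared, so the extra cost is negligible. A real MemoryIndex would presumably build separate structures per field, which is where the additional memory mentioned above would come from.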

> SpanScorer and SimpleSpanFragmenter for Contrib Highlighter
> -----------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Summary: Extend contrib Highlighter to properly support phrase queries and span queries  (was: SpanScorer and SimpleSpanFragmenter for Contrib Highlighter)

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12480222 ] 

Mark Harwood commented on LUCENE-794:
-------------------------------------

>>How can I ignore fields in a SpanQuery. Now it hits me, rather embarrassingly, such a SpanQuery doesn't make sense at all. 

Just to make sure we're talking about the same thing. Yes, I too came to the obvious realisation that a single SpanQuery cannot test content from more than one field - but I don't think that is something we were trying to support here. The requirement (as I understand it) is to support a scenario where a SpanQuery was testing only one field, say the "body" field and yet the user wanted to see any matches that just so happened to occur in another field, say the "title" field. Nowhere in the query was there a suggestion of any criteria mandatory or otherwise testing the "title" field - the user just wanted to highlight the title field for additional decoration.
In this scenario we have the challenge of taking the "body" query terms and using them to highlight "title" field content. A "match" would have to disregard the original choice of field name but would still require that the positions of term text adhered to the SpanQuery logic.

Hope this makes sense

Mark
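
To make the scenario concrete, here is a small sketch, assuming a much simplified ordered "near" rule rather than the actual SpanNearQuery algorithm. The class and method names are hypothetical. The term texts come from a query written against "body", the tokens come from "title", and the field name plays no part in the match, but term order and position gaps still do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FieldAgnosticSpanSketch {

    // Returns [start, end] token positions (inclusive) for each ordered match of
    // termTexts in tokens where the number of extra positions inside the span
    // is at most slop. An empty list means nothing should be highlighted.
    static List<int[]> findOrderedSpans(List<String> tokens, List<String> termTexts, int slop) {
        List<int[]> spans = new ArrayList<int[]>();
        if (termTexts.isEmpty()) {
            return spans;
        }
        for (int start = 0; start < tokens.size(); start++) {
            if (!tokens.get(start).equals(termTexts.get(0))) {
                continue;
            }
            int pos = start;
            boolean matched = true;
            for (int t = 1; t < termTexts.size() && matched; t++) {
                int next = indexOfFrom(tokens, termTexts.get(t), pos + 1);
                if (next < 0) {
                    matched = false;
                } else {
                    pos = next;
                }
            }
            int gaps = pos - start + 1 - termTexts.size();
            if (matched && gaps <= slop) {
                spans.add(new int[] {start, pos});
            }
        }
        return spans;
    }

    // First position >= from holding the given text, or -1 if there is none.
    static int indexOfFrom(List<String> tokens, String text, int from) {
        for (int i = from; i < tokens.size(); i++) {
            if (tokens.get(i).equals(text)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Tokens of the "title" field; term texts taken from a "body" span query.
        List<String> title = Arrays.asList("the", "quick", "brown", "fox");
        List<String> bodyTerms = Arrays.asList("quick", "fox");
        System.out.println(findOrderedSpans(title, bodyTerms, 1).size()); // prints 1
        System.out.println(findOrderedSpans(title, bodyTerms, 0).size()); // prints 0
    }
}
```

With a slop of 0 the title contains both terms but no span, so neither term would be highlighted, which is the behaviour a plain term-based scorer cannot give.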

> SpanScorer and SimpleSpanFragmenter for Contrib Highlighter
> -----------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474035 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

By the way...I apologize that the file list is so messy now.

You just need:

SpanScorer
SpanQueryExtractor
CachedTokenStream
SpanHighlighterTest

and there is the dependency on MemoryIndex

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, SpanHighlighterTest.java, SpanScorer.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment:     (was: MemoryIndex.java)

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: HighlighterTest.java
                Highlighter.java

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, QuerySpansExtractor.java, SimpleFormatter.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470090 ] 

Mark Harwood commented on LUCENE-794:
-------------------------------------

Looks like a good start, Mark - thanks for contributing this!

I've had a quick play and have identified the following issues:

1) Fieldname "contents" shouldn't be hardcoded into the Highlighter - different analyzers can behave differently for different fields (see PerFieldAnalyzerWrapper). Either pass a fieldname parameter or do as the existing highlighter does and take a TokenStream. The latter approach has the advantage of being able to avoid re-analysis and make use of any stored TermVectors (see TokenSources.java)
2) Analyzers which produce overlapping tokens (see Synonym analyzer in existing highlighter JUnit test) are problematic in the existing code. I remember the "TokenGroup" class in the existing highlighter was an approach to help cater for these "overlap" scenarios.
3) Without wishing to resurrect the whole 1.4 vs 1.5 debate I believe Lucene still targets Java 1.4.
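
For point 2, a minimal sketch of what grouping overlapping tokens could look like. It is loosely modelled on the idea behind TokenGroup, not on its actual code; `Tok`, `groupOverlapping` and the offsets are hypothetical, and tokens are assumed to arrive sorted by start offset. Tokens whose character ranges overlap (for example a word and its injected synonym) land in the same group, so they can be marked up once rather than twice.

```java
import java.util.ArrayList;
import java.util.List;

final class OverlapGroupSketch {

    // Hypothetical token: text plus character offsets, end exclusive.
    static final class Tok {
        final String text;
        final int start;
        final int end;

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Groups tokens (assumed sorted by start offset) so that tokens whose
    // character ranges overlap end up in the same group.
    static List<List<Tok>> groupOverlapping(List<Tok> tokens) {
        List<List<Tok>> groups = new ArrayList<List<Tok>>();
        List<Tok> current = null;
        int currentEnd = -1;
        for (Tok tok : tokens) {
            if (current == null || tok.start >= currentEnd) {
                // No overlap with the running group: start a new one.
                current = new ArrayList<Tok>();
                groups.add(current);
                currentEnd = tok.end;
            } else {
                // Overlap: extend the running group's range if needed.
                currentEnd = Math.max(currentEnd, tok.end);
            }
            current.add(tok);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Tok> tokens = new ArrayList<Tok>();
        tokens.add(new Tok("football", 0, 8));
        tokens.add(new Tok("soccer", 0, 8));  // synonym emitted at the same offsets
        tokens.add(new Tok("match", 9, 14));
        System.out.println(groupOverlapping(tokens).size()); // prints 2
    }
}
```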

To rectify these points it's not clear to me if it would be quicker to use your code or adapt the existing highlighter code to use spans.
Thoughts?

Thanks, again,
Mark


> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>         Environment: There are prob a few Java 1.5 requirements (generics) that could easily be removed.
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, HighlighterTest.java, QuerySpansExtractor.java, SimpleFormatter.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590841#action_12590841 ] 

Otis Gospodnetic commented on LUCENE-794:
-----------------------------------------

Hey, is this 14-month-old (impressive persistence, MM) piece of work ready to be committed?

> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515911 ] 

Otis Gospodnetic commented on LUCENE-794:
-----------------------------------------

Mark, wow, long list of files up there. I can't tell which ones are still relevant. Ah, only spanhighlighter9.patch, right?

It looks like all files in that patch are new files, that is, this is a parallel highlighter implementation - we can leave the old one in there and commit yours without worrying about breaking the old one.  Could you add Apache license headers to all files, switch to 2 spaces for indentation, and then I think this can get committed?

Oh, and since contrib can be java 1.5+, I think you can use StringBuilder instead of StringBuffer, etc.
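
For reference, a tiny sketch of the StringBuilder suggestion. StringBuilder (Java 1.5+) offers the same append-style API as StringBuffer without the synchronization, which is usually fine for a buffer confined to one method. The `markUp` helper and the `<B>` tags are illustrative only and are not taken from the patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class StringBuilderMarkupSketch {

    // Joins tokens with single spaces, wrapping every token found in hits
    // in <B>...</B>. The buffer never leaves this method, so an
    // unsynchronized StringBuilder is sufficient.
    static String markUp(String[] tokens, Set<String> hits) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            if (hits.contains(tokens[i])) {
                out.append("<B>").append(tokens[i]).append("</B>");
            } else {
                out.append(tokens[i]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> hits = new HashSet<String>(Arrays.asList("span"));
        System.out.println(markUp(new String[] {"a", "span", "query"}, hits));
        // prints: a <B>span</B> query
    }
}
```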


> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: QuerySpansExtractor.java
                SpanScorer.java
                CachedTokenStream.java

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, SpanHighlighterTest.java, SpanScorer.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546181 ] 

Mark Harwood commented on LUCENE-794:
-------------------------------------

Committing it makes sense to me.
I want to spend some time reviewing this in more detail once I'm through with contributing the new web-based version of Luke.
At a quick glance, does the new JUnit test in this patch encompass both old and new Highlighter tests? If so, should we remove the old JUnit test where they overlap?


> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570849#action_12570849 ] 

Mark Harwood commented on LUCENE-794:
-------------------------------------

Actually is it worth trying to resolve this ConstantScoreRangeQuery issue?

1) A large range can produce a lot of terms - extracting these would bloat memory and slow down highlighting. 
2) The sorts of "quantity" fields that are subject to ranges (prices, dates, lat-lon coordinates) don't typically need highlighting anyway because:
    a) range criteria are normally mandatory (so ALL results are expected to match the range and highlighting matches is unnecessary)
    b) Quantities are normally held in dedicated fields with only one value. Unlike free-text fields there's no need for the user's eye to scan large amounts of information looking for the "hit" so, again, highlighting/summarising is generally less useful.
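
A back-of-the-envelope illustration of point 1, using a sorted set of zero-padded strings as a stand-in for a field's term dictionary. This is an assumption made for the example, not how Lucene stores or enumerates terms, and `termsInRange` is a hypothetical helper. It only shows how quickly the number of terms to extract grows with the width of the range.

```java
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

final class RangeExpansionSketch {

    // Returns every indexed term in [lower, upper], i.e. the set a highlighter
    // would have to extract and hold in memory to highlight the range.
    static SortedSet<String> termsInRange(TreeSet<String> indexedTerms, String lower, String upper) {
        return indexedTerms.subSet(lower, true, upper, true);
    }

    public static void main(String[] args) {
        // Toy "price" field: 10,000 distinct zero-padded values, 0000..9999.
        TreeSet<String> prices = new TreeSet<String>();
        for (int i = 0; i < 10000; i++) {
            prices.add(String.format(Locale.ROOT, "%04d", i));
        }
        System.out.println(termsInRange(prices, "0100", "0999").size()); // prints 900
        System.out.println(termsInRange(prices, "0000", "9999").size()); // prints 10000
    }
}
```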

Given the unavoidable performance overhead this introduces, and the sneaking suspicion that it's not useful anyway, is this worth supporting?

Would be keen to know what the scenario was that introduced this as a requirement.

> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: MemoryIndex.java

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, SimpleFormatter.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: spanhighlighter12.patch

Nice little addition courtesy of Michael Goddard:

"...In our Lucene work, we took the approach of indexing all fields into a single field, "FULLTEXT", which is the default field for queries.  Our query syntax is such that a user can combine clauses against named fields with clauses with no field specified.  When we go to highlight such queries, if a given clause is against this FULLTEXT field but we're highlighting text in the TITLE field, we'd still like for matching terms to be highlighted..."

Thanks for the patch, Michael.

There is a new constructor that allows you to specify a default field. Terms from this field will be highlighted regardless of the specific field you are highlighting.

Only file to worry about in that huge mess of files listed above is spanhighlighter12.patch.
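
For readers who want the default-field rule spelled out, the following is a minimal sketch of the field check as described above. It is an assumption about the behaviour, not code from spanhighlighter12.patch; the class name, method name, and parameters are hypothetical.

```java
// Illustrative only: the field-matching rule described above, outside Lucene.
class DefaultFieldSketch {

    // Returns true when a query term on termField should be highlighted
    // while highlighting text from highlightedField. A null defaultField
    // means no default field was configured.
    static boolean fieldMatches(String termField, String highlightedField,
                                String defaultField) {
        if (termField.equals(highlightedField)) {
            return true;
        }
        return defaultField != null && termField.equals(defaultField);
    }

    public static void main(String[] args) {
        // A clause against FULLTEXT still highlights text in TITLE.
        System.out.println(fieldMatches("FULLTEXT", "TITLE", "FULLTEXT"));
        // A clause against an unrelated field does not.
        System.out.println(fieldMatches("BODY", "TITLE", "FULLTEXT"));
    }
}
```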

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: spanhighlighter.patch

Forget all the .java files...just get spanhighlighter.patch and apply to the trunk.

Still looking for pointers on how to handle the build.xml.

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: spanhighlighter5.patch

Apologies for the delay on this -- I was pulled into a busy product launch.

This adds the final piece, replacing TermModifer with multiple Memory Indexes.

I also did a little refactoring, especially in the SpansExtractor.

All tests now pass, and I have been using this successfully for some time now.

For anyone new following this issue, ignore all of the files except for this one: spanhighlighter5.patch

- Mark

> SpanScorer and SimpleSpanFragmenter for Contrib Highlighter
> -----------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: spanhighlighter8.patch


Patch version 8: apply to the root dir of trunk.

- Fixed a bug that was caused when a query had the same term multiple times
- Added a unit test for the bug just mentioned
- Improved performance by not converting Querys that are not position sensitive to SpanQuerys. Non position sensitive Query clauses are treated the same way the standard Scorer would treat them.
- Some refactoring based on the previous change.
- Improved some of the JavaDoc comments

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: spanhighlighter7.patch

Minor update to straighten a few things out.

- Replaced custom CachingTokenStream with Lucene's CachingTokenFilter
- Some refactoring in the SpanExtractor (now WeightedSpanTermExtractor)
- Updated some stale JavaDoc

> SpanScorer and SimpleSpanFragmenter for Contrib Highlighter
> -----------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Tavi Nathanson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611453#action_12611453 ] 

Tavi Nathanson commented on LUCENE-794:
---------------------------------------

Hey everyone,

I'm having some trouble getting SpanScorer to act the way I'd like for proper highlighting, and I'm wondering if anyone has any suggestions.

I have two fields: text_raw and text_stemmed. text_raw, as the name suggests, stores unstemmed (tokenized) text while text_stemmed stores stemmed (tokenized) text.

I have queries that look over both fields. For example, I may have the query +(text_raw:"apple sauce" text_stemmed:orange). This query matches "apple sauce oranges" but it does not match "apples sauces orange" (because "apple sauce" is not stemmed). I'd like to be able to highlight accordingly: I want "apple," "sauce," and "oranges" to all be highlighted.

So, even though it is in fact the raw text that ends up getting highlighted, I'm looking for a way to build SpanScorer such that I don't need to limit myself to one field ("field" is one of the arguments to the constructor).

Thanks!

Tavi


> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 2.3.2
>
>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, SpanHighlighter-RemovSysOut.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Andy Liu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527134 ] 

Andy Liu commented on LUCENE-794:
---------------------------------

Ah, I wasn't crazy.  I had the test data wrong.  Here's the code I'm using to produce the failing result:

        String text = "y z x y z a b";

        Analyzer analyzer = new StandardAnalyzer();
        QueryParser parser = new QueryParser("body", analyzer);
        Query query = parser.parse("\"x y z\"");
        
        CachingTokenFilter tokenStream = new CachingTokenFilter(analyzer.tokenStream("body", new StringReader(text)));
        Highlighter highlighter = new Highlighter(new SpanScorer(query, "body", tokenStream));
        highlighter.setTextFragmenter(new NullFragmenter());
        tokenStream.reset();

        String result = highlighter.getBestFragments(tokenStream, text, 1, "...");
        System.out.println(result);

This produces:

<B>y</B> <B>z</B> <B>x</B> <B>y</B> <B>z</B> a b

The beginning y and z shouldn't be highlighted.

If I change the beginning y and z to x and y, I get the correct result:

"x y x y z a b" => x y <B>x</B> <B>y</B> <B>z</B> a b

Here are a couple of other failing results:

"z x y z a b" => <B>z</B> <B>x</B> <B>y</B> <B>z</B> a b
"z a x y z a b" => <B>z</B> a <B>x</B> <B>y</B> <B>z</B> a b

FYI, I'm using the latest version of Lucene.
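
For reference, the expected behaviour in the cases above can be written down as a small standalone sketch. This is not the SpanScorer logic from the patch and uses no Lucene classes; it assumes whitespace-separated tokens and an exact, in-order phrase with no slop, and the class and method names are made up for illustration.

```java
// Illustrative only: position-sensitive phrase highlighting without Lucene.
class PhraseHighlightSketch {

    // Wraps a token in <B> tags only when it sits inside an exact,
    // in-order occurrence of the phrase. Tokens are split on single spaces.
    static String highlight(String text, String[] phrase) {
        String[] tokens = text.split(" ");
        boolean[] hit = new boolean[tokens.length];
        for (int start = 0; start + phrase.length <= tokens.length; start++) {
            boolean match = true;
            for (int i = 0; i < phrase.length; i++) {
                if (!tokens[start + i].equals(phrase[i])) {
                    match = false;
                    break;
                }
            }
            if (match) {
                // Mark only the positions that took part in this occurrence.
                for (int i = 0; i < phrase.length; i++) {
                    hit[start + i] = true;
                }
            }
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(hit[i] ? "<B>" + tokens[i] + "</B>" : tokens[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] phrase = {"x", "y", "z"};
        // The leading y and z are not part of a phrase occurrence.
        System.out.println(highlight("y z x y z a b", phrase));
    }
}
```

Under those assumptions, a term is marked by its position in a matching occurrence rather than by its text alone, so the stray leading y and z stay unhighlighted.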

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Maurizio (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593068#action_12593068 ] 

Maurizio commented on LUCENE-794:
---------------------------------

@Steven
thanks a lot, now it's working
@Brian
looking for lucene-memory-x.x.x.jar

> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: SpanHighlighter-RemovSysOut.patch

Here is a kill on the System.out.

I should make a new issue, right?

> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 2.3.2
>
>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, SpanHighlighter-RemovSysOut.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Benjamin Keil (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Keil updated LUCENE-794:
---------------------------------

    Attachment: MultiPhraseQueryExtraction.patch

This is a patch that applies on top of SpanHighlighter-01-28-2008.patch in order to highlight MultiPhraseQueries.

> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: WeightedSpanTerm.java
                SpanScorer.java
                QuerySpansExtractor.java

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: Formatter.java
                Encoder.java
                DefaultEncoder.java

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>         Environment: There are prob a few Java 1.5 requirements (generics) that could easily be removed.
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, QuerySpansExtractor.java, SimpleFormatter.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment:     (was: SpanScorer.java)

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


Re: [jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by Grant Ingersoll <gs...@apache.org>.

On Jan 14, 2008, at 4:49 PM, Mark Miller wrote:

> While the overall framework of LUCENE-663 appears similar to the  
> current contrib Highlighter, the code is actually quite different  
> and I do not think it handles as many corner cases in its current  
> state. LUCENE-663 supports PhraseQuerys by implementing 'special'  
> search logic that inspects positional information to make sure the  
> Tokens from a PhraseQuery are in order. I am not sure how exact this  
> logic is compared to Lucene's PhraseQuery search logic, but a cursory  
> look makes me think it's not complete. It almost looks to me that it  
> only does in-order with simple slop (not edit distance)...I am too  
> lazy to check further though and I may have missed something. Also,  
> LUCENE-663 does not support Span queries.
>
> This patch differs in that it fits the current Highlighter framework  
> without modifying it, and it uses Lucene's own internal search logic  
> to identify Spans for highlighting. PhraseQueries are handled by a  
> SpanQuery approximation.
>
> As far as PhraseQuery/SpanQuery highlighting, I don't think any of  
> the other Highlighter packages offer much. I think that things could  
> be done a little faster, but that would require abandoning the  
> current framework, and with all of the corner cases it now supports,  
> I'd hate to see that.
>
> The other Highlighter code that is worth consideration is  
> LUCENE-644. It does abandon the current Highlighter framework and  
> goes with an attack that is much more efficient for larger  
> documents: instead of attacking the problem by spinning through all  
> of the document tokens and comparing to query tokens, 644 just looks  
> at the tokens from the query and grabs the original text using the  
> offsets from those tokens. This is darn fast, but doesn't go well  
> with positional highlighting and I wonder how well it supports all  
> of the corner cases that arise with overlapping tokens and whatnot.

Hmm, I'm beginning to think that the performance issue may be overcome  
to some extent with the new TermVectorMapper stuff.  Basic idea is  
that you construct a highlighter that does the appropriate  
highlighting as the TV is being loaded from disk through the Map  
function.  This would save having to go back through all the tokens a  
second time, but probably has other issues.  It's just a thought in my  
head at this point.  At a minimum, I think the TVM could speed up the  
TokenSources part that creates the TokenStream based on the TermVector.
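
To make the idea above concrete, here is a minimal sketch of a callback that picks out query-term offsets while term-vector entries stream past, so the tokens are not walked a second time. Everything in it is hypothetical: TermCallback is a simplified stand-in and not Lucene's TermVectorMapper, and load() only simulates reading a stored term vector for the text "quick fox quick".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MapperSketch {

    /** Hypothetical stand-in for a term-vector callback; not Lucene's TermVectorMapper. */
    interface TermCallback {
        void map(String term, int[] startOffsets, int[] endOffsets);
    }

    /** Collects the offsets of query terms while term-vector entries stream past. */
    static final class OffsetCollector implements TermCallback {
        private final Set<String> queryTerms;
        final List<int[]> hits = new ArrayList<>(); // each entry is {start, end}

        OffsetCollector(String... queryTerms) {
            this.queryTerms = new HashSet<>(Arrays.asList(queryTerms));
        }

        @Override
        public void map(String term, int[] startOffsets, int[] endOffsets) {
            if (!queryTerms.contains(term)) {
                return; // non-query terms are dropped as they are read
            }
            for (int i = 0; i < startOffsets.length; i++) {
                hits.add(new int[] { startOffsets[i], endOffsets[i] });
            }
        }
    }

    /** Simulates loading a stored term vector for the text "quick fox quick". */
    static void load(TermCallback callback) {
        callback.map("fox", new int[] { 6 }, new int[] { 9 });
        callback.map("quick", new int[] { 0, 10 }, new int[] { 5, 15 });
    }
}
```

Calling MapperSketch.load(new MapperSketch.OffsetCollector("quick")) leaves two {start, end} pairs in hits. Whether the real mapper API exposes offsets in this shape is an assumption of the sketch.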

At any rate, I am going to think some more on it.

-Grant


Re: [jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by Mark Miller <ma...@gmail.com>.

While the overall framework of LUCENE-663 appears similar to the current 
contrib Highlighter, the code is actually quite different and I do not 
think it handles as many corner cases in its current state. LUCENE-663 
supports PhraseQuerys by implementing 'special' search logic that 
inspects positional information to make sure the Tokens from a 
PhraseQuery are in order. I am not sure how exact this logic is compared 
to Lucene's PhraseQuery search logic, but a cursory look makes me think 
it's not complete. It almost looks to me that it only does in-order with 
simple slop (not edit distance)...I am too lazy to check further though 
and I may have missed something. Also, LUCENE-663 does not support Span 
queries.

This patch differs in that it fits the current Highlighter framework 
without modifying it, and it uses Lucene's own internal search logic to 
identify Spans for highlighting. PhraseQueries are handled by a 
SpanQuery approximation.

As far as PhraseQuery/SpanQuery highlighting, I don't think any of the 
other Highlighter packages offer much. I think that things could be done 
a little faster, but that would require abandoning the current 
framework, and with all of the corner cases it now supports, I'd hate to 
see that.

The other Highlighter code that is worth consideration is LUCENE-644. It 
does abandon the current Highlighter framework and goes with an attack 
that is much more efficient for larger documents: instead of attacking 
the problem by spinning through all of the document tokens and comparing 
to query tokens, 644 just looks at the tokens from the query and grabs 
the original text using the offsets from those tokens. This is darn 
fast, but doesn't go well with positional highlighting and I wonder how 
well it supports all of the corner cases that arise with overlapping 
tokens and whatnot.
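
The offset-driven approach described above can be sketched roughly as follows. This is a toy illustration and not the LUCENE-644 code: the Offset class, the <b> tags, and the skip-on-overlap policy are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class OffsetHighlightSketch {

    /** Character range [start, end) of one matched query term in the original text. */
    static final class Offset {
        final int start;
        final int end;

        Offset(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Wraps each offset range in <b>...</b> without re-tokenizing the text. */
    static String highlight(String text, List<Offset> offsets) {
        List<Offset> sorted = new ArrayList<>(offsets);
        sorted.sort(Comparator.comparingInt((Offset o) -> o.start));
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (Offset o : sorted) {
            if (o.start < cursor) {
                continue; // overlapping range: skipped here, a real highlighter needs a policy
            }
            out.append(text, cursor, o.start);
            out.append("<b>").append(text, o.start, o.end).append("</b>");
            cursor = o.end;
        }
        out.append(text.substring(cursor));
        return out.toString();
    }
}
```

The work is proportional to the number of matched offsets rather than the number of document tokens, which is where the speed comes from. The overlap branch is also where the corner cases mentioned above would have to be handled.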

- Mark

Grant Ingersoll (JIRA) wrote:
>     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558784#action_12558784 ] 
>
> Grant Ingersoll commented on LUCENE-794:
> ----------------------------------------
>
> How should this relate to LUCENE-663?  Seems like that one also covers other kinds of queries?  I'm no expert in highlighting, but it seems like there are at least 3 different issues in JIRA for enabling things like phrase queries, etc.   Should we try to consolidate these?
>
>   
>> Extend contrib Highlighter to properly support phrase queries and span queries
>> ------------------------------------------------------------------------------
>>
>>                 Key: LUCENE-794
>>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>>             Project: Lucene - Java
>>          Issue Type: Improvement
>>          Components: Other
>>            Reporter: Mark Miller
>>            Priority: Minor
>>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip
>>
>>
>> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
>> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
>> There is a dependency on MemoryIndex.
>>     
>
>   


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558784#action_12558784 ] 

Grant Ingersoll commented on LUCENE-794:
----------------------------------------

How should this relate to LUCENE-663?  Seems like that one also covers other kinds of queries?  I'm no expert in highlighting, but it seems like there are at least 3 different issues in JIRA for enabling things like phrase queries, etc.   Should we try to consolidate these?

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570478#action_12570478 ] 

Mark Harwood commented on LUCENE-794:
-------------------------------------

Will do. I'm taking a quick look now but should have more time tomorrow.

Thanks,
Mark

> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: SpanHighlighter-01-28-2008.patch

Thanks, looks great.

New patch with posted code.

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607437#action_12607437 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

FYI: This has been applied to the trunk as well.

The SpanScorer needs a TokenStream because it shoves the stream into a
MemoryIndex and transforms the query into a Span approximation to find hit
positions. This approach was used for compatibility with the current API.

The QueryScorer simply scores each Token that is in the query as well as
the TokenStream - so it just needs to extract the terms from the query and
find overlap with the TokenStream passed to the getFragments method. This is
not position sensitive.

The SpanScorer works the same way, but it also fills the MemoryIndex and
gets matching Spans so that Terms in the wrong position score a 0 during
Highlighter Term scoring (again getBestFragments or whatever).

The approach was mainly dictated by the old API. Fitting into the current
API seemed the most practical/efficient way to get a position sensitive
Highlighter in the short term.
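
The difference between the two scorers can be illustrated with a small self-contained sketch. It is not the patch code: the real SpanScorer fills a MemoryIndex and uses Lucene's span matching, whereas this toy swaps in an exact in-order phrase scan, and the names ScorerSketch, termScores and spanScores are made up for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ScorerSketch {

    /** Position-insensitive: a token scores 1 if its text appears anywhere in the query. */
    static float[] termScores(String[] tokens, String[] phrase) {
        Set<String> queryTerms = new HashSet<>(Arrays.asList(phrase));
        float[] scores = new float[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            scores[i] = queryTerms.contains(tokens[i]) ? 1f : 0f;
        }
        return scores;
    }

    /** Position-sensitive: only tokens inside an exact phrase match keep their score. */
    static float[] spanScores(String[] tokens, String[] phrase) {
        boolean[] inSpan = new boolean[tokens.length];
        for (int start = 0; start + phrase.length <= tokens.length; start++) {
            boolean match = true;
            for (int j = 0; j < phrase.length; j++) {
                if (!tokens[start + j].equals(phrase[j])) {
                    match = false;
                    break;
                }
            }
            if (match) {
                Arrays.fill(inSpan, start, start + phrase.length, true);
            }
        }
        float[] scores = termScores(tokens, phrase);
        for (int i = 0; i < scores.length; i++) {
            if (!inSpan[i]) {
                scores[i] = 0f; // term is in the query but in the wrong position
            }
        }
        return scores;
    }
}
```

For the tokens "fox and quick fox" and the phrase "quick fox", termScores marks both occurrences of "fox", while spanScores zeroes the first one because it is not part of a phrase match.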

On Mon, Jun 23, 2008 at 7:51 PM, Tavi Nathanson (JIRA) <ji...@apache.org>



> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 2.3.2
>
>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, SpanHighlighter-RemovSysOut.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Environment:     (was: There are prob a few Java 1.5 requirements (generics) that could easily be removed.)

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, HighlighterTest.java, QuerySpansExtractor.java, SimpleFormatter.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: spanhighlighter3.patch

This patch tries another approach instead of changing the existing Highlighter API. The result is that if you call getBestFragments more than once, you must call reset() on the SpanScorer between each call. I am not sure whether this is better than modifying the existing API.

This patch also adds a new SimpleSpanFragmenter that fragments based on size, but ensures that Spans are not broken up. This class might not be perfect yet.
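
As a rough sketch of the fragmenting rule just described, the following toy breaks fragments by size but postpones a break that would land inside a span. It is not the SimpleSpanFragmenter code: measuring size in tokens, the inclusive {start, end} span representation, and the name FragmenterSketch are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class FragmenterSketch {

    /**
     * Returns the token index at which each fragment starts. A new fragment begins
     * once the current one holds at least fragmentSize tokens, unless the break
     * would fall inside one of the spans (inclusive {start, end} token positions).
     */
    static List<Integer> fragmentStarts(int tokenCount, int fragmentSize, int[][] spans) {
        List<Integer> starts = new ArrayList<>();
        if (tokenCount == 0) {
            return starts;
        }
        starts.add(0);
        int currentStart = 0;
        for (int i = 1; i < tokenCount; i++) {
            boolean bigEnough = i - currentStart >= fragmentSize;
            if (bigEnough && !splitsSpan(i, spans)) {
                starts.add(i);
                currentStart = i;
            }
        }
        return starts;
    }

    /** True if breaking before token i would separate tokens i-1 and i of the same span. */
    private static boolean splitsSpan(int i, int[][] spans) {
        for (int[] span : spans) {
            if (span[0] < i && i <= span[1]) {
                return true;
            }
        }
        return false;
    }
}
```

With ten tokens, a fragment size of three, and one span covering tokens 2 to 4, the break that would normally fall before token 3 is deferred until token 5, so the first fragment grows past the requested size rather than splitting the span.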

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: HighlighterTest.java

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>         Environment: There are prob a few Java 1.5 requirements (generics) that could easily be removed.
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, HighlighterTest.java, QuerySpansExtractor.java, SimpleFormatter.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment:     (was: WeightedSpanTerm.java)

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545234 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

Anything anyone wants to see to further this issue? It seems like a no-brainer to add to the current contrib Highlighter...at this point, more than a few people are using it. Suggestions, criticisms, interest?

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

Posted by "Sean O'Connor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496434 ] 

Sean O'Connor commented on LUCENE-794:
--------------------------------------

Mark,
   Can you point me in the right direction? I want to find ALL hits (not just the top xx), and their location in the text. 

    I think the functionality exists in your patch, or could be easily extended. I just can't seem to get my head around where to start. 
Thanks,

Sean


> SpanScorer and SimpleSpanFragmenter for Contrib Highlighter
> -----------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment:     (was: HighlighterTest.java)

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: SpanHighlighter-02-10-2008.patch

Another attempt at putting this to bed.

Added the MultiPhraseQuery support patch above - thanks!
Updated some code to stop using deprecated methods.
Made highlighting ConstantScoreRangeQuerys optional, defaulting to false.

- Mark

> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12561566#action_12561566 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

How's that work coming, Michael? I have started turning the two test classes into one, and I'd like to put together one final patch with your new work when I am done.

I have checked out your code that adds ConstantScoreRangeQuery support and it looks great. Great idea there.

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12480868 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

Bah, that last comment is rubbish again. Of course that will work alright. Everything is looking sharp.

On another note though, what do you think about the restriction of having to reset the SpanScorer between calls to getBestFragments? Is this preferable to an API change?

- Mark
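
To make the reset restriction concrete, here is a minimal sketch of why a position-tracking scorer has to be rewound before it is reused. Everything in it (the class StatefulScorerSketch, its methods, the hit positions) is hypothetical and only illustrates the general pattern; it is not the SpanScorer from the patch, whose internals are not shown in this thread.

```java
/** Hypothetical stand-in for a position-aware scorer; NOT the patch's SpanScorer. */
class StatefulScorerSketch {
    private final int[] hitPositions; // sorted token positions that belong to a matching span
    private int cursor = 0;           // index into hitPositions; only moves forward
    private int tokenPosition = -1;   // position of the token scored most recently

    StatefulScorerSketch(int[] hitPositions) {
        this.hitPositions = hitPositions.clone();
    }

    /** Scores the next token of the stream: 1 if it sits on a hit position, else 0. */
    float scoreNextToken() {
        tokenPosition++;
        while (cursor < hitPositions.length && hitPositions[cursor] < tokenPosition) {
            cursor++;
        }
        boolean hit = cursor < hitPositions.length && hitPositions[cursor] == tokenPosition;
        return hit ? 1f : 0f;
    }

    /** Rewinds the internal state so the same text can be scored again. */
    void reset() {
        cursor = 0;
        tokenPosition = -1;
    }

    /** Sums the scores of tokenCount consecutive tokens. */
    static float scoreDocument(StatefulScorerSketch scorer, int tokenCount) {
        float total = 0f;
        for (int i = 0; i < tokenCount; i++) {
            total += scorer.scoreNextToken();
        }
        return total;
    }

    public static void main(String[] args) {
        StatefulScorerSketch scorer = new StatefulScorerSketch(new int[] {1, 4});
        System.out.println(scoreDocument(scorer, 6)); // first pass finds both hits
        System.out.println(scoreDocument(scorer, 6)); // second pass without reset finds none
        scorer.reset();
        System.out.println(scoreDocument(scorer, 6)); // after reset the hits are found again
    }
}
```

The alternative raised above, an API change, would amount to having the caller of the scoring loop perform the rewind itself so that users cannot forget it.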

> SpanScorer and SimpleSpanFragmenter for Contrib Highlighter
> -----------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526822 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

I believe the issue is that turning a PhraseQuery into a representative Span query is only an approximate conversion.

I will look into whether or not I can improve this.

Thanks for the feedback.

- Mark
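
One way the two query types are known to differ is in how slop is counted, and the sketch below works through that arithmetic for a two-term phrase "a b". It is plain Java, not Lucene code: SlopSketch and its methods are hypothetical, the formulas follow my reading of the PhraseQuery and SpanNearQuery Javadoc (phrase slop is an edit distance in which swapping two terms costs 2; span slop counts unmatched positions between clauses), and whether this is the specific discrepancy discussed in this comment is an assumption.

```java
/** Two-term slop arithmetic only; a sketch, not Lucene code. */
class SlopSketch {
    /** Phrase-style distance for the phrase "a b": how far b is from the slot right after a. */
    static int phraseDistance(int posA, int posB) {
        return Math.abs((posB - posA) - 1);
    }

    /** Span-style gap: positions strictly between the two terms, or -1 if no valid span exists. */
    static int spanGap(int posA, int posB, boolean inOrder) {
        if (inOrder && posB < posA) {
            return -1; // order is required and violated
        }
        return Math.abs(posB - posA) - 1;
    }

    static boolean phraseMatches(int posA, int posB, int slop) {
        return phraseDistance(posA, posB) <= slop;
    }

    static boolean spanMatches(int posA, int posB, int slop, boolean inOrder) {
        int gap = spanGap(posA, posB, inOrder);
        return gap >= 0 && gap <= slop;
    }

    public static void main(String[] args) {
        // Text "b a": term a is at position 1, term b at position 0.
        int posA = 1;
        int posB = 0;
        System.out.println(phraseMatches(posA, posB, 1));      // phrase, slop 1
        System.out.println(spanMatches(posA, posB, 1, false)); // unordered span, slop 1
        System.out.println(phraseMatches(posA, posB, 2));      // phrase, slop 2
        System.out.println(spanMatches(posA, posB, 2, true));  // ordered span, slop 2
    }
}
```

Under these assumptions neither span variant lines up with the phrase semantics for reversed terms: the unordered span accepts "b a" at a slop where the phrase does not, and the ordered span rejects it at a slop where the phrase accepts it.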

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment:     (was: Formatter.java)

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment:     (was: Encoder.java)

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Brian Whitman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593049#action_12593049 ] 

Brian Whitman commented on LUCENE-794:
--------------------------------------

Hi, after checking out lucene trunk and applying the 02-10-2008 patch I am getting this during "ant dist":

 [javac] /Users/bwhitman/outside/lucene/java/trunk/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java:35: package org.apache.lucene.index.memory does not exist

Any ideas? The patch does say "relies on MemoryIndex" but that was committed a long time ago and is in contrib/memory/src/java/org/apache/lucene/index/memory/MemoryIndex.java in lucene trunk.




> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment:     (was: CachedTokenStream.java)

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment:     (was: Highlighter.java)

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Michael Goddard (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Goddard updated LUCENE-794:
-----------------------------------

    Attachment: spanhighlighter_24_January_2008.patch

Relocated the fir.close() to after the extract(bq, terms) call.  Problem had manifested itself as an org.apache.lucene.store.AlreadyClosedException, but should be fixed via this patch.
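
The ordering problem described above can be sketched in isolation. The code below is a hypothetical illustration: FakeReader stands in for whatever reader `fir` refers to in the patch, and it throws IllegalStateException where Lucene would throw AlreadyClosedException. It only shows the close-after-use pattern, not the patch's actual code.

```java
/** Sketch of the close-before-use bug and the close-after-use fix; names are hypothetical. */
class CloseOrderSketch {
    /** Hypothetical reader that refuses to work once it has been closed. */
    static final class FakeReader {
        private boolean closed;

        int termCount() {
            if (closed) {
                throw new IllegalStateException("reader already closed");
            }
            return 3;
        }

        void close() {
            closed = true;
        }
    }

    /** Buggy order: the reader is closed before the extraction step touches it. */
    static int extractAfterClose() {
        FakeReader reader = new FakeReader();
        reader.close();
        return reader.termCount(); // throws IllegalStateException
    }

    /** Fixed order: extract first, close afterwards, even if extraction fails. */
    static int extractThenClose() {
        FakeReader reader = new FakeReader();
        try {
            return reader.termCount();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) {
        System.out.println(extractThenClose());
        try {
            extractAfterClose();
        } catch (IllegalStateException e) {
            System.out.println("failed as expected: " + e.getMessage());
        }
    }
}
```

Putting the close in a finally block keeps the reader from leaking if the extraction step itself throws.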



> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment:     (was: QuerySpansExtractor.java)

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Michael Goddard (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12562103#action_12562103 ] 

Michael Goddard commented on LUCENE-794:
----------------------------------------

Mark,

Thanks for looking at that.  I just entered a new Jira issue for the new work,

  https://issues.apache.org/jira/browse/LUCENE-1148

which seems to work well enough with the most recent spanhighlighter_patch code, since  it already contains a clause to handle SpanQuery.  So, no need to wait on anything from me.

  Mike


> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12479886 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

Hey Mark,

I wasn't too happy about TermModifier either, since I am basically violating encapsulation: TermModifier effectively makes the field public. I really don't see how it is possible to ignore fields in another way though. If you can think of a way, that would be awesome. At a minimum, the Term fields could be set back to their original value after doing the Span search. I wouldn't think that would be much of a performance hit.

- Mark
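
The idea of setting the Term fields back after the Span search could look roughly like the sketch below. It is an assumption-laden illustration: MutableTerm is a hypothetical stand-in for a term whose field has been made writable (Lucene's own Term is not modified here), and describe() merely stands in for the search step.

```java
/** Sketch of "overwrite the field, search, then restore"; all names are hypothetical. */
class FieldRestoreSketch {
    /** Hypothetical term with a writable field, mimicking what TermModifier exposes. */
    static final class MutableTerm {
        String field;
        final String text;

        MutableTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    /** Stands in for the span search; it just renders the terms as "field:text". */
    static String describe(MutableTerm[] terms) {
        StringBuilder sb = new StringBuilder();
        for (MutableTerm t : terms) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t.field).append(':').append(t.text);
        }
        return sb.toString();
    }

    /** Points every term at temporaryField, runs the search, then restores the originals. */
    static String searchWithField(MutableTerm[] terms, String temporaryField) {
        String[] saved = new String[terms.length];
        for (int i = 0; i < terms.length; i++) {
            saved[i] = terms[i].field;
        }
        try {
            for (MutableTerm t : terms) {
                t.field = temporaryField;
            }
            return describe(terms);
        } finally {
            // Restore even if the search throws, so callers never see modified terms.
            for (int i = 0; i < terms.length; i++) {
                terms[i].field = saved[i];
            }
        }
    }

    public static void main(String[] args) {
        MutableTerm[] terms = {
            new MutableTerm("title", "quick"),
            new MutableTerm("body", "fox")
        };
        System.out.println(searchWithField(terms, "memory"));
        System.out.println(describe(terms));
    }
}
```

The extra cost is one pass to save and one to restore the field names, which is consistent with the expectation above that the performance hit would be small.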

> SpanScorer and SimpleSpanFragmenter for Contrib Highlighter
> -----------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Michael Goddard (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570917#action_12570917 ] 

Michael Goddard commented on LUCENE-794:
----------------------------------------

I'm fairly certain that Mark H.'s comments are correct, but somehow I was getting highlighting.  ConstantScoreRangeQuery was the query I'd used initially, but I later had to introduce a SpanRangeQuery which I could embed in SpanNearQuery instances.  And, yes, we have users who need this perverse combination.  They have a query syntax which is very expressive and enables them to nest "proximity" (SpanNearQuery) queries to an arbitrary depth; they can even embed numeric range queries within any of these sub-queries.  The requirement is mainly cultural, arising out of the long-time use of a pure boolean text engine.  Still, over the past fifteen or so years, the user base has developed a fairly large body of "literature" -- queries they use to find certain things -- and they don't want to throw all of that away.  I agree that this type of thing is sort of specialized, but I thought there might just be a few others out there with similar needs.  The need to highlight all of this is due to the fact that several tools are used to post-process search results and visualize them.

I really appreciate the attention you guys have given to this.  There's the background from my end.

Thanks.


> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Summary: SpanScorer and SimpleSpanFragmenter for Contrib Highlighter  (was: Beginnings of a span based highlighter)

> SpanScorer and SimpleSpanFragmenter for Contrib Highlighter
> -----------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558803#action_12558803 ] 

Grant Ingersoll commented on LUCENE-794:
----------------------------------------

Never mind, I went back and read the thread at http://lucene.markmail.org/message/p4gfxewk6jcqfxxj?q=highlighter+list:org%2Eapache%2Elucene%2Ejava-user
which I think accounts for this approach and makes sense to me.

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment:     (was: Highlighter.java)

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473969 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

I had some free time today and came back to this issue. I was so set on my own needs when I started on this that I completely neglected to look closely at the contrib highlighter code. I went back and read over it this morning and am in the middle of a new solution. The new solution takes the form of a new SpanQueryScorer that extends Scorer and plugs into the original contrib highlighter code. I have adapted almost all of the original tests (still a few to go) and so far they all still pass using the SpanQueryScorer. There is no guarantee yet that Spans will not be chopped up, but I am sure there is a way to share Span info with a Fragmenter if you wanted to rectify this (I may get to it). I also have not implemented scoring properly yet...at the moment any term that is found returns a score of 1, and each unique term in a fragment contributes 1 to the fragment score. I will look at going further here, but I will post the code first, after I convert the rest of the relevant tests and add a few Span Query tests.
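
The placeholder scoring described above can be sketched roughly as follows. This is an illustration under assumptions, not the SpanQueryScorer from the patch: PlaceholderFragmentScore and its method names are hypothetical, and the set of "hit" terms is simply passed in rather than extracted from Spans.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration of the interim scoring; not code from the patch. */
class PlaceholderFragmentScore {
    // Terms that took part in the query hit (assumed to be known already).
    private final Set<String> hitTerms;
    // Unique hit terms seen so far in the current fragment.
    private final Set<String> seenInFragment = new HashSet<>();

    PlaceholderFragmentScore(Set<String> hitTerms) {
        this.hitTerms = hitTerms;
    }

    /** Called when a new fragment begins. */
    void startFragment() {
        seenInFragment.clear();
    }

    /** A found term scores 1, any other term scores 0. */
    float tokenScore(String term) {
        if (!hitTerms.contains(term)) {
            return 0f;
        }
        seenInFragment.add(term);
        return 1f;
    }

    /** Each unique hit term contributes 1 to the fragment score. */
    float fragmentScore() {
        return seenInFragment.size();
    }

    public static void main(String[] args) {
        PlaceholderFragmentScore scorer =
                new PlaceholderFragmentScore(new HashSet<>(Arrays.asList("quick", "fox")));
        scorer.startFragment();
        for (String token : Arrays.asList("the", "quick", "brown", "fox", "quick")) {
            scorer.tokenScore(token);
        }
        // "quick" appears twice but counts once toward the fragment score.
        System.out.println(scorer.fragmentScore());
    }
}
```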

I am pretty confident this will be a great solution for 'actual hit' highlighting with the already tried and true contrib Highlighter, fragments and all.

-Mark

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: spanhighlighter_patch_4.zip

I have finally come up with a way to ignore fields and so the final test (testFieldSpecificHighlighting) passes for this. Now all original Highlighter tests pass with this patch. Pass null as the field to SpanScorer and fields will be ignored during highlighting.
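
From the caller's side, the null-field convention could look something like the sketch below. It only illustrates the convention (null means "match terms from any field") and says nothing about how the patch implements it internally; FieldFilter and accepts are hypothetical names, not part of SpanScorer.

```java
/** Hypothetical illustration of the "null field means ignore fields" convention. */
class FieldFilter {
    // Field to restrict highlighting to, or null to ignore fields entirely.
    private final String field;

    FieldFilter(String field) {
        this.field = field;
    }

    /** Returns true if a term from termField should be considered for highlighting. */
    boolean accepts(String termField) {
        return field == null || field.equals(termField);
    }
}
```

With a filter built from "title", only title terms are accepted; with a filter built from null, terms from every field are accepted.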

SpanScorer now has the same behavior as the QueryScorer except that actual hits are highlighted.

I have also made a small fix to the SimpleSpanFragmenter.

I am still not sure if it is better to change the Highlighter API or require the kind of nasty call to reset the SpanScorer between calls to getBestFragments.

I have used a zip file this time. It contains the patch plus an index folder that holds a new class called TermModifier. This was necessary because I cannot add folders to the patch, but TermModifier needs to be in the org.apache.lucene.index package. First apply the patch, then add the index folder to the correct place in the Highlighter contrib section.

Not a lot left to do here. What do you think, Mark H?

- Mark

> SpanScorer and SimpleSpanFragmenter for Contrib Highlighter
> -----------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596632#action_12596632 ] 

Otis Gospodnetic commented on LUCENE-794:
-----------------------------------------

Regarding issues with building this, I am seeing the same thing, both for contrib/highlighter and contrib/xml-query-parser.  Running "ant compile-core" from within those dirs does not work and running "ant build-contrib" also fails.

The problem in both is with the dependency (on contrib/memory and contrib/queries).  Here is what fixes xml-query-parser:

Instead of:
..... inheritall="false"/>

Use this:
......inheritall="true" dir="../queries" />

And a similar thing for contrib/highlighter.
I'll commit both fixes shortly.


> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 2.3.2
>
>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, SpanHighlighter-RemovSysOut.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


Re: applying patches (was [jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery)

Posted by Tricia Williams <wi...@gmail.com>.

Hi Maurizio,

    I'm replying in java-user because I believe this is the appropriate 
place for a question like this.

    All the patches that I have encountered (including this one) are 
usually applied at the root.  One should download the source code from 
http://svn.apache.org/repos/asf/lucene/java/trunk/.  From the trunk 
directory all you should need to apply is the most recent patch: 
SpanHighlighter-02-10-2008.patch.  The syntax for applying patches is 
typically:
patch -p 0 -i <path to patch> [--dry-run]
where the dry-run flag allows you to see if the patch applies cleanly 
without gumming up your source.

Hope that helps,
Tricia

Maurizio (JIRA) wrote:
>     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593030#action_12593030 ] 
>
> Maurizio commented on LUCENE-794:
> ---------------------------------
>
> Hi,
> probably I'm missing something, and I'm not sure this is the right place to ask my question, but I can't understand how the patch mechanism works.
> First, I downloaded the source code from http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/highlighter/, then I tried to apply every patch listed above.
> I wrote a trivial bash script to apply the patches (I'm assuming that they are not cumulative), but without success.
>  
> thanks in advance...
>
> Maurizio
>
>
> patch.sh
> /*-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/
> #!/bin/sh
> patch -p2 < spanhighlighter.patch
> patch -p2 < spanhighlighter2.patch
> patch -p2 < spanhighlighter3.patch
> patch -p2 < spanhighlighter4.patch
> unzip spanhighlighter_patch_4.zip
> mv index src/java/org/apache/lucene/
> patch -p2 < spanhighlighter5.patch
> patch -p2 < spanhighlighter6.patch
> patch -p2 < spanhighlighter7.patch
> patch -p2 < spanhighlighter8.patch
> patch -p2 < spanhighlighter9.patch
> patch -p2 < spanhighlighter10.patch
> patch -p2 < spanhighlighter11.patch
> patch -p2 < spanhighlighter12.patch
> patch -p2 < spanhighlighter_24_January_2008.patch
> patch -p2 < SpanHighlighter-01-26-2008.patch
> patch -p2 < SpanHighlighter-01-28-2008.patch
> patch -p2 < MultiPhraseQueryExtraction.patch
> patch -p2 < SpanHighlighter-02-10-2008.patch
> patch -p2 < MultiPhraseQueryExtraction.patch
>
> output
> /*-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/
> (Stripping trailing CRs from patch.)
> patching file build.xml
> Hunk #1 succeeded at 18 with fuzz 1 (offset 17 lines).
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/CachedTokenStream.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/Highlighter.java
> Hunk #1 FAILED at 222.
> Hunk #2 succeeded at 257 (offset 2 lines).
> 1 out of 2 hunks FAILED -- saving rejects to file src/java/org/apache/lucene/search/highlight/Highlighter.java.rej
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/QueryScorer.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/QuerySpansExtractor.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/Scorer.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
> (Stripping trailing CRs from patch.)
> patching file src/test/org/apache/lucene/search/highlight/HighlighterTest.java
> Hunk #1 succeeded at 460 (offset 7 lines).
> (Stripping trailing CRs from patch.)
> patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
> (Stripping trailing CRs from patch.)
> patching file build.xml
> Hunk #1 FAILED at 1.
> 1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/CachedTokenStream.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/Highlighter.java
> Hunk #1 FAILED at 222.
> Hunk #2 FAILED at 255.
> 2 out of 2 hunks FAILED -- saving rejects to file src/java/org/apache/lucene/search/highlight/Highlighter.java.rej
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/QueryScorer.java
> Reversed (or previously applied) patch detected! Assume -R? [n]
> Apply anyway? [n] y
> Hunk #1 FAILED at 104.
> 1 out of 1 hunk FAILED -- saving rejects to file src/java/org/apache/lucene/search/highlight/QueryScorer.java.rej
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/QuerySpansExtractor.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/Scorer.java
> Reversed (or previously applied) patch detected! Assume -R? [n]
> Apply anyway? [n] y
> Hunk #1 FAILED at 36.
> 1 out of 1 hunk FAILED -- saving rejects to file src/java/org/apache/lucene/search/highlight/Scorer.java.rej
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
> (Stripping trailing CRs from patch.)
> patching file src/test/org/apache/lucene/search/highlight/HighlighterTest.java
> Reversed (or previously applied) patch detected! Assume -R? [n]
> Apply anyway? [n] y
> Hunk #1 FAILED at 460.
> 1 out of 1 hunk FAILED -- saving rejects to file src/test/org/apache/lucene/search/highlight/HighlighterTest.java.rej
> (Stripping trailing CRs from patch.)
> patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
> (Stripping trailing CRs from patch.)
> patching file build.xml
> Hunk #1 FAILED at 1.
> 1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/CachedTokenStream.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/QuerySpansExtractor.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
> (Stripping trailing CRs from patch.)
> patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
> (Stripping trailing CRs from patch.)
> patching file build.xml
> Hunk #1 FAILED at 1.
> 1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/CachedTokenStream.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/QuerySpansExtractor.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
> (Stripping trailing CRs from patch.)
> patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
> Archive: spanhighlighter_patch_4.zip
>    creating: index/
>   inflating: index/TermFieldModifier.java
> replace spanhighlighter4.patch? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
>   inflating: spanhighlighter4.patch
> (Stripping trailing CRs from patch.)
> patching file build.xml
> Hunk #1 FAILED at 1.
> 1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/CachedTokenStream.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/QuerySpansExtractor.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
> (Stripping trailing CRs from patch.)
> patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
> (Stripping trailing CRs from patch.)
> patching file build.xml
> Hunk #1 FAILED at 1.
> 1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/CachedTokenStream.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/QuerySpansExtractor.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
> (Stripping trailing CRs from patch.)
> patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
> (Stripping trailing CRs from patch.)
> patching file build.xml
> Hunk #1 FAILED at 1.
> 1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
> (Stripping trailing CRs from patch.)
> patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
> (Stripping trailing CRs from patch.)
> patching file build.xml
> Hunk #1 FAILED at 1.
> 1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
> (Stripping trailing CRs from patch.)
> patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
> (Stripping trailing CRs from patch.)
> patching file build.xml
> Hunk #1 FAILED at 1.
> 1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
> (Stripping trailing CRs from patch.)
> patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
> (Stripping trailing CRs from patch.)
> patching file build.xml
> Hunk #1 FAILED at 1.
> 1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
> (Stripping trailing CRs from patch.)
> patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
> (Stripping trailing CRs from patch.)
> patching file build.xml
> Hunk #1 FAILED at 1.
> 1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
> (Stripping trailing CRs from patch.)
> patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
> (Stripping trailing CRs from patch.)
> patching file build.xml
> Hunk #1 FAILED at 1.
> 1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
> (Stripping trailing CRs from patch.)
> patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
> patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
> patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
> patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
> patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
> patching file build.xml
> Hunk #1 FAILED at 1.
> 1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
> (Stripping trailing CRs from patch.)
> patching file build.xml
> Hunk #1 FAILED at 1.
> 1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
> (Stripping trailing CRs from patch.)
> patching file src/test/org/apache/lucene/search/highlight/HighlighterTest.java
> Hunk #3 FAILED at 68.
> 1 out of 3 hunks FAILED -- saving rejects to file src/test/org/apache/lucene/search/highlight/HighlighterTest.java.rej
> (Stripping trailing CRs from patch.)
> patching file build.xml
> Hunk #1 FAILED at 1.
> 1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
> (Stripping trailing CRs from patch.)
> patching file src/test/org/apache/lucene/search/highlight/HighlighterTest.java
> Reversed (or previously applied) patch detected! Assume -R? [n]
> Apply anyway? [n] y
> Hunk #1 FAILED at 21.
> Hunk #2 FAILED at 56.
> Hunk #3 FAILED at 68.
> 3 out of 3 hunks FAILED -- saving rejects to file src/test/org/apache/lucene/search/highlight/HighlighterTest.java.rej
> can't find file to patch at input line 5
> Perhaps you used the wrong -p or --strip option?
> The text leading up to this was:
> --------------------------
> |diff --git a/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java b/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
> |index d46f5c2..d456f59 100644
> |--- a/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
> |+++ b/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
> --------------------------
> File to patch:
> Skip this patch? [y] y
> Skipping patch.
> 2 out of 2 hunks ignored
> can't find file to patch at input line 76
> Perhaps you used the wrong -p or --strip option?
> The text leading up to this was:
> --------------------------
> |diff --git a/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java b/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java
> |index 59179d4..a0f9a7b 100644
> |--- a/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java
> |+++ b/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java
> --------------------------
> File to patch:
> Skip this patch? [y] y
> Skipping patch.
> 3 out of 3 hunks ignored
> (Stripping trailing CRs from patch.)
> patching file build.xml
> Hunk #1 FAILED at 1.
> 1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
> (Stripping trailing CRs from patch.)
> patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
> (Stripping trailing CRs from patch.)
> patching file src/test/org/apache/lucene/search/highlight/HighlighterTest.java
> Hunk #1 FAILED at 21.
> Hunk #2 FAILED at 69.
> 2 out of 2 hunks FAILED -- saving rejects to file src/test/org/apache/lucene/search/highlight/HighlighterTest.java.rej
> can't find file to patch at input line 5
> Perhaps you used the wrong -p or --strip option?
> The text leading up to this was:
> --------------------------
> |diff --git a/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java b/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
> |index d46f5c2..d456f59 100644
> |--- a/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
> |+++ b/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
> --------------------------
> File to patch:
> Skip this patch? [y] y
> Skipping patch.
> 2 out of 2 hunks ignored
> can't find file to patch at input line 76
> Perhaps you used the wrong -p or --strip option?
> The text leading up to this was:
> --------------------------
> |diff --git a/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java b/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java
> |index 59179d4..a0f9a7b 100644
> |--- a/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java
> |+++ b/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java
> --------------------------
> File to patch:
> Skip this patch? [y] y
> Skipping patch.
> 3 out of 3 hunks ignored
> /*-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/
>
>
>
>   
>> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
>> -----------------------------------------------------------------------------------------------
>>
>>                 Key: LUCENE-794
>>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>>             Project: Lucene - Java
>>          Issue Type: Improvement
>>          Components: Other
>>            Reporter: Mark Miller
>>            Priority: Minor
>>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>>
>>
>> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but gives a score of 0 to Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuery types, PhraseQuery, and ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
>> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
>> There is a dependency on MemoryIndex.
>>     
>
>   



[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Maurizio (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593030#action_12593030 ] 

Maurizio commented on LUCENE-794:
---------------------------------

Hi,
I'm probably missing something, and I'm not sure this is the right place to ask, but I can't understand how the patch mechanism works.
First, I downloaded the source code from http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/highlighter/, then I tried to apply every patch listed above.
I wrote a trivial bash script to apply the patches (I'm assuming they are not cumulative), but without success.

Thanks in advance...

Maurizio


patch.sh
/*-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/
#!/bin/sh
patch -p2 < spanhighlighter.patch
patch -p2 < spanhighlighter2.patch
patch -p2 < spanhighlighter3.patch
patch -p2 < spanhighlighter4.patch
unzip spanhighlighter_patch_4.zip
mv index src/java/org/apache/lucene/
patch -p2 < spanhighlighter5.patch
patch -p2 < spanhighlighter6.patch
patch -p2 < spanhighlighter7.patch
patch -p2 < spanhighlighter8.patch
patch -p2 < spanhighlighter9.patch
patch -p2 < spanhighlighter10.patch
patch -p2 < spanhighlighter11.patch
patch -p2 < spanhighlighter12.patch
patch -p2 < spanhighlighter_24_January_2008.patch
patch -p2 < SpanHighlighter-01-26-2008.patch
patch -p2 < SpanHighlighter-01-28-2008.patch
patch -p2 < MultiPhraseQueryExtraction.patch
patch -p2 < SpanHighlighter-02-10-2008.patch
patch -p2 < MultiPhraseQueryExtraction.patch

output
/*-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 succeeded at 18 with fuzz 1 (offset 17 lines).
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/CachedTokenStream.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/Highlighter.java
Hunk #1 FAILED at 222.
Hunk #2 succeeded at 257 (offset 2 lines).
1 out of 2 hunks FAILED -- saving rejects to file src/java/org/apache/lucene/search/highlight/Highlighter.java.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/QueryScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/QuerySpansExtractor.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/Scorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/HighlighterTest.java
Hunk #1 succeeded at 460 (offset 7 lines).
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/CachedTokenStream.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/Highlighter.java
Hunk #1 FAILED at 222.
Hunk #2 FAILED at 255.
2 out of 2 hunks FAILED -- saving rejects to file src/java/org/apache/lucene/search/highlight/Highlighter.java.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/QueryScorer.java
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n] y
Hunk #1 FAILED at 104.
1 out of 1 hunk FAILED -- saving rejects to file src/java/org/apache/lucene/search/highlight/QueryScorer.java.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/QuerySpansExtractor.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/Scorer.java
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n] y
Hunk #1 FAILED at 36.
1 out of 1 hunk FAILED -- saving rejects to file src/java/org/apache/lucene/search/highlight/Scorer.java.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/HighlighterTest.java
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n] y
Hunk #1 FAILED at 460.
1 out of 1 hunk FAILED -- saving rejects to file src/test/org/apache/lucene/search/highlight/HighlighterTest.java.rej
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/CachedTokenStream.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/QuerySpansExtractor.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/CachedTokenStream.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/QuerySpansExtractor.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
Archive: spanhighlighter_patch_4.zip
   creating: index/
  inflating: index/TermFieldModifier.java
replace spanhighlighter4.patch? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
  inflating: spanhighlighter4.patch
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/CachedTokenStream.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/QuerySpansExtractor.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/CachedTokenStream.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/QuerySpansExtractor.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/HighlighterTest.java
Hunk #3 FAILED at 68.
1 out of 3 hunks FAILED -- saving rejects to file src/test/org/apache/lucene/search/highlight/HighlighterTest.java.rej
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/HighlighterTest.java
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n] y
Hunk #1 FAILED at 21.
Hunk #2 FAILED at 56.
Hunk #3 FAILED at 68.
3 out of 3 hunks FAILED -- saving rejects to file src/test/org/apache/lucene/search/highlight/HighlighterTest.java.rej
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java b/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
|index d46f5c2..d456f59 100644
|--- a/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
|+++ b/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
--------------------------
File to patch:
Skip this patch? [y] y
Skipping patch.
2 out of 2 hunks ignored
can't find file to patch at input line 76
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java b/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java
|index 59179d4..a0f9a7b 100644
|--- a/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java
|+++ b/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java
--------------------------
File to patch:
Skip this patch? [y] y
Skipping patch.
3 out of 3 hunks ignored
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/HighlighterTest.java
Hunk #1 FAILED at 21.
Hunk #2 FAILED at 69.
2 out of 2 hunks FAILED -- saving rejects to file src/test/org/apache/lucene/search/highlight/HighlighterTest.java.rej
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java b/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
|index d46f5c2..d456f59 100644
|--- a/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
|+++ b/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
--------------------------
File to patch:
Skip this patch? [y] y
Skipping patch.
2 out of 2 hunks ignored
can't find file to patch at input line 76
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java b/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java
|index 59179d4..a0f9a7b 100644
|--- a/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java
|+++ b/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java
--------------------------
File to patch:
Skip this patch? [y] y
Skipping patch.
3 out of 3 hunks ignored
/*-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/



> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12479883 ] 

Mark Harwood commented on LUCENE-794:
-------------------------------------

Hi Mark,
Got the code patched and running here.
The JUnit tests seem to work fine, but I feel a little uncomfortable about the use of the TermModifier class. Using it has the potentially undesirable side effect of changing the field in the client's query. If they plan on re-running the same query, this could be a problem.

I'll need to think about whether there is a better solution to this.

Cheers,
Mark
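
The side-effect concern above can be made concrete with a small sketch. It does not use the patch's TermModifier; `SimpleTermQuery` and `withField` are hypothetical names, and the sketch only assumes that the highlighter needs a query whose field differs from the one the client supplied. Building a modified copy, rather than changing the field in place, leaves the client's object reusable.

```java
// Hypothetical, simplified query object: one field and one term text.
final class SimpleTermQuery {
    private String field;
    private final String text;

    SimpleTermQuery(String field, String text) {
        this.field = field;
        this.text = text;
    }

    String field() { return field; }
    String text() { return text; }

    // In-place change: the caller's query is altered as a side effect.
    void setField(String newField) { this.field = newField; }

    // Copying alternative: the original query is left untouched.
    SimpleTermQuery withField(String newField) {
        return new SimpleTermQuery(newField, text);
    }
}

final class QueryCopyDemo {
    public static void main(String[] args) {
        SimpleTermQuery clientQuery = new SimpleTermQuery("contents", "lucene");
        SimpleTermQuery highlightQuery = clientQuery.withField("highlight");
        System.out.println(clientQuery.field());    // still the client's field
        System.out.println(highlightQuery.field()); // field used only for highlighting
    }
}
```

With the copying variant, the client can re-run `clientQuery` unchanged after highlighting.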

> SpanScorer and SimpleSpanFragmenter for Contrib Highlighter
> -----------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470074 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

There are two highlighting modes: highlight entire spans, or highlight only the first and last word of each span. For the first-and-last-word mode, it would probably be better to change QuerySpansExtractor.getSpansFromPhraseQuery so that it creates a series of near spans instead of a single near span with multiple clauses.

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>         Environment: There are prob a few Java 1.5 requirements (generics) that could easily be removed.
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, HighlighterTest.java, QuerySpansExtractor.java, SimpleFormatter.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: Highlighter.java
                CachedTokenStream.java
                SpanHighlighterTest.java

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment:     (was: SpanHighlighterTest.java)

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

      Description: 
This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.

See http://issues.apache.org/jira/browse/LUCENE-403 for some background.

There is a dependency on MemoryIndex.

  was:
This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.

See http://issues.apache.org/jira/browse/LUCENE-403 for some background.

There is a dependency on MemoryIndex.

    Lucene Fields: [New, Patch Available]  (was: [New])
          Summary: Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery  (was: Extend contrib Highlighter to properly support phrase queries and span queries)

> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: spanhighlighter6.patch

Updated the patch to version 6. Apply against Lucene trunk.

- Updated CachedTokenStream to implement reset() instead of rewind()
- Removed rewind checks in CachedTokenStream

- Reordered QuerySpansExtractor constructors and added one
- QuerySpansExtractor now interns field name for faster comparisons against Token fields
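
The interning change in the last item can be illustrated with plain Java strings. This is a minimal sketch, not the patch's code: `FieldMatcher` and its methods are hypothetical names. The reference comparison is only safe when both strings have been interned, which the sketch makes explicit by offering two methods.

```java
// Hypothetical illustration of interning a field name once so that later
// comparisons can use reference equality (==) instead of String.equals().
final class FieldMatcher {
    private final String fieldName; // always the interned instance

    FieldMatcher(String fieldName) {
        // intern() returns the canonical instance from the JVM string pool.
        this.fieldName = fieldName.intern();
    }

    // Fast path: only correct when 'candidate' is also interned.
    boolean matchesInterned(String candidate) {
        return fieldName == candidate;
    }

    // Safe path for arbitrary strings: intern first, then compare references.
    boolean matches(String candidate) {
        return fieldName == candidate.intern();
    }
}

final class InternedFieldDemo {
    public static void main(String[] args) {
        FieldMatcher matcher = new FieldMatcher(new String("contents"));
        // Built at runtime, so this instance is not the pooled one.
        String runtimeBuilt = new StringBuilder("con").append("tents").toString();
        System.out.println(matcher.matchesInterned(runtimeBuilt));
        System.out.println(matcher.matches(runtimeBuilt));
        // String literals are interned by the JVM.
        System.out.println(matcher.matchesInterned("contents"));
    }
}
```

The fast path gives a wrong negative for a non-interned string with equal content, so it is only appropriate when every field name it sees is known to be interned.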



> SpanScorer and SimpleSpanFragmenter for Contrib Highlighter
> -----------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474582 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

Almost at the holy grail here. Everything works except the optional ignoring fields in the Query object. Scores work, all other tests pass, and even better there is no more limitation of only highlighting the first and last term in a Span -- instead all correct terms in each Span will be highlighted. The only change to the existing code I had to make was to add a parameter to scoreToken(Token token) -- I had to add int position.

I still think it is very feasible to pass info from this SpanScorer to a Fragmenter so that the Fragmenter can attempt to avoid splitting up Spans.

The current code will correctly highlight pretty much any standard or span query (I think <g>) based on 'actual' hits using the existing contrib highlighter code. I have yet to write out the new extensive Span tests, and I would appreciate it if some others would go over the code for obvious improvements, but this is almost ready.

Get the latest:
SpanScorer
SpanQueryExtractor
CachedTokenStream
SpanHighlighterTest 
WeightedSpanTerm
Highlighter

- Mark
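
The extra position argument mentioned above can be sketched as follows. This is not the code from the attached SpanScorer: `PositionAwareScorer`, `PositionedTerm`, and `addSpan` are hypothetical names, and a term is represented by a plain string rather than a Lucene Token. The sketch assumes the matching spans have already been extracted as inclusive ranges of token positions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a weighted term that remembers where it matched.
final class PositionedTerm {
    final float weight;
    // Each int[] is {start, end}: an inclusive range of token positions
    // covered by one span that actually matched the query.
    final List<int[]> spans = new ArrayList<int[]>();

    PositionedTerm(float weight) {
        this.weight = weight;
    }

    boolean covers(int position) {
        for (int[] span : spans) {
            if (position >= span[0] && position <= span[1]) {
                return true;
            }
        }
        return false;
    }
}

final class PositionAwareScorer {
    private final Map<String, PositionedTerm> terms = new HashMap<String, PositionedTerm>();

    // Registers one matching span for a term; the weight is fixed on first use.
    void addSpan(String term, float weight, int start, int end) {
        PositionedTerm entry = terms.get(term);
        if (entry == null) {
            entry = new PositionedTerm(weight);
            terms.put(term, entry);
        }
        entry.spans.add(new int[] {start, end});
    }

    // A term scores only when it occurs inside a span that produced the hit;
    // the same term at any other position scores 0.
    float scoreToken(String termText, int position) {
        PositionedTerm term = terms.get(termText);
        if (term == null || !term.covers(position)) {
            return 0f;
        }
        return term.weight;
    }
}

final class PositionAwareScorerDemo {
    public static void main(String[] args) {
        // Text: the(0) quick(1) fox(2) saw(3) the(4) quick(5) brown(6) fox(7)
        PositionAwareScorer scorer = new PositionAwareScorer();
        scorer.addSpan("quick", 1.0f, 5, 6);
        scorer.addSpan("brown", 1.0f, 5, 6);
        System.out.println(scorer.scoreToken("quick", 1)); // outside the span
        System.out.println(scorer.scoreToken("quick", 5)); // inside the span
    }
}
```

In the demo, the phrase match on `quick brown` covers positions 5 to 6, so the earlier `quick` at position 1 scores zero and would not be highlighted.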

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment:     (was: QuerySpansExtractor.java)

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Issue Comment Edited: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Maurizio (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593030#action_12593030 ] 

maurizio316 edited comment on LUCENE-794 at 4/29/08 7:43 AM:
----------------------------------------------------------

Hi,
I'm probably missing something, and I'm not sure this is the right place to ask my question, but I can't understand how the patch mechanism works.
First, I downloaded the source code from http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/highlighter/, then I tried to apply every patch listed above.
I wrote a trivial bash script to apply the patches (I'm assuming that they are not cumulative), but without success.
 
thanks in advance...

Maurizio


patch.sh
/*-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/
#!/bin/sh
patch -p2 < spanhighlighter.patch
patch -p2 < spanhighlighter2.patch
patch -p2 < spanhighlighter3.patch
patch -p2 < spanhighlighter4.patch
unzip spanhighlighter_patch_4.zip
mv index src/java/org/apache/lucene/
patch -p2 < spanhighlighter5.patch
patch -p2 < spanhighlighter6.patch
patch -p2 < spanhighlighter7.patch
patch -p2 < spanhighlighter8.patch
patch -p2 < spanhighlighter9.patch
patch -p2 < spanhighlighter10.patch
patch -p2 < spanhighlighter11.patch
patch -p2 < spanhighlighter12.patch
patch -p2 < spanhighlighter_24_January_2008.patch
patch -p2 < SpanHighlighter-01-26-2008.patch
patch -p2 < SpanHighlighter-01-28-2008.patch
patch -p2 < MultiPhraseQueryExtraction.patch
patch -p2 < SpanHighlighter-02-10-2008.patch
patch -p2 < MultiPhraseQueryExtraction.patch



      was (Author: maurizio316):
    Hi,
I'm probably missing something, and I'm not sure this is the right place to ask my question, but I can't understand how the patch mechanism works.
First, I downloaded the source code from http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/highlighter/, then I tried to apply every patch listed above.
I wrote a trivial bash script to apply the patches (I'm assuming that they are not cumulative), but without success.
 
thanks in advance...

Maurizio


patch.sh
/*-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/
#!/bin/sh
patch -p2 < spanhighlighter.patch
patch -p2 < spanhighlighter2.patch
patch -p2 < spanhighlighter3.patch
patch -p2 < spanhighlighter4.patch
unzip spanhighlighter_patch_4.zip
mv index src/java/org/apache/lucene/
patch -p2 < spanhighlighter5.patch
patch -p2 < spanhighlighter6.patch
patch -p2 < spanhighlighter7.patch
patch -p2 < spanhighlighter8.patch
patch -p2 < spanhighlighter9.patch
patch -p2 < spanhighlighter10.patch
patch -p2 < spanhighlighter11.patch
patch -p2 < spanhighlighter12.patch
patch -p2 < spanhighlighter_24_January_2008.patch
patch -p2 < SpanHighlighter-01-26-2008.patch
patch -p2 < SpanHighlighter-01-28-2008.patch
patch -p2 < MultiPhraseQueryExtraction.patch
patch -p2 < SpanHighlighter-02-10-2008.patch
patch -p2 < MultiPhraseQueryExtraction.patch

output
/*-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 succeeded at 18 with fuzz 1 (offset 17 lines).
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/CachedTokenStream.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/Highlighter.java
Hunk #1 FAILED at 222.
Hunk #2 succeeded at 257 (offset 2 lines).
1 out of 2 hunks FAILED -- saving rejects to file src/java/org/apache/lucene/search/highlight/Highlighter.java.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/QueryScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/QuerySpansExtractor.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/Scorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/HighlighterTest.java
Hunk #1 succeeded at 460 (offset 7 lines).
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/CachedTokenStream.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/Highlighter.java
Hunk #1 FAILED at 222.
Hunk #2 FAILED at 255.
2 out of 2 hunks FAILED -- saving rejects to file src/java/org/apache/lucene/search/highlight/Highlighter.java.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/QueryScorer.java
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n] y
Hunk #1 FAILED at 104.
1 out of 1 hunk FAILED -- saving rejects to file src/java/org/apache/lucene/search/highlight/QueryScorer.java.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/QuerySpansExtractor.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/Scorer.java
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n] y
Hunk #1 FAILED at 36.
1 out of 1 hunk FAILED -- saving rejects to file src/java/org/apache/lucene/search/highlight/Scorer.java.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/HighlighterTest.java
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n] y
Hunk #1 FAILED at 460.
1 out of 1 hunk FAILED -- saving rejects to file src/test/org/apache/lucene/search/highlight/HighlighterTest.java.rej
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/CachedTokenStream.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/QuerySpansExtractor.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/CachedTokenStream.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/QuerySpansExtractor.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
Archive: spanhighlighter_patch_4.zip
   creating: index/
  inflating: index/TermFieldModifier.java
replace spanhighlighter4.patch? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
  inflating: spanhighlighter4.patch
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/CachedTokenStream.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/QuerySpansExtractor.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/CachedTokenStream.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/QuerySpansExtractor.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
patching file src/test/org/apache/lucene/search/highlight/SpanHighlighterTest.java
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/HighlighterTest.java
Hunk #3 FAILED at 68.
1 out of 3 hunks FAILED -- saving rejects to file src/test/org/apache/lucene/search/highlight/HighlighterTest.java.rej
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/HighlighterTest.java
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n] y
Hunk #1 FAILED at 21.
Hunk #2 FAILED at 56.
Hunk #3 FAILED at 68.
3 out of 3 hunks FAILED -- saving rejects to file src/test/org/apache/lucene/search/highlight/HighlighterTest.java.rej
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java b/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
|index d46f5c2..d456f59 100644
|--- a/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
|+++ b/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
--------------------------
File to patch:
Skip this patch? [y] y
Skipping patch.
2 out of 2 hunks ignored
can't find file to patch at input line 76
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java b/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java
|index 59179d4..a0f9a7b 100644
|--- a/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java
|+++ b/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java
--------------------------
File to patch:
Skip this patch? [y] y
Skipping patch.
3 out of 3 hunks ignored
(Stripping trailing CRs from patch.)
patching file build.xml
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build.xml.rej
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/spanscorer.html
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/SpanScorer.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
(Stripping trailing CRs from patch.)
patching file src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
(Stripping trailing CRs from patch.)
patching file src/test/org/apache/lucene/search/highlight/HighlighterTest.java
Hunk #1 FAILED at 21.
Hunk #2 FAILED at 69.
2 out of 2 hunks FAILED -- saving rejects to file src/test/org/apache/lucene/search/highlight/HighlighterTest.java.rej
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java b/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
|index d46f5c2..d456f59 100644
|--- a/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
|+++ b/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
--------------------------
File to patch:
Skip this patch? [y] y
Skipping patch.
2 out of 2 hunks ignored
can't find file to patch at input line 76
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java b/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java
|index 59179d4..a0f9a7b 100644
|--- a/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java
|+++ b/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java
--------------------------
File to patch:
Skip this patch? [y] y
Skipping patch.
3 out of 3 hunks ignored
/*-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/


  
> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

[jira] Commented: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475289 ] 

Mark Harwood commented on LUCENE-794:
-------------------------------------

Hi Mark,
I found a little time to look at the span Highlighter the other night, but I was struggling with some missing bits and pieces (updated Scorer, missing SynonymAnalyzer, etc.), so I only got as far as getting it all to compile before I ran out of time. Hopefully the patch will make life easier; I will investigate when I have another chance.

As for the build.xml, have a look at XMLQueryParser's build.xml in contrib, which declares a dependency on the "queries" contrib module.

Cheers
Mark H

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: spanhighlighter10.patch

The requested changes have been made. The only relevant file now is spanhighlighter10.patch.

This is a parallel implementation: it uses all of the current Highlighter classes. Really, it is just a new Scorer implementation that scores position-sensitive queries based on the correct positions for a hit.

The whole approach was radically changed from the StringBuilder version, so all code is still Java 1.4 compatible.

I have been using this extensively with great success for a few months now.
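
To illustrate the position-sensitive scoring idea described above, here is a minimal sketch in plain Java. It is not the code from the patch: the class SpanScoreSketch, its Span type, and the score method are hypothetical names, and the matching spans are assumed to have been found already and are simply passed in. A term scores its query weight only when its position falls inside a matching span, and 0 otherwise.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of position-sensitive scoring; none of these names come
// from the patch. Matching spans are assumed to be supplied by the caller.
class SpanScoreSketch {

    // A matching span covering token positions [start, end).
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        boolean covers(int position) {
            return position >= start && position < end;
        }
    }

    // Returns the term's weight if it is a query term AND its position lies
    // inside a matching span; otherwise returns 0 so it is not highlighted.
    static float score(String term, int position,
                       Map<String, Float> queryWeights, List<Span> spans) {
        Float weight = queryWeights.get(term);
        if (weight == null) {
            return 0f;
        }
        for (Span span : spans) {
            if (span.covers(position)) {
                return weight;
            }
        }
        return 0f;
    }

    public static void main(String[] args) {
        Map<String, Float> weights = new HashMap<>();
        weights.put("x", 1.0f);
        weights.put("y", 1.0f);
        weights.put("z", 1.0f);
        // Assume the phrase "x y z" matched at positions 0..2 only.
        List<Span> spans = Arrays.asList(new Span(0, 3));

        String[] tokens = "x y z a b y z".split(" ");
        for (int pos = 0; pos < tokens.length; pos++) {
            System.out.println(tokens[pos] + " -> "
                    + score(tokens[pos], pos, weights, spans));
        }
    }
}
```

A position-blind scorer would give the trailing "y" and "z" the same weight as the ones inside the phrase; checking the position against the span is what separates the two cases in this sketch.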

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment:     (was: CachedTokenStream.java)

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595830#action_12595830 ] 

Otis Gospodnetic commented on LUCENE-794:
-----------------------------------------

Probably no need for a new issue, just commit the fix.
But I also noticed that CHANGES.txt has no mention of LUCENE-794.  Somebody forgot to mention the fix there?  It's not too late :)



> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 2.3.2
>
>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, SpanHighlighter-RemovSysOut.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

[jira] Commented: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Karl Wettin (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474590 ] 

Karl Wettin commented on LUCENE-794:
------------------------------------

Mark, I'll take a look at this any year now. I think the code can be used or tweaked to act as "term order suggestion" and "untokenized cosmetic suggestion from stored values" in my didyoumean-patch.

Is there some documentation that describes this patch in chronologically ordered text rather than "just" the Javadocs? Some simple package-level HTML would probably help me get started.

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

[jira] Commented: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487860 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

Sorry Sean, I forgot to mention that the patch is against the latest Lucene trunk code.

The range query test should fail because the query parser was switched to return a constant score query instead of a range query, and a constant score query cannot be highlighted.

- Mark



> SpanScorer and SimpleSpanFragmenter for Contrib Highlighter
> -----------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment:     (was: Highlighter.java)

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526834 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

I put together a quick test to reproduce the behavior but could not duplicate your results.

The results of your example:

doc in index: x y z a b y z

Searching for: "x y z"
	
Result: <b>x</b> <b>y</b> <b>z</b> a b y z

Could you post some code demonstrating the problem?
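
As a point of comparison, here is a minimal stand-alone sketch of the behavior shown in the result above. It is plain Java with no Lucene dependency; the class and method names are made up for illustration, and it is not the code in the patch. It marks only tokens that belong to a full, in-order occurrence of the phrase, so the trailing "y z" stays unhighlighted.

```java
// Toy illustration only: plain Java, not the Lucene Highlighter API.
class PhraseHighlightSketch {

    // Wraps in <b> tags only those tokens that take part in an exact,
    // in-order occurrence of the phrase; stray phrase terms elsewhere stay plain.
    static String highlight(String text, String phrase) {
        String[] tokens = text.split(" ");
        String[] terms = phrase.split(" ");
        boolean[] hit = new boolean[tokens.length];

        // Slide a window of phrase length over the tokens and mark full matches.
        for (int i = 0; i + terms.length <= tokens.length; i++) {
            boolean match = true;
            for (int j = 0; j < terms.length; j++) {
                if (!tokens[i + j].equals(terms[j])) {
                    match = false;
                    break;
                }
            }
            if (match) {
                for (int j = 0; j < terms.length; j++) {
                    hit[i + j] = true;
                }
            }
        }

        // Rebuild the text, tagging only the marked tokens.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(hit[i] ? "<b>" + tokens[i] + "</b>" : tokens[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("x y z a b y z", "x y z"));
    }
}
```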



> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570472#action_12570472 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

Hey Mark H, any chance you will have some time to look at this soon?

Now that the test classes have been merged, any change to the current contrib test class will break this patch.

I think everything is good, except that we might want to report which Highlighter version caused the JUnit test to fail, since almost every test is run with both the standard and the new Span Highlighter. I may just be nitpicking there though.

> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12480207 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

"Another approach may be to modify the MemoryIndex content to suit the query. Your code creates a MemoryIndex when presented with the text of a field. If it recognised it was being used in "field-insensitive mode" it could extract the query terms and create a MemoryIndex field for each unique fieldname in the set of query terms"

This should work fine. I had dismissed it (and butted heads with it again for a while once you mentioned it) because I couldn't see the forest for the trees. I kept thinking that this is just not going to work with a Span query that has terms from different fields. Over and over I thought: how can I ignore fields in a SpanQuery? Now it hits me, rather embarrassingly, that such a SpanQuery doesn't make sense at all.
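
The quoted suggestion can be sketched roughly as follows. This is a toy model under stated assumptions, not MemoryIndex code: the "index" is just a map from field name to token list, query terms are written as hypothetical "field:text" strings, and all names are invented for illustration. The same document text is indexed once under every unique field name found in the query terms.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy stand-in for the "field-insensitive mode" idea; not the MemoryIndex API.
class FieldInsensitiveIndexSketch {

    // Collects the unique field names from "field:text" query terms.
    static Set<String> uniqueFields(List<String> queryTerms) {
        Set<String> fields = new TreeSet<String>();
        for (String term : queryTerms) {
            fields.add(term.substring(0, term.indexOf(':')));
        }
        return fields;
    }

    // Indexes the same text under every field name the query mentions,
    // so a term matches regardless of which field it was written against.
    static Map<String, List<String>> buildIndex(String text, List<String> queryTerms) {
        List<String> tokens = Arrays.asList(text.split(" "));
        Map<String, List<String>> index = new LinkedHashMap<String, List<String>>();
        for (String field : uniqueFields(queryTerms)) {
            index.put(field, tokens);
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("title:x", "body:y", "title:z");
        System.out.println(buildIndex("x y z", terms).keySet());
    }
}
```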

I will try your approach and submit a new patch.

- Mark

> SpanScorer and SimpleSpanFragmenter for Contrib Highlighter
> -----------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470313 ] 

Otis Gospodnetic commented on LUCENE-794:
-----------------------------------------

There is indeed some Java 1.5 code in contrib/. I believe the gdata-server uses 1.5 classes. I think that's okay for contrib.

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, SimpleFormatter.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


Re: [jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by Mark Miller <ma...@gmail.com>.

I'll try to push organizing the unit tests up my todo list.

Also, it would be nice to make sure Michael Goddard's patch gets 
in. He has something that looks like it will add support for 
ConstantScoreRangeQuery. I am fiddling with that now.

- Mark

Otis Gospodnetic (JIRA) wrote:
>     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557538#action_12557538 ] 
>
> Otis Gospodnetic commented on LUCENE-794:
> -----------------------------------------
>
> I re-skimmed this JIRA issue just now.  Other than the final cleanup that Marks mention, any reason this is not yet in svn?
>
>
>   
>> Extend contrib Highlighter to properly support phrase queries and span queries
>> ------------------------------------------------------------------------------
>>
>>                 Key: LUCENE-794
>>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>>             Project: Lucene - Java
>>          Issue Type: Improvement
>>          Components: Other
>>            Reporter: Mark Miller
>>            Priority: Minor
>>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip
>>
>>
>> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
>> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
>> There is a dependency on MemoryIndex.
>>     
>
>   


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557538#action_12557538 ] 

Otis Gospodnetic commented on LUCENE-794:
-----------------------------------------

I re-skimmed this JIRA issue just now.  Other than the final cleanup that Marks mention, any reason this is not yet in svn?


> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Brian Whitman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593077#action_12593077 ] 

Brian Whitman commented on LUCENE-794:
--------------------------------------

Ah, got it. You have to run ant dist before applying the patch to build the memory jar, then apply the patch and run ant again.


> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Brian Whitman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593145#action_12593145 ] 

Brian Whitman commented on LUCENE-794:
--------------------------------------

I must be missing something, but after your commit I did a clean checkout of Lucene trunk, ran ant dist, and am getting the same MemoryIndex problem I reported above. Before, I could work around it by running ant dist first and then applying the patch, but now that the patch is in trunk I can't get around it that way. How are you compiling Lucene from trunk now?



> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 2.3.2
>
>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: spanhighlighter9.patch

Patch version 9: apply to the root dir of trunk.

Various small improvements.

Be sure to use the recently updated CachingTokenFilter for optimal performance.

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Nicolas Dessaigne (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12563106#action_12563106 ] 

Nicolas Dessaigne commented on LUCENE-794:
------------------------------------------

Mark,

I added a few lines of code to the WeightedSpanTermExtractor.extract method to handle DisjunctionMaxQuery instances. I didn't take the time to make a patch against your new version but the code is pretty simple:

{code}
...
} else if (query instanceof DisjunctionMaxQuery) {
	Map disjunctTerms = new HashMap();
	for (Iterator iterator = ((DisjunctionMaxQuery) query).iterator(); iterator.hasNext();) {
		extract((Query) iterator.next(), disjunctTerms);
	}
	terms.putAll(disjunctTerms);
} else {
...
{code}
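
To show the recursion in isolation, here is a self-contained sketch of the same pattern using toy query classes. Everything in it (ToyQuery, ToyTermQuery, ToyDisjunctionQuery, ToyTermExtractor) is a hypothetical stand-in and not part of Lucene or of the patch; the only point is that a disjunction is handled by extracting each disjunct into a temporary map and then merging that map into the caller's terms.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-ins for Lucene query types; all names here are hypothetical.
abstract class ToyQuery {
}

class ToyTermQuery extends ToyQuery {
    final String term;
    final float boost;

    ToyTermQuery(String term, float boost) {
        this.term = term;
        this.boost = boost;
    }
}

class ToyDisjunctionQuery extends ToyQuery {
    final List<ToyQuery> disjuncts;

    ToyDisjunctionQuery(ToyQuery... disjuncts) {
        this.disjuncts = Arrays.asList(disjuncts);
    }
}

class ToyTermExtractor {

    // Recursively collects term -> boost pairs from a toy query tree.
    static void extract(ToyQuery query, Map<String, Float> terms) {
        if (query instanceof ToyTermQuery) {
            ToyTermQuery tq = (ToyTermQuery) query;
            terms.put(tq.term, tq.boost);
        } else if (query instanceof ToyDisjunctionQuery) {
            // Same shape as the snippet above: gather every disjunct's terms,
            // then merge them into the caller's map.
            Map<String, Float> disjunctTerms = new HashMap<String, Float>();
            for (ToyQuery disjunct : ((ToyDisjunctionQuery) query).disjuncts) {
                extract(disjunct, disjunctTerms);
            }
            terms.putAll(disjunctTerms);
        }
        // Any other query type is ignored in this sketch.
    }

    public static void main(String[] args) {
        ToyQuery query = new ToyDisjunctionQuery(
                new ToyTermQuery("a", 1.0f),
                new ToyDisjunctionQuery(new ToyTermQuery("b", 2.0f), new ToyTermQuery("c", 1.0f)));
        Map<String, Float> terms = new HashMap<String, Float>();
        extract(query, terms);
        System.out.println(terms.keySet());
    }
}
```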

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: SpanHighlighter-01-26-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570979#action_12570979 ] 

Mark Harwood commented on LUCENE-794:
-------------------------------------

>>This may be largely irrelevant, but Solr has a ConstantScorePrefixQuery which has similar issues

No, very relevant. Only yesterday I had a user with exactly the same highlighting problem.

>>it seems we prob shouldn't even keep it as configurable. Just drop it then?

My nightmare scenario is systems where people are using ConstantScoreRangeQuery in their queries to do both latitude and longitude ranges over large areas - that's a lot of terms. I'd at least want the option of NOT loading them all into RAM at once when highlighting.

Maybe we could look at having different highlight "matchers". The existing approach of keeping a big bag of query terms becomes a "TermsMatcher" (it simply looks up tokens in a HashSet of terms). You can imagine a new "PrefixMatcher" which would examine tokens using "startsWith" and a "RangeMatcher" which would examine tokens using just a start and end term. However, there's a danger we could end up re-implementing a lot of query logic, so maybe the relevant queries/filters could implement a "Matcher" interface. That would let the same logic that is used when scanning TermEnum at query time be used by the Highlighter when looking at TokenStreams, i.e. something like this:
interface Matcher
{
   boolean matches(String value);
}
Needs some more thought yet, but it could be an approach.
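
A rough sketch of how the three matchers might look is given here. None of these classes exist in Lucene; the names follow the ones floated above, the inclusive range bounds are an assumption, and the highlight helper is invented purely to exercise them. Note that the range and prefix matchers hold only their bounds or prefix, so no term list has to be loaded into RAM.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the "Matcher" idea; none of these types exist in Lucene.
interface Matcher {
    boolean matches(String value);
}

// The existing approach: look each token up in a bag of query terms.
class TermsMatcher implements Matcher {
    private final Set<String> terms;

    TermsMatcher(String... terms) {
        this.terms = new HashSet<String>(Arrays.asList(terms));
    }

    public boolean matches(String value) {
        return terms.contains(value);
    }
}

// Matches any token that starts with the given prefix.
class PrefixMatcher implements Matcher {
    private final String prefix;

    PrefixMatcher(String prefix) {
        this.prefix = prefix;
    }

    public boolean matches(String value) {
        return value.startsWith(prefix);
    }
}

// Matches tokens between a start and end term (assumed inclusive, String ordering).
class RangeMatcher implements Matcher {
    private final String lower;
    private final String upper;

    RangeMatcher(String lower, String upper) {
        this.lower = lower;
        this.upper = upper;
    }

    public boolean matches(String value) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }
}

class MatcherDemo {

    // Tags every whitespace-separated token accepted by the matcher.
    static String highlight(String text, Matcher matcher) {
        StringBuilder out = new StringBuilder();
        for (String token : text.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(matcher.matches(token) ? "<b>" + token + "</b>" : token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "apple banana cherry";
        System.out.println(highlight(text, new TermsMatcher("apple", "cherry")));
        System.out.println(highlight(text, new PrefixMatcher("ban")));
        System.out.println(highlight(text, new RangeMatcher("b", "c")));
    }
}
```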

> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499650 ] 

Michael Busch commented on LUCENE-794:
--------------------------------------

Hi Mark,

I don't know the details of your patch. I just saw your class 
CachedTokenStream and was wondering if you're aware of 
the new class CachingTokenFilter in the analysis package?
Maybe you could use that?

> SpanScorer and SimpleSpanFragmenter for Contrib Highlighter
> -----------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570505#action_12570505 ] 

Mark Harwood commented on LUCENE-794:
-------------------------------------

Couple of quick comments from a first look.

* I amended the new unit test, stripped out all of the new "Span" related tests, and compared running it against the 2.3 highlighter code and against this patch. On this rough test the new implementation looks slightly faster at executing these tests (2.6 seconds vs 3.0). Good stuff.

* The general advice on using the Highlighter is to call it with re-written queries in order to highlight fuzzy queries etc. That being the case, the support for ConstantScoreRangeQuery in WeightedSpanTermExtractor is not likely to work because ConstantScoreRangeQuery rewrites to a ConstantScoreQuery and therefore can't be inspected for terms.

Hope to spend some more time looking at this tomorrow.

> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment:     (was: CachedTokenStream.java)

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


Re: [jira] Commented: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Mar 14, 2007, at 2:17 PM, Mark Miller (JIRA) wrote:
> Just for thought, what about a SpanOr query with two sub Span  
> queries that target different fields? Too obscure to care about?

You later mentioned that this works.  Really?!  Are you sure?

   public SpanOrQuery(SpanQuery[] clauses) {

     // copy clauses array into an ArrayList
     this.clauses = new ArrayList(clauses.length);
     for (int i = 0; i < clauses.length; i++) {
       SpanQuery clause = clauses[i];
       if (i == 0) {                               // check field
         field = clause.getField();
       } else if (!clause.getField().equals(field)) {
         throw new IllegalArgumentException("Clauses must have same field.");
       }
       this.clauses.add(clause);
     }
   }






[jira] Commented: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12480866 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

Just for thought, what about a SpanOr query with two sub Span queries that target different fields? Too obscure to care about?

I will post the new patch later tonight.

- Mark

> SpanScorer and SimpleSpanFragmenter for Contrib Highlighter
> -----------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570515#action_12570515 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

Good catch right off, Mark. Appreciate you looking into this so quickly <g>. You've got a darn quick eye for problems.

Hey Michael G. How are you dealing with the rewrite issue for the ConstantScoreRangeQuery? I assume you just are not using rewrite? Any comments on this?

> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593143#action_12593143 ] 

Mark Harwood commented on LUCENE-794:
-------------------------------------

Committed as part of r652164.

Thanks for all your hard work and putting up with my limited availability/support, Mark. 
I owe you a beer..

> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 2.3.2
>
>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


Re: [jira] Commented: (LUCENE-794) Beginnings of a span based highlighter

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Feb 5, 2007, at 11:44 AM, Mark Harwood (JIRA) wrote:

> ("CachedTokenStream" perhaps?)

In KS, I used "TokenBatch", which is currently implemented as an  
array of Tokens which reallocates itself in big chunks (10, 100, 200,  
400, 800, etc).
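
A rough Java sketch of that kind of chunked growth, for illustration only. The class name TokenBatchSketch, its methods, and the exact schedule (start at 10, jump to 100, then double) are assumptions read off the parenthetical above; this is not the KS implementation, which is not shown in this thread.

```java
import java.util.Arrays;

/** Hypothetical array-backed token batch that grows in large steps. */
class TokenBatchSketch {
    private String[] tokens = new String[10]; // assumed initial capacity
    private int size = 0;

    /** Next capacity in the assumed schedule 10, 100, 200, 400, 800, ... */
    static int nextCapacity(int current) {
        return current < 100 ? 100 : current * 2;
    }

    /** Appends a token, reallocating the backing array only when it is full. */
    void append(String token) {
        if (size == tokens.length) {
            tokens = Arrays.copyOf(tokens, nextCapacity(tokens.length));
        }
        tokens[size++] = token;
    }

    /** Direct indexed access: no per-token method dispatch through a stream. */
    String get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return tokens[index];
    }

    int size() {
        return size;
    }

    int capacity() {
        return tokens.length;
    }
}
```

Growing in big steps keeps the number of reallocations small, and consumers can loop over the array by index instead of calling next() once per token.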

(Implementing TokenStream leads to completely unacceptable  
performance in any language where method call overhead is anything  
other than almost-free.)

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/




Re: [jira] Commented: (LUCENE-794) Beginnings of a span based highlighter

Posted by Wolfgang Hoschek <wo...@mac.com>.

>
>>> I need to read the TokenStream at least twice
>>> I used the horribly hackey but quick-for-me method of adding a  
>>> method to MemoryIndex that accepts a List of Tokens. Any ideas?
>
> I'm not sure about modifying MemoryIndex. It should be easy enough  
> to create a subclass of TokenStream - ("CachedTokenStream"  
> perhaps?) which takes a real TokenStream in its constructor and  
> delegates all "next" calls to it (and also records them in a List)  
> for the first use. This can then be "rewound" and re-used to  
> run through the same set of tokens held in the list from the first  
> run.
>

Yes, as Mark points out, this can be done without API change via the  
existing MemoryIndex.addField(String fieldName, TokenStream stream)

The TokenStream could be constructed along similar lines as done in  
MemoryIndex.keywordTokenStream(Collection) or perhaps along similar  
lines as in  
org.apache.lucene.index.memory.AnalyzerUtil.getTokenCachingAnalyzer(Analyzer)

If needed, an IndexReader can be created from a MemoryIndex via  
MemoryIndex.createSearcher().getIndexReader(), again without API change.

Wolfgang.


[jira] Commented: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470327 ] 

Mark Harwood commented on LUCENE-794:
-------------------------------------

>>Sorry about all that Mark H
No need for any apologies - all help is gratefully received!
I don't mean to criticise your efforts or seem picky - I just wanted to record my findings somewhere useful if we were to consider working a solution up from this "test code" rather than tweaking the current highlighter - I'm still uncertain about the best approach. I also thought it might be useful to point the potential issues out to you if you were already reliant on using this code somewhere.

>>I need to read the TokenStream at least twice
>>I used the horribly hackey but quick-for-me method of adding a method to MemoryIndex that accepts a List of Tokens. Any ideas? 

I'm not sure about modifying MemoryIndex. It should be easy enough to create a subclass of TokenStream ("CachedTokenStream" perhaps?) which takes a real TokenStream in its constructor, delegates all "next" calls to it, and records the returned tokens in a List on the first pass. It can then be "rewound" and re-used to run through the same set of tokens held in the list from the first run.
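
To make the idea concrete, here is a minimal self-contained sketch of such a caching wrapper. It deliberately does not use the Lucene classes: SimpleTokenStream is a hypothetical stand-in interface (tokens are plain Strings, null marks the end), and CachingTokenStream, rewind(), drain() and of() are illustrative names only, not the CachedTokenStream attached to this issue. It also uses generics and a lambda for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a token stream: returns the next token, or null at the end. */
interface SimpleTokenStream {
    String next();
}

/** Records tokens from a wrapped stream on the first pass and replays them after rewind(). */
class CachingTokenStream implements SimpleTokenStream {
    private final SimpleTokenStream source;
    private final List<String> cache = new ArrayList<>();
    private boolean sourceExhausted = false;
    private int replayIndex = 0;

    CachingTokenStream(SimpleTokenStream source) {
        this.source = source;
    }

    @Override
    public String next() {
        // Serve from the cache first; this is the replay path after rewind().
        if (replayIndex < cache.size()) {
            return cache.get(replayIndex++);
        }
        if (sourceExhausted) {
            return null;
        }
        // Cache is used up: pull from the real stream and record the token.
        String token = source.next();
        if (token == null) {
            sourceExhausted = true;
            return null;
        }
        cache.add(token);
        replayIndex++;
        return token;
    }

    /** Restart from the first token; subsequent reads come from the cache. */
    void rewind() {
        replayIndex = 0;
    }

    /** Helper for the sketch: read a stream to its end and collect the tokens. */
    static List<String> drain(SimpleTokenStream stream) {
        List<String> out = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            out.add(t);
        }
        return out;
    }

    /** Helper for the sketch: a one-shot stream over fixed tokens. */
    static SimpleTokenStream of(String... tokens) {
        Iterator<String> it = Arrays.asList(tokens).iterator();
        return () -> it.hasNext() ? it.next() : null;
    }
}
```

With this shape the text is analyzed once: the first consumer drains the wrapper, rewind() is called, and the second consumer sees the same tokens from the list.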


>>if position increment equals 0 skip printing out the token...but I am not totally confident it is perfect yet. 

I think it's possible some of the more Byzantine analyzers may have a position increment >0 but overlap in terms of their byte offsets. I'd need to check the old JUnit tests to be sure on this. Welcome to my hell!

Thanks again for your help.
Mark H

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, SimpleFormatter.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Mike Klaas (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570896#action_12570896 ] 

Mike Klaas commented on LUCENE-794:
-----------------------------------

This may be largely irrelevant, but Solr has a ConstantScorePrefixQuery which has similar issues (but _should_ be highlighted most of the time).

It might find its way into lucene core one day.

> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12480230 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

Yup, we are on the same page. I was just buried in the code at the time, and having stared at your code that ignores the field for each Term, I was not thinking from a high level but was instead stuck on the process of ignoring fields in a similar manner. For whatever reason it never dawned on me that we don't have to worry about a Span that has Terms with different field values. After staring at your suggestion long enough, my brain de-fogged.

I will submit an updated patch tomorrow.

- Mark

> SpanScorer and SimpleSpanFragmenter for Contrib Highlighter
> -----------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470379 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

I switched to accepting an analyzer and a field name. I need the field name anyway for the MemoryIndex.

 I agree that modifying MemoryIndex was horrible and I have removed that dependency (just did it as a 'quickfix'). 

I used the CachedTokenStream anyway to avoid analyzing twice (once for MemoryIndex and again for Highlighter use). Thanks for the idea... shows how bright I am, having missed it <g>.

I removed all of the 1.5 code.

The code is probably fairly usable right now then. I think synonyms work fine unless a case does exist like you suggested.

So I suppose we have 4 options:

1. I extend and polish the code (it needs more test cases; most of mine were written using my query parser) and it is used independently for full document highlighting based on spans. (I would like to add Google-cache-like coloring.)

2. The code is either merged with the existing highlighter or gutted to create a single highlighter that can fragment based on spans or based on the original term based approach.

3. The code is ignored and someone else starts fresh adding span support to the existing highlighter.

4. The code languishes in purgatory and we await the unknown.

- Mark M



> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, SimpleFormatter.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488039 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

I use that to make the Range Query test pass. The old style Range Query 
is highlightable.


> SpanScorer and SimpleSpanFragmenter for Contrib Highlighter
> -----------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment:     (was: HighlighterTest.java)

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Michael Goddard (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558819#action_12558819 ] 

Michael Goddard commented on LUCENE-794:
----------------------------------------

Mark,

I've still got a little work to do on it, but would like to also include support for highlighting of RangeQuery within SpanNearQuery.  I have a new SpanQuery subclass which helps, and will post that to see if it merits inclusion within Lucene.  In conjunction with that, I'd have one last "else if" clause to add to the patch covered by this issue.  Basically, I'm trying to make a case for the work covered in this Jira issue being committed, since it's very useful to me.


> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: SimpleFormatter.java
                QuerySpansExtractor.java
                Highlighter.java

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>         Environment: There are prob a few Java 1.5 requirements (generics) that could easily be removed.
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, QuerySpansExtractor.java, SimpleFormatter.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Andy Liu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526803 ] 

Andy Liu commented on LUCENE-794:
---------------------------------

I gave this patch a whirl, and it looks great.

I do see one problem.  Say a document contains:

x y z a b y z

and the query is:

"x y z"

the highlighter will return (with terms in brackets denoting highlighted terms):

[x] [y] [z] a b [y] [z]

Since the last y and z are not part of the full phrase, they should not be highlighted.
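The expected behaviour in this report can be sketched with plain position matching. This is only an illustration, not code from the patch and not Lucene API: the class and method names (PhraseHighlightSketch, matchPositions, highlight) are made up, and it assumes an exact, zero-slop phrase over an already tokenized document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; not code from the patch and not Lucene API.
class PhraseHighlightSketch {

    // Marks only the tokens that belong to a complete, in-order occurrence of the phrase.
    static boolean[] matchPositions(List<String> tokens, List<String> phrase) {
        boolean[] hit = new boolean[tokens.size()];
        if (phrase.isEmpty()) {
            return hit;
        }
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            if (tokens.subList(start, start + phrase.size()).equals(phrase)) {
                Arrays.fill(hit, start, start + phrase.size(), true);
            }
        }
        return hit;
    }

    // Wraps matched tokens in brackets, using the same notation as the report above.
    static String highlight(List<String> tokens, List<String> phrase) {
        boolean[] hit = matchPositions(tokens, phrase);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(hit[i] ? "[" + tokens.get(i) + "]" : tokens.get(i));
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("x", "y", "z", "a", "b", "y", "z");
        List<String> phrase = Arrays.asList("x", "y", "z");
        System.out.println(highlight(doc, phrase)); // prints: [x] [y] [z] a b y z
    }
}
```

A term-only scorer cannot make this distinction because it sees "y" and "z" without their positions; the decision has to be made per position, which is what a span-based approach provides.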

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment:     (was: QuerySpansExtractor.java)

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473970 ] 

Mark Harwood commented on LUCENE-794:
-------------------------------------

>>I am pretty confident this will be a great solution 

Great stuff, Mark. Sorry I've been out of the loop on this recently and not participating as much as I'd like - just too tied up with other work. 
I look forward to seeing your work!

Cheers,
Mark H

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: HighlighterTest.java
                Highlighter.java
                CachedTokenStream.java

Removed 1.5 dependencies, fixed api 

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, SimpleFormatter.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Resolved: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Harwood resolved LUCENE-794.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 2.3.2

Committed as part of r652164.

> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 2.3.2
>
>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570921#action_12570921 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

Thanks Michael,

Seems to me we can drop it then.

I don't know that the performance would be that bad - you are only extracting the terms from an index with a single document, so there are not likely to be *that* many terms. 

but,

Initially I thought that highlighting something like the Date would be nice as it would visually indicate that piece's involvement in selecting the document...but as Mark points out, it's really not that helpful at all.

Since Michael doesn't even need it anymore, it seems we prob shouldn't even keep it as configurable. Just drop it then?

- Mark

> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593166#action_12593166 ] 

Hoss Man commented on LUCENE-794:
---------------------------------

> You have to ant dist before applying the patch to build the memory jar, then apply the patch and ant again.

this seems like a bug/inconsistency in the new contrib/highlighter/build.xml

it added a "buildHighlighter" target which takes care of the dependency in building contrib/memory ... this isn't the way this is normally handled in contrib build.xml files because it's a non-standard target name that people (and the contrib walking code) don't know about.

xml-query-parser has a better example of doing this same kind of dependency...

<target name="compile-core" depends="build-queries, common.compile-core" />

> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 2.3.2
>
>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment:     (was: DefaultEncoder.java)

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment:     (was: Highlighter.java)

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475002 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

I have a patch coming tonight. It fixes a few odd mistakes and has a little more documentation. I had wanted to subpackage it into spanscorer for now, but it appears I can't make a patch with a new folder so that is out. Should I merge my package.html documentation with the one currently in highlighter? Also, I am not sure how a contrib that depends on another contrib should work build file wise (SpanScorer depends on MemoryIndex). I just made up something that works for now.

This new patch will be off the trunk, so the RangeQuery test now fails, as it does with the original QueryScorer...to my knowledge, you cannot highlight a ConstantScoreRangeQuery.

You also cannot ignore the fields in the query as you can with QueryScorer so that test fails. The only way that I can see doing this is to have the option in your query parser of ignoring all fields and just using one field name during parsing. Send the field-normal Query to search, and then make a field-neutered query for highlighting. That is the approach I will be taking with my query parser. I sure wish there was something better though.
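The "field-neutered" idea above could look roughly like the sketch below. It is only an illustration under assumptions: FieldTerm and neuterFields are hypothetical stand-ins, not Lucene classes, and a real query parser would apply the single field name while parsing rather than rewriting terms afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in types; these are not Lucene classes.
class FieldNeutralQuerySketch {

    // A query term reduced to the two parts that matter here: field name and text.
    static final class FieldTerm {
        final String field;
        final String text;

        FieldTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public String toString() {
            return field + ":" + text;
        }
    }

    // Returns a copy of the query terms with every field replaced by highlightField.
    // The original list is left untouched so it can still be sent to search.
    static List<FieldTerm> neuterFields(List<FieldTerm> terms, String highlightField) {
        List<FieldTerm> out = new ArrayList<>();
        for (FieldTerm t : terms) {
            out.add(new FieldTerm(highlightField, t.text));
        }
        return out;
    }

    public static void main(String[] args) {
        List<FieldTerm> searchQuery = Arrays.asList(
                new FieldTerm("title", "lucene"),
                new FieldTerm("body", "highlighter"));
        // The field-normal query goes to search; the neutered copy goes to the highlighter.
        System.out.println(neuterFields(searchQuery, "contents"));
        // prints: [contents:lucene, contents:highlighter]
    }
}
```

The cost of this approach is that two query objects have to be kept in step, which is the drawback mentioned above.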

I'll post the patch when I get out of work.

- Mark

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

Posted by "Sean O'Connor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487981 ] 

Sean O'Connor commented on LUCENE-794:
--------------------------------------

Thanks Mark. I had the trunk from a few days ago (perhaps a week), so that was just me being lazy :-).

Is there anything I should be aware of regarding parser.setUseOldRangeQuery(true); in doSearching(String queryString)? [about line 890 in SpanHighlighterTest.java]

I've read the javadocs, which explain it a bit, but I don't think I understand enough to infer why you use it in SpanHighlighterTest.java. If I can (relatively) safely ignore that, I will.

Sean


Mark Miller (JIRA) wrote:
>     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487860 ]
>
> Mark Miller commented on LUCENE-794:
> ------------------------------------
>
> Sorry Sean, I forgot to mention that the patch is off of the latest
> Lucene trunk code.
>
> The range query test should fail because they switched the query parser
> to return a constant score query instead of a range query. Cannot
> highlight a constant score query.
>
> - Mark


> SpanScorer and SimpleSpanFragmenter for Contrib Highlighter
> -----------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593177#action_12593177 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

Darn...waited just too long to comment on this. I had noticed two tiny things: there is a System.out if a query is not recognized by the Query to Span converter.  Also, the get weightedTerms call might want to accept a CachingTokenFilter rather than a TokenStream...I don't think this is a biggie though...it just avoids double wrapping in a CachingTokenFilter if that is what is passed in. Someone mentioned that one on the list a few days ago. Neither issue is a big deal, but it would be nice to get the System.out out of there...sorry I missed it in the patch.

Also,

Sorry about the bad build file Hoss :( I swear I copied it off another contrib (I didn't know how to do it frankly), so I can't explain why it's so incorrect. Maybe I have accumulated too much of this beer I keep demanding and don't remember changing things for the worse...
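On the double-wrapping point above, one possible shape for the check is sketched below. Everything here is a hypothetical stand-in (TokenSource, ListSource, CachingSource, ensureCaching); none of it is Lucene API, and the real CachingTokenFilter behaves differently in detail. The only idea shown is to wrap a stream in a caching layer unless it already is one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for TokenStream and CachingTokenFilter; not Lucene API.
class CachingWrapSketch {

    interface TokenSource {
        String next(); // returns null when the stream is exhausted
    }

    // A one-shot stream over a fixed list of tokens.
    static final class ListSource implements TokenSource {
        private final Iterator<String> it;

        ListSource(List<String> tokens) {
            this.it = tokens.iterator();
        }

        public String next() {
            return it.hasNext() ? it.next() : null;
        }
    }

    // Reads the wrapped stream once, then replays the cached tokens after reset().
    static final class CachingSource implements TokenSource {
        private final TokenSource input;
        private final List<String> cache = new ArrayList<>();
        private boolean filled = false;
        private int pos = 0;

        CachingSource(TokenSource input) {
            this.input = input;
        }

        public String next() {
            if (!filled) {
                for (String t = input.next(); t != null; t = input.next()) {
                    cache.add(t);
                }
                filled = true;
            }
            return pos < cache.size() ? cache.get(pos++) : null;
        }

        void reset() {
            pos = 0;
        }
    }

    // Wraps only when needed, so an already caching stream is not wrapped twice.
    static CachingSource ensureCaching(TokenSource source) {
        return (source instanceof CachingSource)
                ? (CachingSource) source
                : new CachingSource(source);
    }

    public static void main(String[] args) {
        CachingSource cached = new CachingSource(new ListSource(Arrays.asList("a", "b")));
        System.out.println(ensureCaching(cached) == cached); // prints: true
    }
}
```

Accepting the caching type directly in the method signature, as suggested above, would make the instanceof check unnecessary at the cost of a narrower API.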

> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 2.3.2
>
>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499651 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

Thanks Michael -- I was not aware and will certainly make the change in the next patch I put up.

> SpanScorer and SimpleSpanFragmenter for Contrib Highlighter
> -----------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611526#action_12611526 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

Hey Tavi,

Try passing null as the field.

- Mark




> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 2.3.2
>
>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, SpanHighlighter-RemovSysOut.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: SpanHighlighter-01-26-2008.patch

This patch gets rid of the separate SpanScorer test class and combines all tests in HighlighterTest. Almost all of the tests are now run twice - once with the standard QueryScorer and once with the new SpanScorer.

Thanks to all for the bug fixes and contributions.

- Mark

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: SpanHighlighter-01-26-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment:     (was: SimpleFormatter.java)

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591031#action_12591031 ] 

Mark Harwood commented on LUCENE-794:
-------------------------------------

Just tried the latest of everything: the patch applies cleanly, the JUnit test passes, and I've run my own additional side-by-side tests on my content to see the effects with and without the new phrase support.

Looks good to me - unless there are any objections I'll go ahead and commit.

> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Michael Goddard (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545462 ] 

Michael Goddard commented on LUCENE-794:
----------------------------------------

Mark,

I did a little bit more with this since I needed support for highlighting queries that contain a ConstantScoreRangeQuery.  Would you be interested in looking at those changes?


> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470098 ] 

Mark Miller commented on LUCENE-794:
------------------------------------

Sorry about all that Mark H. This was literally just some test code that I quickly shoved into an api similar to your existing highlighter. If you decide that it should be considered on its own, I would certainly have quite a bit further to go. Mostly I just put it up for your evaluation on extending the current highlighter with this highlight method.

> 1) Fieldname "contents" shouldn't be hardcoded into the Highlighter - different analyzers can behave differently for different fields (see PerFieldAnalyzerWrapper). Either pass a fieldname parameter or do as the existing highlighter does and take a TokenStream. The latter approach has the advantage of being able to avoid re-analysis and make use of any stored TermVectors (see TokenSources.java)

I don't have a great solution for this right now. I need to read the TokenStream at least twice because the MemoryIndex consumes it to extract the spans. Unfortunately, it seems I can either copy the tokens to a list or pass them to the MemoryIndex -- I cannot do both. The MemoryIndex is also looking for a field name...so while I changed the api to take a TokenStream, I have not resolved also needing the field name. I am hoping you have some good comments. To get around reading the TokenStream twice I used the horribly hacky but quick-for-me method of adding a method to MemoryIndex that accepts a List of Tokens. Any ideas?
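
To make the two-pass problem concrete, here is a rough sketch of the "copy to a list first" workaround in plain Java. It does not use any Lucene classes: TwoPassTokens, indexPositions and join are hypothetical names, with the positional map only standing in for the MemoryIndex step and join standing in for the highlighting pass.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; none of this is Lucene API. */
final class TwoPassTokens {

    /** Drains a one-shot token source into a list so it can be read any number of times. */
    static List<String> drain(Iterator<String> oneShot) {
        List<String> copy = new ArrayList<String>();
        while (oneShot.hasNext()) {
            copy.add(oneShot.next());
        }
        return copy;
    }

    /** First consumer: a toy positional index (term -> positions), standing in for the MemoryIndex step. */
    static Map<String, List<Integer>> indexPositions(List<String> tokens) {
        Map<String, List<Integer>> index = new LinkedHashMap<String, List<Integer>>();
        for (int i = 0; i < tokens.size(); i++) {
            List<Integer> positions = index.get(tokens.get(i));
            if (positions == null) {
                positions = new ArrayList<Integer>();
                index.put(tokens.get(i), positions);
            }
            positions.add(i);
        }
        return index;
    }

    /** Second consumer: rebuilds the text, standing in for the highlighting pass. */
    static String join(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String t : tokens) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t);
        }
        return sb.toString();
    }
}
```

Once the tokens sit in a list, both consumers can read them independently; the cost is holding the whole token list in memory for the document being highlighted.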

> 2) Analyzers which produce overlapping tokens (see Synonym analyzer in existing highlighter Junit test) are problematic in the existing code. I remember the "TokenGroup" class in the existing highlighter was an approach to help cater for these "overlap" scenarios.

I always attack this last <G>. Seems a simple fix: if the position increment equals 0, skip printing out the token. It passes your test, which I have added to my test code, but I am not totally confident it is perfect yet.
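
A minimal sketch of that position-increment check, in plain Java rather than against the Lucene Token class. PosToken and OverlapSkipper are hypothetical names used only for illustration; the assumption is that a stacked token such as a synonym carries an increment of 0.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal token: its text and its position increment relative to the previous token. */
final class PosToken {
    final String text;
    final int positionIncrement;

    PosToken(String text, int positionIncrement) {
        this.text = text;
        this.positionIncrement = positionIncrement;
    }
}

/** Hypothetical helper showing the "skip increment 0" rule. */
final class OverlapSkipper {

    /** Keeps only tokens that advance the position; stacked tokens (increment 0) are dropped. */
    static List<String> printable(List<PosToken> tokens) {
        List<String> out = new ArrayList<String>();
        for (PosToken t : tokens) {
            if (t.positionIncrement == 0) {
                continue; // same position as the previous token, so do not print it again
            }
            out.add(t.text);
        }
        return out;
    }
}
```

One possible gap with this rule: if the stacked token is the one that matched the query, dropping it would also drop its hit unless the hit is credited to the token that shares its position.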

> 3) Without wishing to resurrect the whole 1.4 vs 1.5 debate I believe Lucene still targets Java 1.4.

Just me being lazy. I swear I have seen Contrib stuff that says 1.5. I have gone through and stripped out all of the 1.5 features except for StringBuilder for the moment.

>To rectify these points it's not clear to me if it would be quicker to use your code or adapt the existing highlighter code to use spans.
>Thoughts? 

Depends entirely on what you think. I am sure I can fix all of the issues you mention (with a little advice <G>), but I am pretty new to this type of thing and perhaps you just want to start from scratch in order to achieve span highlighting with the existing highlighter. It may just be that the way I am doing this is not very compatible with the way you currently fragment and score.

I have added an updated Highlighter.java and HighlighterTest.java. The MemoryIndex problem remains...so it either has to be fixed or the modified MemoryIndex must be used.

- Mark m

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, SimpleFormatter.java
>
>
> This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

Posted by "Tavi Nathanson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607415#action_12607415 ] 

Tavi Nathanson commented on LUCENE-794:
---------------------------------------

Hi,

I'm new to Lucene and the highlighter, so I apologize if my question is obvious. In any case, I'm trying to allow phrase highlighting in my instance of Lucene, so I applied this patch to 2.3.2. I'm confused, though, about the structure of SpanScorer vs. QueryScorer. Why does SpanScorer require the stream of source text tokens (i.e. SpanScorer(Query query, String field, CachingTokenFilter cachingTokenFilter)) while QueryScorer does not (i.e. QueryScorer(Query query, String fieldName))?

Intuitively, if QueryScorer is scoring based on the number of unique query terms found in the document, wouldn't the stream of source text tokens be necessary for this calculation? I'm wondering a) why is this not necessary in QueryScorer? and b) what makes it necessary in SpanScorer? I'm having some trouble understanding the code, and was wondering if I could get any guidance :).

Thanks!

Tavi

> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 2.3.2
>
>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, SpanHighlighter-RemovSysOut.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

Posted by "Andy Liu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526847 ] 

Andy Liu commented on LUCENE-794:
---------------------------------

Hmm, I tried it again and now it's working correctly.  Maybe I had interpreted the output incorrectly.  Sorry for the false alarm.

> Extend contrib Highlighter to properly support phrase queries and span queries
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.


[jira] Updated: (LUCENE-794) Beginnings of a span based highlighter

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Description: 
This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.

See http://issues.apache.org/jira/browse/LUCENE-403 for some background.

There is a dependency on MemoryIndex.

  was:
This is some test code to start the work of adding a span based highlighting approach to the existing highlighter in contrib. See http://issues.apache.org/jira/browse/LUCENE-403 for some background.

There is a dependency on MemoryIndex.
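
To illustrate what "scores a 0 for Terms that did not cause the Query hit" means in the updated description, here is a toy comparison in plain Java. It is not the patch's implementation (which works from spans extracted via MemoryIndex); PhraseHitScoring and both methods are hypothetical, and an exact phrase match stands in for the span check.

```java
import java.util.List;

/** Hypothetical illustration of term-only versus position-aware scoring; not Lucene API. */
final class PhraseHitScoring {

    /** Term-only scoring: every occurrence of a query term scores 1, wherever it appears. */
    static float[] termOnly(List<String> tokens, List<String> phrase) {
        float[] scores = new float[tokens.size()];
        for (int i = 0; i < tokens.size(); i++) {
            if (phrase.contains(tokens.get(i))) {
                scores[i] = 1f;
            }
        }
        return scores;
    }

    /** Position-aware scoring: a token scores 1 only if it lies inside an exact occurrence of the phrase. */
    static float[] phraseAware(List<String> tokens, List<String> phrase) {
        float[] scores = new float[tokens.size()];
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            if (tokens.subList(start, start + phrase.size()).equals(phrase)) {
                for (int i = start; i < start + phrase.size(); i++) {
                    scores[i] = 1f;
                }
            }
        }
        return scores;
    }
}
```

For the tokens "quick brown fox quick dog" and the phrase "quick brown", term-only scoring marks both occurrences of "quick", while the position-aware version leaves the second "quick" at 0 because it is not part of a phrase match.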


> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachedTokenStream.java, CachedTokenStream.java, CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter2.patch, spanhighlighter3.patch, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.
