Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2009/09/03 16:44:57 UTC

[jira] Created: (LUCENE-1889) FastVectorHighlighter: support for additional queries

FastVectorHighlighter: support for additional queries
-----------------------------------------------------

                 Key: LUCENE-1889
                 URL: https://issues.apache.org/jira/browse/LUCENE-1889
             Project: Lucene - Java
          Issue Type: Wish
          Components: contrib/*
            Reporter: Robert Muir
            Priority: Minor


I am using FastVectorHighlighter for some strange languages and it is working well!

One thing I noticed immediately is that many query types are not highlighted (MultiTermQuery, MultiPhraseQuery, etc.).
Here is one thing Michael M posted in the original ticket:

{quote}
I think a nice [eventual] model would be if we could simply re-run the
scorer on the single document (using InstantiatedIndex maybe, or
simply some sort of wrapper on the term vectors which are already a
mini-inverted-index for a single doc), but extend the scorer API to
tell us the exact term occurrences that participated in a match (which
I don't think is exposed today).
{quote}
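
Editor's illustration of the quoted idea, as a minimal sketch only: it treats a single document as a term-to-positions map (roughly what a term vector with positions provides) and reports where a phrase actually occurs. It uses plain Java collections, not Lucene classes; SingleDocPositions, index() and phraseStarts() are hypothetical names and are not part of any Lucene API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a per-document "mini inverted index".
final class SingleDocPositions {

    // Maps each term to the token positions at which it occurs in one document.
    static Map<String, List<Integer>> index(List<String> tokens) {
        Map<String, List<Integer>> positions = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            positions.computeIfAbsent(tokens.get(i), k -> new ArrayList<>()).add(i);
        }
        return positions;
    }

    // Returns the start position of every exact occurrence of the phrase.
    static List<Integer> phraseStarts(Map<String, List<Integer>> positions, List<String> phrase) {
        List<Integer> starts = new ArrayList<>();
        if (phrase.isEmpty()) {
            return starts;
        }
        List<Integer> firstTerm = positions.getOrDefault(phrase.get(0), Collections.emptyList());
        for (int start : firstTerm) {
            boolean match = true;
            // Every later phrase term must sit at the expected consecutive position.
            for (int offset = 1; offset < phrase.size() && match; offset++) {
                List<Integer> termPositions =
                        positions.getOrDefault(phrase.get(offset), Collections.emptyList());
                match = termPositions.contains(start + offset);
            }
            if (match) {
                starts.add(start);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        List<String> doc = List.of("the", "quick", "fox", "saw", "the", "quick", "dog");
        System.out.println(phraseStarts(index(doc), List.of("the", "quick")));
    }
}
```

A highlighter built on this kind of structure would only mark the positions returned here, rather than every occurrence of each term in the query.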

Due to strange requirements I am using something similar to this (but specialized to our case).
I am doing strange things like forcing MultiTermQueries to rewrite into boolean queries so they will be highlighted,
and flattening MultiPhraseQueries into boolean OR'ed PhraseQueries.
I do not think these things would be 'fast', but I had a few ideas that might help:

* looking at contrib/highlighter, you can support FilteredQuery in flatten() by calling getQuery(), right?
* maybe as a last resort, try Query.extractTerms()?

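Editor's illustration of the "flattening" step described above, as a sketch under stated assumptions: a multi-phrase query is modelled as a list of alternative terms per position, and flattening expands it into every concrete phrase, which would then be OR'ed together. Plain Java collections stand in for the Lucene query classes; MultiPhraseFlattener and flatten() here are hypothetical names, not the highlighter's own flatten().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: expands per-position alternatives into concrete phrases.
final class MultiPhraseFlattener {

    // Returns the cartesian product of the alternatives, one phrase per combination.
    static List<List<String>> flatten(List<List<String>> alternativesPerPosition) {
        List<List<String>> phrases = new ArrayList<>();
        if (alternativesPerPosition.isEmpty()) {
            return phrases;
        }
        phrases.add(new ArrayList<>()); // start from a single empty phrase
        for (List<String> alternatives : alternativesPerPosition) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : phrases) {
                for (String term : alternatives) {
                    List<String> phrase = new ArrayList<>(prefix);
                    phrase.add(term);
                    next.add(phrase);
                }
            }
            phrases = next;
        }
        return phrases;
    }

    public static void main(String[] args) {
        // Position 0 may be "quick" or "fast"; position 1 must be "fox".
        System.out.println(flatten(List.of(List.of("quick", "fast"), List.of("fox"))));
    }
}
```

The number of phrases is the product of the alternatives at each position, which is one reason this approach may not stay 'fast' for queries with many alternatives.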

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

[jira] Commented: (LUCENE-1889) FastVectorHighlighter: support for additional queries

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751752#action_12751752 ] 

Jason Rutherglen commented on LUCENE-1889:
------------------------------------------

Robert, have you implemented the scorer extension that returns the exact term occurrences?


[jira] Commented: (LUCENE-1889) FastVectorHighlighter: support for additional queries

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751795#action_12751795 ] 

Michael McCandless commented on LUCENE-1889:
--------------------------------------------

I think we "just" need to merge Span*Query into their "normal" counterparts, making sure there's no performance penalty when you don't use the spans.  Then we get the exact occurrence of every match "for free" :)


[jira] Commented: (LUCENE-1889) FastVectorHighlighter: support for additional queries

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751753#action_12751753 ] 

Robert Muir commented on LUCENE-1889:
-------------------------------------

Jason, no, but the high-level idea is similar in concept: re-run the query on a single-doc "mini-index" that works a bit differently (specialized for highlighting).

If I had done it in a nice way I would have contributed something :)



[jira] Commented: (LUCENE-1889) FastVectorHighlighter: support for additional queries

Posted by "Digy (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843261#action_12843261 ] 

Digy commented on LUCENE-1889:
------------------------------

bq. One thing i noticed immediately is that many query types are not highlighted (multitermquery, multiphrasequery, etc)

I am using queryParser.setMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE)
before query.rewrite, and it works well.

DIGY
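
Editor's illustration of why a boolean rewrite helps the highlighter, as a conceptual sketch only: a prefix-style multi-term query is expanded against the terms in the index into the concrete terms it matches, and a highlighter can then look for those terms directly. This does not call Lucene; PrefixExpansion and expandPrefix() are hypothetical names, and the sorted term set is an assumed stand-in for the index's term dictionary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper: expands a prefix into the concrete indexed terms it matches.
final class PrefixExpansion {

    static List<String> expandPrefix(String prefix, SortedSet<String> indexedTerms) {
        List<String> expanded = new ArrayList<>();
        // In sorted order, all terms sharing the prefix are contiguous,
        // beginning at the first term that is >= the prefix itself.
        for (String term : indexedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            expanded.add(term);
        }
        return expanded;
    }

    public static void main(String[] args) {
        SortedSet<String> terms =
                new TreeSet<>(List.of("high", "highlight", "highlighter", "index", "vector"));
        System.out.println(expandPrefix("highl", terms));
    }
}
```

Each expanded term would become one optional clause of the resulting boolean query, so a prefix that matches many terms produces a correspondingly large query.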
