You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Robert Muir (Created) (JIRA)" <ji...@apache.org> on 2012/02/19 21:36:39 UTC

[jira] [Created] (LUCENE-3801) Generify FST shortestPaths() to take a comparator

Generify FST shortestPaths() to take a comparator
-------------------------------------------------

                 Key: LUCENE-3801
                 URL: https://issues.apache.org/jira/browse/LUCENE-3801
             Project: Lucene - Java
          Issue Type: Improvement
          Components: core/FSTs
    Affects Versions: 3.6, 4.0
            Reporter: Robert Muir
         Attachments: LUCENE-3801.patch

Not sure we should do this, it costs 5-10% performance for WFSTSuggester.
But maybe we can optimize something here, or maybe its just no big deal to us.

Because in general, this could be pretty powerful, e.g. if you needed to store 
some custom stuff in the suggester, you could use pairoutputs, or whatever.

And the possibility we might need shortestPaths for other cool things... at the
least I just wanted to have the patch up here.

I haven't tested this on pairoutputs... but i've tested it with e.g. FloatOutputs
and other things and it works fine.

I tried to minimize the generics violations, there is only 1 (cannot create generic array).


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3801) Generify FST shortestPaths() to take a comparator

Posted by "Robert Muir (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3801:
--------------------------------

    Attachment: LUCENE-3801.patch

updated patch with mike's idea.

this wins some perf back:

||impl||prefixes 2-4||prefixes 6-9||prefixes 100-200||
|Jaspell|129,000|330,000|436,000|
|TST|31,000|258,000|820,000|
|FST|330,000|263,000|269,000|
|WFST|209,000|606,000|781,000|
|WFST-Generic|190,000|550,000|765,000|

I think this is ok... I'll commit soon.
                
> Generify FST shortestPaths() to take a comparator
> -------------------------------------------------
>
>                 Key: LUCENE-3801
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3801
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>    Affects Versions: 3.6, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-3801.patch, LUCENE-3801.patch, LUCENE-3801.patch, LUCENE-3801.patch
>
>
> Not sure we should do this, it costs 5-10% performance for WFSTSuggester.
> But maybe we can optimize something here, or maybe its just no big deal to us.
> Because in general, this could be pretty powerful, e.g. if you needed to store 
> some custom stuff in the suggester, you could use pairoutputs, or whatever.
> And the possibility we might need shortestPaths for other cool things... at the
> least I just wanted to have the patch up here.
> I haven't tested this on pairoutputs... but i've tested it with e.g. FloatOutputs
> and other things and it works fine.
> I tried to minimize the generics violations, there is only 1 (cannot create generic array).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Issue Comment Edited] (LUCENE-3801) Generify FST shortestPaths() to take a comparator

Posted by "Robert Muir (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220874#comment-13220874 ] 

Robert Muir edited comment on LUCENE-3801 at 3/2/12 12:29 PM:
--------------------------------------------------------------

I think this might not be hard to fix, consider my comparator:
{code}
// compares just the weight side of the pair
static final Comparator<Pair<Long,Long>> minPairWeightComparator = new Comparator<Pair<Long,Long>> () {
  public int compare(Pair<Long,Long> left, Pair<Long,Long> right) {
    return left.output1.compareTo(right.output1);
  }  
};
{code}

Instead of looking for a NO_OUTPUT path, I think we should instead express "output == NO_OUTPUT" as "comparator.compare(previousOutput, outputs.add(previousOutput, output)) == 0",
Ill rerun benchmarks but i don't think this will hurt.
                
      was (Author: rcmuir):
    I think this might not be hard to fix, consider my comparator:
{code}
// compares just the weight side of the pair
static final Comparator<Pair<Long,Long>> minPairWeightComparator = new Comparator<Pair<Long,Long>> () {
  public int compare(Pair<Long,Long> left, Pair<Long,Long> right) {
    return left.output1.compareTo(right.output1);
  }  
};
{code}

Instead of looking for a NO_OUTPUT path, I think we should instead express "output == NO_OUTPUT" as "comparator.compare(previousOutput, output) == 0",
Ill rerun benchmarks but i don't think this will hurt.
                  
> Generify FST shortestPaths() to take a comparator
> -------------------------------------------------
>
>                 Key: LUCENE-3801
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3801
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>    Affects Versions: 3.6, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-3801.patch, LUCENE-3801.patch
>
>
> Not sure we should do this, it costs 5-10% performance for WFSTSuggester.
> But maybe we can optimize something here, or maybe its just no big deal to us.
> Because in general, this could be pretty powerful, e.g. if you needed to store 
> some custom stuff in the suggester, you could use pairoutputs, or whatever.
> And the possibility we might need shortestPaths for other cool things... at the
> least I just wanted to have the patch up here.
> I haven't tested this on pairoutputs... but i've tested it with e.g. FloatOutputs
> and other things and it works fine.
> I tried to minimize the generics violations, there is only 1 (cannot create generic array).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3801) Generify FST shortestPaths() to take a comparator

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220905#comment-13220905 ] 

Robert Muir commented on LUCENE-3801:
-------------------------------------

Hmm actually, i think we can speed this up a little bit if we re-arrange stuff in the addIfCompetitive?
Currently this patch causes us to 'add twice' (once for the A + B = A stuff, and then again in addIfCompetitive!)

                
> Generify FST shortestPaths() to take a comparator
> -------------------------------------------------
>
>                 Key: LUCENE-3801
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3801
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>    Affects Versions: 3.6, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-3801.patch, LUCENE-3801.patch, LUCENE-3801.patch
>
>
> Not sure we should do this, it costs 5-10% performance for WFSTSuggester.
> But maybe we can optimize something here, or maybe its just no big deal to us.
> Because in general, this could be pretty powerful, e.g. if you needed to store 
> some custom stuff in the suggester, you could use pairoutputs, or whatever.
> And the possibility we might need shortestPaths for other cool things... at the
> least I just wanted to have the patch up here.
> I haven't tested this on pairoutputs... but i've tested it with e.g. FloatOutputs
> and other things and it works fine.
> I tried to minimize the generics violations, there is only 1 (cannot create generic array).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3801) Generify FST shortestPaths() to take a comparator

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219852#comment-13219852 ] 

Robert Muir commented on LUCENE-3801:
-------------------------------------

looking at the benchmarks again: I think there is an "off-by-a-factor-of-one-thousand" on the benchmark.

I could not for the life of me figure out why these suggesters would have e.g. 220QPS, but I think instead
that the benchmark is wrong. So 200,000-210,000QPS compared to 220,000QPS really cannot matter!

If the benchmark can really satisfy 50000 queries in 220 ms, thats no 226QPS!
 
I'll come back with a PairOutputs test in a bit to confirm the functionality but I'm not going to worry
about 5%/10% performance here, thats crazy.
                
> Generify FST shortestPaths() to take a comparator
> -------------------------------------------------
>
>                 Key: LUCENE-3801
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3801
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>    Affects Versions: 3.6, 4.0
>            Reporter: Robert Muir
>         Attachments: LUCENE-3801.patch
>
>
> Not sure we should do this, it costs 5-10% performance for WFSTSuggester.
> But maybe we can optimize something here, or maybe its just no big deal to us.
> Because in general, this could be pretty powerful, e.g. if you needed to store 
> some custom stuff in the suggester, you could use pairoutputs, or whatever.
> And the possibility we might need shortestPaths for other cool things... at the
> least I just wanted to have the patch up here.
> I haven't tested this on pairoutputs... but i've tested it with e.g. FloatOutputs
> and other things and it works fine.
> I tried to minimize the generics violations, there is only 1 (cannot create generic array).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3801) Generify FST shortestPaths() to take a comparator

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220929#comment-13220929 ] 

Robert Muir commented on LUCENE-3801:
-------------------------------------

{quote}
I think instead of per-arc add (to find the NO_OUTPUT arc) you can just do comparator.compare(arc.output, NO_OUTPUT) == 0.
{quote}

I have no idea what I was thinking... this is why we have code reviews i guess :)

Ill upload a new patch with new benchmarks.
                
> Generify FST shortestPaths() to take a comparator
> -------------------------------------------------
>
>                 Key: LUCENE-3801
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3801
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>    Affects Versions: 3.6, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-3801.patch, LUCENE-3801.patch, LUCENE-3801.patch
>
>
> Not sure we should do this, it costs 5-10% performance for WFSTSuggester.
> But maybe we can optimize something here, or maybe its just no big deal to us.
> Because in general, this could be pretty powerful, e.g. if you needed to store 
> some custom stuff in the suggester, you could use pairoutputs, or whatever.
> And the possibility we might need shortestPaths for other cool things... at the
> least I just wanted to have the patch up here.
> I haven't tested this on pairoutputs... but i've tested it with e.g. FloatOutputs
> and other things and it works fine.
> I tried to minimize the generics violations, there is only 1 (cannot create generic array).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3801) Generify FST shortestPaths() to take a comparator

Posted by "Christian Moen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220919#comment-13220919 ] 

Christian Moen commented on LUCENE-3801:
----------------------------------------

Exactly.  If we match on normalized forms that also deal with the combined scripts users would use when they type in Japanese queries, i.e. ほっk to get 北海道 (Hokkaido) suggested, I think we have a good starting point to build a Japanese query suggester/autocompleter.
                
> Generify FST shortestPaths() to take a comparator
> -------------------------------------------------
>
>                 Key: LUCENE-3801
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3801
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>    Affects Versions: 3.6, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-3801.patch, LUCENE-3801.patch, LUCENE-3801.patch
>
>
> Not sure we should do this, it costs 5-10% performance for WFSTSuggester.
> But maybe we can optimize something here, or maybe its just no big deal to us.
> Because in general, this could be pretty powerful, e.g. if you needed to store 
> some custom stuff in the suggester, you could use pairoutputs, or whatever.
> And the possibility we might need shortestPaths for other cool things... at the
> least I just wanted to have the patch up here.
> I haven't tested this on pairoutputs... but i've tested it with e.g. FloatOutputs
> and other things and it works fine.
> I tried to minimize the generics violations, there is only 1 (cannot create generic array).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3801) Generify FST shortestPaths() to take a comparator

Posted by "Robert Muir (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3801:
--------------------------------

    Attachment: LUCENE-3801.patch

updated patch, instead of looking for 'output == 0' using the 'previousOutput + output == previousOutput' via the comparator.

random test passes now
                
> Generify FST shortestPaths() to take a comparator
> -------------------------------------------------
>
>                 Key: LUCENE-3801
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3801
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>    Affects Versions: 3.6, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-3801.patch, LUCENE-3801.patch, LUCENE-3801.patch
>
>
> Not sure we should do this, it costs 5-10% performance for WFSTSuggester.
> But maybe we can optimize something here, or maybe its just no big deal to us.
> Because in general, this could be pretty powerful, e.g. if you needed to store 
> some custom stuff in the suggester, you could use pairoutputs, or whatever.
> And the possibility we might need shortestPaths for other cool things... at the
> least I just wanted to have the patch up here.
> I haven't tested this on pairoutputs... but i've tested it with e.g. FloatOutputs
> and other things and it works fine.
> I tried to minimize the generics violations, there is only 1 (cannot create generic array).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3801) Generify FST shortestPaths() to take a comparator

Posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220924#comment-13220924 ] 

Michael McCandless commented on LUCENE-3801:
--------------------------------------------

I think instead of per-arc add (to find the NO_OUTPUT arc) you can just do comparator.compare(arc.output, NO_OUTPUT) == 0.

Otherwise patch looks great!  I think the perf loss if acceptable...

Very exciting!
                
> Generify FST shortestPaths() to take a comparator
> -------------------------------------------------
>
>                 Key: LUCENE-3801
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3801
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>    Affects Versions: 3.6, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-3801.patch, LUCENE-3801.patch, LUCENE-3801.patch
>
>
> Not sure we should do this, it costs 5-10% performance for WFSTSuggester.
> But maybe we can optimize something here, or maybe its just no big deal to us.
> Because in general, this could be pretty powerful, e.g. if you needed to store 
> some custom stuff in the suggester, you could use pairoutputs, or whatever.
> And the possibility we might need shortestPaths for other cool things... at the
> least I just wanted to have the patch up here.
> I haven't tested this on pairoutputs... but i've tested it with e.g. FloatOutputs
> and other things and it works fine.
> I tried to minimize the generics violations, there is only 1 (cannot create generic array).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3801) Generify FST shortestPaths() to take a comparator

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220902#comment-13220902 ] 

Robert Muir commented on LUCENE-3801:
-------------------------------------

I think this patch is ready to commit, but the tricky FST math there (compare + add) does add some additional cost.

Still, I think 180,000QPS versus 210,000QPS or whatever, who cares. Being able to separately have weights and outputs and
do shortest path operations on just the weight side (with any Outputs representation) is really powerful and I think we can
use it to improve our suggesters.

Benchmarks are all in QPS with a top-N of 7 suggestions (50,000 inputs)

||impl||prefixes 2-4||prefixes 6-9||prefixes 100-200||
|Jaspell|129,000|330,000|436,000|
|TST|31,000|258,000|820,000|
|FST|330,000|263,000|269,000|
|WFST|209,000|606,000|781,000|
|WFST-Generic|179,000|521,000|708,000|

I'll wait a bit in case someone wants to review or knows of ways to speed up the patch :)

                
> Generify FST shortestPaths() to take a comparator
> -------------------------------------------------
>
>                 Key: LUCENE-3801
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3801
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>    Affects Versions: 3.6, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-3801.patch, LUCENE-3801.patch, LUCENE-3801.patch
>
>
> Not sure we should do this, it costs 5-10% performance for WFSTSuggester.
> But maybe we can optimize something here, or maybe its just no big deal to us.
> Because in general, this could be pretty powerful, e.g. if you needed to store 
> some custom stuff in the suggester, you could use pairoutputs, or whatever.
> And the possibility we might need shortestPaths for other cool things... at the
> least I just wanted to have the patch up here.
> I haven't tested this on pairoutputs... but i've tested it with e.g. FloatOutputs
> and other things and it works fine.
> I tried to minimize the generics violations, there is only 1 (cannot create generic array).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Assigned] (LUCENE-3801) Generify FST shortestPaths() to take a comparator

Posted by "Robert Muir (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir reassigned LUCENE-3801:
-----------------------------------

    Assignee: Robert Muir
    
> Generify FST shortestPaths() to take a comparator
> -------------------------------------------------
>
>                 Key: LUCENE-3801
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3801
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>    Affects Versions: 3.6, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-3801.patch
>
>
> Not sure we should do this, it costs 5-10% performance for WFSTSuggester.
> But maybe we can optimize something here, or maybe its just no big deal to us.
> Because in general, this could be pretty powerful, e.g. if you needed to store 
> some custom stuff in the suggester, you could use pairoutputs, or whatever.
> And the possibility we might need shortestPaths for other cool things... at the
> least I just wanted to have the patch up here.
> I haven't tested this on pairoutputs... but i've tested it with e.g. FloatOutputs
> and other things and it works fine.
> I tried to minimize the generics violations, there is only 1 (cannot create generic array).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3801) Generify FST shortestPaths() to take a comparator

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220874#comment-13220874 ] 

Robert Muir commented on LUCENE-3801:
-------------------------------------

I think this might not be hard to fix, consider my comparator:
{code}
// compares just the weight side of the pair
static final Comparator<Pair<Long,Long>> minPairWeightComparator = new Comparator<Pair<Long,Long>> () {
  public int compare(Pair<Long,Long> left, Pair<Long,Long> right) {
    return left.output1.compareTo(right.output1);
  }  
};
{code}

Instead of looking for a NO_OUTPUT path, I think we should instead express "output == NO_OUTPUT" as "comparator.compare(previousOutput, output) == 0",
Ill rerun benchmarks but i don't think this will hurt.
                
> Generify FST shortestPaths() to take a comparator
> -------------------------------------------------
>
>                 Key: LUCENE-3801
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3801
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>    Affects Versions: 3.6, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-3801.patch, LUCENE-3801.patch
>
>
> Not sure we should do this, it costs 5-10% performance for WFSTSuggester.
> But maybe we can optimize something here, or maybe its just no big deal to us.
> Because in general, this could be pretty powerful, e.g. if you needed to store 
> some custom stuff in the suggester, you could use pairoutputs, or whatever.
> And the possibility we might need shortestPaths for other cool things... at the
> least I just wanted to have the patch up here.
> I haven't tested this on pairoutputs... but i've tested it with e.g. FloatOutputs
> and other things and it works fine.
> I tried to minimize the generics violations, there is only 1 (cannot create generic array).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3801) Generify FST shortestPaths() to take a comparator

Posted by "Robert Muir (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3801:
--------------------------------

    Attachment: LUCENE-3801.patch

here's my patch, again for what its worth.
                
> Generify FST shortestPaths() to take a comparator
> -------------------------------------------------
>
>                 Key: LUCENE-3801
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3801
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>    Affects Versions: 3.6, 4.0
>            Reporter: Robert Muir
>         Attachments: LUCENE-3801.patch
>
>
> Not sure we should do this, it costs 5-10% performance for WFSTSuggester.
> But maybe we can optimize something here, or maybe its just no big deal to us.
> Because in general, this could be pretty powerful, e.g. if you needed to store 
> some custom stuff in the suggester, you could use pairoutputs, or whatever.
> And the possibility we might need shortestPaths for other cool things... at the
> least I just wanted to have the patch up here.
> I haven't tested this on pairoutputs... but i've tested it with e.g. FloatOutputs
> and other things and it works fine.
> I tried to minimize the generics violations, there is only 1 (cannot create generic array).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3801) Generify FST shortestPaths() to take a comparator

Posted by "Christian Moen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220909#comment-13220909 ] 

Christian Moen commented on LUCENE-3801:
----------------------------------------

If it's possible to speed things up, that's great.  Regardless, I think the flexibility vs. speed trade-off you are proposing here is a very good idea.  Powerful stuff, indeed.  +1.
                
> Generify FST shortestPaths() to take a comparator
> -------------------------------------------------
>
>                 Key: LUCENE-3801
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3801
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>    Affects Versions: 3.6, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-3801.patch, LUCENE-3801.patch, LUCENE-3801.patch
>
>
> Not sure we should do this, it costs 5-10% performance for WFSTSuggester.
> But maybe we can optimize something here, or maybe its just no big deal to us.
> Because in general, this could be pretty powerful, e.g. if you needed to store 
> some custom stuff in the suggester, you could use pairoutputs, or whatever.
> And the possibility we might need shortestPaths for other cool things... at the
> least I just wanted to have the patch up here.
> I haven't tested this on pairoutputs... but i've tested it with e.g. FloatOutputs
> and other things and it works fine.
> I tried to minimize the generics violations, there is only 1 (cannot create generic array).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3801) Generify FST shortestPaths() to take a comparator

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220915#comment-13220915 ] 

Robert Muir commented on LUCENE-3801:
-------------------------------------

yeah as an example use case we could use Pair<weight,originalText> as an 'output' but 
store the kuromoji-generated readings as the input side for japanese text, we would
just analyze the japanese input to a reading at query-time too and I think it would work...

                
> Generify FST shortestPaths() to take a comparator
> -------------------------------------------------
>
>                 Key: LUCENE-3801
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3801
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>    Affects Versions: 3.6, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-3801.patch, LUCENE-3801.patch, LUCENE-3801.patch
>
>
> Not sure we should do this, it costs 5-10% performance for WFSTSuggester.
> But maybe we can optimize something here, or maybe its just no big deal to us.
> Because in general, this could be pretty powerful, e.g. if you needed to store 
> some custom stuff in the suggester, you could use pairoutputs, or whatever.
> And the possibility we might need shortestPaths for other cool things... at the
> least I just wanted to have the patch up here.
> I haven't tested this on pairoutputs... but i've tested it with e.g. FloatOutputs
> and other things and it works fine.
> I tried to minimize the generics violations, there is only 1 (cannot create generic array).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3801) Generify FST shortestPaths() to take a comparator

Posted by "Robert Muir (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3801:
--------------------------------

    Attachment: LUCENE-3801.patch

updated patch with tests (which exposes a real problem!).

I created "paired" versions (Pair<Long,Long> where its weight,output) of the two existing shortestPath tests: the simple one, and the random one. The simple one passes (somehow), but the random one triggers an assertion in shortestPaths

Basically the issue is that the original impl has an optimization: when you pick some minimum path (say 5), you know there is sequence of all zeros leading to some final state, otherwise its not actually the minimum :)

The code takes advantage of this and (fortunately) asserts that it finds this path of all NO_OUTPUTs. 

The problem is if you have a PairOutputs(weight,output), its only NO_OUTPUT if *both* weight and output are also NO_OUTPUT, but this only holds true for the weight... the output has its own separately pushed minimums that don't necessarily correspond...


                
> Generify FST shortestPaths() to take a comparator
> -------------------------------------------------
>
>                 Key: LUCENE-3801
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3801
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>    Affects Versions: 3.6, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-3801.patch, LUCENE-3801.patch
>
>
> Not sure we should do this, it costs 5-10% performance for WFSTSuggester.
> But maybe we can optimize something here, or maybe its just no big deal to us.
> Because in general, this could be pretty powerful, e.g. if you needed to store 
> some custom stuff in the suggester, you could use pairoutputs, or whatever.
> And the possibility we might need shortestPaths for other cool things... at the
> least I just wanted to have the patch up here.
> I haven't tested this on pairoutputs... but i've tested it with e.g. FloatOutputs
> and other things and it works fine.
> I tried to minimize the generics violations, there is only 1 (cannot create generic array).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (LUCENE-3801) Generify FST shortestPaths() to take a comparator

Posted by "Robert Muir (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-3801.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 4.0
                   3.6
    
> Generify FST shortestPaths() to take a comparator
> -------------------------------------------------
>
>                 Key: LUCENE-3801
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3801
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>    Affects Versions: 3.6, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3801.patch, LUCENE-3801.patch, LUCENE-3801.patch, LUCENE-3801.patch
>
>
> Not sure we should do this, it costs 5-10% performance for WFSTSuggester.
> But maybe we can optimize something here, or maybe its just no big deal to us.
> Because in general, this could be pretty powerful, e.g. if you needed to store 
> some custom stuff in the suggester, you could use pairoutputs, or whatever.
> And the possibility we might need shortestPaths for other cool things... at the
> least I just wanted to have the patch up here.
> I haven't tested this on pairoutputs... but i've tested it with e.g. FloatOutputs
> and other things and it works fine.
> I tried to minimize the generics violations, there is only 1 (cannot create generic array).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org