You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Simon Willnauer (Created) (JIRA)" <ji...@apache.org> on 2012/02/20 15:42:31 UTC

[jira] [Created] (LUCENE-3807) Cleanup suggester API

Cleanup suggester API
---------------------

                 Key: LUCENE-3807
                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
             Project: Lucene - Java
          Issue Type: Improvement
          Components: modules/other
    Affects Versions: 3.6, 4.0
            Reporter: Simon Willnauer
             Fix For: 4.0


Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3807) Cleanup suggester API

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3807:
------------------------------------

    Attachment: LUCENE-3807.patch

here is a new patch removing the file based store / load methods and adding a fileName method to solrs factories to maintain compatibility. I will commit this soon.
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807_3x.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Simon Willnauer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221012#comment-13221012 ] 

Simon Willnauer commented on LUCENE-3807:
-----------------------------------------

committed to trunk.. I think we are ready to backport. I will do that in the next days
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Dawid Weiss (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211890#comment-13211890 ] 

Dawid Weiss commented on LUCENE-3807:
-------------------------------------

+1.
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212551#comment-13212551 ] 

Robert Muir commented on LUCENE-3807:
-------------------------------------

I agree with removing IOException from getComparator.

I agree also with moving weight to a long... though currently the e.g. solr integrations take floats as input,
so this really needs to be listed as a backwards break since it will directly affect users.

I agree with removing the sorted marker interface: its not useful since you don't know the order.

However, I don't think we should add the charsref methods... I think Bytes/Ints/CharsRef should have parallel 
apis and someone can just call unicodeutil: in general these are reference classes not stringbuffers and 
we shouldn't encourage abuse via sugar apis. I already have an issue open for fixing, cleaning up,
and making those APIs consistent.

I don't think we should add a generics parameter V to Lookup, especially if LookupResult itself is still
wired to float. I do think suggesters should be able to return additional data but this needs more thought:
its necessary to actually get the additional data to them.
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Issue Comment Edited] (LUCENE-3807) Cleanup suggester API

Posted by "Dawid Weiss (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213477#comment-13213477 ] 

Dawid Weiss edited comment on LUCENE-3807 at 2/22/12 9:16 AM:
--------------------------------------------------------------

Sorry for being late, work. I like the patch. Comments:

{noformat}
+    public Comparator<BytesRef> getComparator() {
+      return null;
+    }
{noformat}

This shows up in a number of places. I have mixed feelings about certain types having a comparator and others not having it, but it's minor.

BufferingTermFreqIteratorWrapper is a nuisance (buffers in memory). It would be nicer to have a sort on disk if something doesn't support sorted iteration order.

I also wonder float -> long = 4 -> 8 bytes... would this count as an incompatible API change (because what used to work for a given amount of RAM won't work anymore -- BufferingTermFreqIteratorWrapper again)?

{noformat}
+      if (l1 < l2) {
+        aStop = l1;
+      } else {
+        aStop = l2;
+      }
{noformat}

if I remember correctly Math.min/max are intrinsics, so you can afford to be explicit ;)

Why not a specific covariant here?

{noformat}
-  public Float get(String key) {
+  public Object get(CharSequence key) {
{noformat}

This doesn't seem necessary (lookup accepts a CharSequence?).
{noformat}
@@ -199,7 +199,7 @@ public class LookupBenchmarkTest extends LuceneTestCase {
         public Integer call() throws Exception {
           int v = 0;
           for (String term : input) {
-            v += lookup.lookup(term, onlyMorePopular, num).size();
+            v += lookup.lookup(new CharsRef(term), onlyMorePopular, num).size();
{noformat}

I like the rest, including the CharSequenceish evilness of bytesToCharSequence :)

                
      was (Author: dweiss):
    Sorry for being late, work. I like the patch. Comments:

+    public Comparator<BytesRef> getComparator() {
+      return null;
+    }

This shows up in a number of places. I have mixed feelings about certain types having a comparator and others not having it, but it's minor.

BufferingTermFreqIteratorWrapper is a nuisance (buffers in memory). It would be nicer to have a sort on disk if something doesn't support sorted iteration order.

I also wonder float -> long = 4 -> 8 bytes... would this count as an incompatible API change (because what used to work for a given amount of RAM won't work anymore -- BufferingTermFreqIteratorWrapper again)?

+      if (l1 < l2) {
+        aStop = l1;
+      } else {
+        aStop = l2;
+      }

if I remember correctly Math.min/max are intrinsics, so you can afford to be explicit ;)

Why not a specific covariant here?

-  public Float get(String key) {
+  public Object get(CharSequence key) {

@@ -199,7 +199,7 @@ public class LookupBenchmarkTest extends LuceneTestCase {
         public Integer call() throws Exception {
           int v = 0;
           for (String term : input) {
-            v += lookup.lookup(term, onlyMorePopular, num).size();
+            v += lookup.lookup(new CharsRef(term), onlyMorePopular, num).size();

This doesn't seem necessary (lookup accepts a CharSequence?).

I like the rest, including the CharSequenceish evilness of bytesToCharSequence :)


                  
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3807) Cleanup suggester API

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3807:
------------------------------------

    Attachment: LUCENE-3807.patch

updated patch with davids suggestions
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3807) Cleanup suggester API

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3807:
------------------------------------

    Attachment: LUCENE-3807.patch

here is a patch that removes the file based store / load methods. Once thing I don't like about this is that we use one an the same file name in solr to store stuff. We could possibly fix this and make it bw compatible by pulling a file name from the factory for instance but even that is kind of flaky. Maybe somebody has a better idea.

Yet I think one of the biggest issues here is  that we don't really have a header on the actual implementation. Ie you could simply load a FST from a FST suggester into WFST but the results would be bogus. I think we should add real headers to the files to fail early and give good error messages. 
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212090#comment-13212090 ] 

Robert Muir commented on LUCENE-3807:
-------------------------------------

I think we should go with a simpler solution for the Buffering/SortedIterator... this has caused a few recent test fails.

(latest: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12476/)

If we are going to replace it anyway with the Sort that spills to disk, cant we just use a simple treemap?

                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Dawid Weiss (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221239#comment-13221239 ] 

Dawid Weiss commented on LUCENE-3807:
-------------------------------------

If you say you already have an implementation that could use more than one file then it'd be silly to enforce a single file format. I agree too many options is not good, in particular it's painful for potential implementors.
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Dawid Weiss (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221224#comment-13221224 ] 

Dawid Weiss commented on LUCENE-3807:
-------------------------------------

I added methods that used streams. They stored the underlying FST. I personally didn't like the "must point to a directory" approach. 

The File(dir) methods got inherited from the original Lookup interface I think. Those saving/reading from streams were not in the interface so they were specific to a concrete implementation. Don't know how it looks now.
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3807) Cleanup suggester API

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3807:
------------------------------------

    Attachment: LUCENE-3807_3x.patch

here is a patch against 3.x 
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807_3x.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Simon Willnauer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212719#comment-13212719 ] 

Simon Willnauer commented on LUCENE-3807:
-----------------------------------------

bq. I think before proceeding with refactoring, we really need to beef up tests.
I will help too while I think the patch is ready... we can go with iterations?
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3807) Cleanup suggester API

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3807:
------------------------------------

    Attachment: LUCENE-3807.patch

next patch...
 * cut over Lookup API to use CharSequence instead of string. this allows us to use CharsRef and saves object creation.
* removed Sorted marker interface. BytesRefIterator has a getComparator() that can be null to indicate that there is no specified sort order. That way we can compare the Comparator<BytesRef> instance to see if the sort order matches what the impl expects.
* added a IOException to Dictionary#getWordsIterator() removing tons of try/catch throw RTE
* renamed freq to weight
* weight (ie freq) now returns a long instead of a float discouraging floating point number as weights

next steps would be cleaning up the in-memory sorting stuff and use the sorter which goes to disk to do the actual sorting internally if needed

comments welcome
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3807) Cleanup suggester API

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3807:
------------------------------------

    Attachment: LUCENE-3807.patch

here is a first patch extracting a BytesRefIterator as a common interface directly shared with TermsEnum as a first step. The semantics are taken from TermsEnum directly so nothing changes for TermsEnum and its consumers. The simple next method makes the most of the TermFreqIters trivial. 

comments welcome...
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221207#comment-13221207 ] 

Robert Muir commented on LUCENE-3807:
-------------------------------------

Somewhere along the line we got load(File), load(InputStream), store(File), store(InputStream)... can we improve this?

I think its confusing when writing a new suggester to have all these hooks.

load(File) takes a directory name, so a suggester is free to use multiple files (I am working on one that might use 2 files).
so then what is load(InputStream) ?!
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3807) Cleanup suggester API

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3807:
------------------------------------

    Attachment: LUCENE-3807.patch

new patch cleaning up the api a little more according to roberts comments. I had some leftover CharsRef uses in there and I removed the generics on Lookup. This patch also removes all the additional methods on CharsRef and adds some randomization for CharSequences..
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Dawid Weiss (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220926#comment-13220926 ] 

Dawid Weiss commented on LUCENE-3807:
-------------------------------------

I looked at your patch briefly, Simon. Random notes:

- I'd change currentElement into lastElement or something like that. Otherwise it looks odd to me in the code, as in:
{code}
     throw new IndexOutOfBoundsException("index " + pos
         + " must be less than the size: " + currentElement);
{code}

- typo in "orderdEntries".

- I'm very likely paranoid but I'd stick to just one class for storing these:
{code}
protected Number weightAsNumber(long weight) {
{code}
Since these are objects the memory gain will most likely be obscured by object alignments and object overhead itself and the downside is that you're using an interface with all call sites that will very likely become megamorphic (so no chances to inline anything). I don't know if it's worth the effort.

I didn't have time to think much about changes to the functional logic; I don't think there were any (and if there were, they should be covered by tests?).

                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3807) Cleanup suggester API

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3807:
------------------------------------

    Attachment: LUCENE-3807.patch

merged with trunk.

I added a changes.txt to contrib and fixed the issues david raised. 

I think this is ready. once this is in I will backport this to 3x and close the issue
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224693#comment-13224693 ] 

Robert Muir commented on LUCENE-3807:
-------------------------------------

Thanks Simon! hard work but I expect there is more.

Hopefully this module will grow into something much bigger...
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807_3x.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3807) Cleanup suggester API

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3807:
------------------------------------

    Attachment: LUCENE-3807.patch

following up on this, here is another patch. 
I cleaned up the APIs even further removing get / add from Lookup which is not really used. TST and Jaspell still support add and get is still implemented on all others but its not part of the interface. The main thing that buggs me with add is that its inconsistent with build since it allows arbitrary output while build uses a Number. 
Once we need this we can add it back to the interface. 

TST and Jaspell are now optimized to not necessarily create new Float objects all the time but share instances if the incoming weight allows it. I also create Number instances based on the range which might safe some more memory while minor IMO.

LookupResult now also returns a long instead of a float to be consistent with the TermFreqIterator.

SortedTermFreqIteratorWrapper is now based on the On-Disk sort. I extended the sorter impl a little to work with Comparator<BytesRef> and fixed the TODO by using BytesRefList with ByteBlockPool internally. WFSTCompletionLookup now also uses the SortedTFIteratorWrapper to remove duplicated code.
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Simon Willnauer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221332#comment-13221332 ] 

Simon Willnauer commented on LUCENE-3807:
-----------------------------------------

the main reason why I moved this to the interface was that I needed to do some extra processing in the stream ie. calc a checksum and I wanted to write stuff directly to HDFS etc. I think we should go for streams or we pass in a lucene directory while I think unless really needed we should stick with a stream.
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211926#comment-13211926 ] 

Robert Muir commented on LUCENE-3807:
-------------------------------------

Again i just wanted to mention, i think its fine to commit as-is.

Consider my comments thinking-out-loud.
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Simon Willnauer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213519#comment-13213519 ] 

Simon Willnauer commented on LUCENE-3807:
-----------------------------------------

bq. This shows up in a number of places. I have mixed feelings about certain types having a comparator and others not having it, but it's minor.
well this an indicator if we know something about the order or not. if you get null there is not order specified...


bq. BufferingTermFreqIteratorWrapper is a nuisance (buffers in memory). It would be nicer to have a sort on disk if something doesn't support sorted iteration order.
this is the ultimate goal... see my comment above ({noformat} next steps would be cleaning up the in-memory sorting stuff and use the sorter which goes to disk to do the actual sorting internally if needed {noformat})


bq. I also wonder float -> long = 4 -> 8 bytes... would this count as an incompatible API change (because what used to work for a given amount of RAM won't work anymore – BufferingTermFreqIteratorWrapper again)?

see above

bq. if I remember correctly Math.min/max are intrinsics, so you can afford to be explicit 

I will upload a new patch - we use this in BytesRefComp too, I think its safe to fix

bq. This doesn't seem necessary (lookup accepts a CharSequence?).
right - leftover from an old iteration

bq. Why not a specific covariant here?
that would be Float get(String) or Float get(CharSequence) or... ;)


                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212636#comment-13212636 ] 

Robert Muir commented on LUCENE-3807:
-------------------------------------

I found another bug (added test and committed fix in r1291826).
This would cause an exception in solr if someone called build but the field was empty.

I think before proceeding with refactoring, we really need to beef up tests.
I'll help with this.
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221000#comment-13221000 ] 

Robert Muir commented on LUCENE-3807:
-------------------------------------

i did a quick review: looks good to me
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3807) Cleanup suggester API

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3807:
------------------------------------

    Attachment: LUCENE-3807.patch

more cleanups... I made BytesRefSorter using BytesRefList (we should maybe rename to bytesrefarray) and in turn use BytesRefIterator instead of the stupid java iterator. So all those classes should be consistent now.
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Simon Willnauer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212130#comment-13212130 ] 

Simon Willnauer commented on LUCENE-3807:
-----------------------------------------

just drop it I will replace that stuff with on disk sort tomorrow anyway
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Dawid Weiss (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221249#comment-13221249 ] 

Dawid Weiss commented on LUCENE-3807:
-------------------------------------

+1 for streams. They're a lot more flexible than Files.
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Simon Willnauer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212036#comment-13212036 ] 

Simon Willnauer commented on LUCENE-3807:
-----------------------------------------

I committed the patch to trunk and moved the BytesRefList to the suggest package (pkg private) for now. I keep this issue open as I have more iterations / patches. I will backport before I close this issue ie. batch port to 3.x all commits related to this issue.  commit revision is 1291418
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221233#comment-13221233 ] 

Robert Muir commented on LUCENE-3807:
-------------------------------------

Right I'm not concerned about concrete impls. Concrete can do whatever it wants :)

In the actual lookup interface, load/store from stream was added underneath this issue:

http://svn.apache.org/viewvc/lucene/dev/trunk/modules/suggest/src/java/org/apache/lucene/search/suggest/Lookup.java?r1=1236012&r2=1291418&diff_format=h

I'm not opposed to this change: but if we are going to require that implementations only use a single file,
then we should remove the File ones and just let caller deal with the inputstream... does that make sense?

Currently its impossible to tell which one will be called!
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Simon Willnauer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212430#comment-13212430 ] 

Simon Willnauer commented on LUCENE-3807:
-----------------------------------------

bq. I think we should go with a simpler solution for the Buffering/SortedIterator... this has caused a few recent test fails.
those failures have been fixed..
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211921#comment-13211921 ] 

Robert Muir commented on LUCENE-3807:
-------------------------------------

I like the patch, but only one thing (its fine to commit it as-is though, we can solve this on another issue, i just couldnt help but notice)

I don't think we should have the BufferedTermFreqIteratorWrapper/etc and the SortedTermFreqIterator marker interface needs to be fixed.

Here are the problems:
* Marker interface SortedTermFreqIterator doesn't tell you if its UTF-8 or UTF-16 order. Its implemented by two classes: SortedTermFreqIteratorWrapper,
which sorts in UTF-16 order, and HighFrequencyDictionary, which returns terms from the index (so UTF-8 order). The problem is that classes
that rely upon sorted order like JaSpell/TST are likely broken already. Fortunately FST/WFST always do their own sort.
* Buffering in RAM is not ideal. Instead I think all of these classes should be using our Sort anyway which can spill to disk.

For now could we put the BytesRefList in the suggest package since its only used there? we might not need it after we clean up
this sorting stuff in some future issue.

Also I don't think we should factor out the BytesRefIterator. I seriously think its a bad idea to tie our core index Terms enumeration API
with the spellcheck API at this time, it would make it hard to change in the future if we need, especially with spellcheck being... needing work :)

                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221243#comment-13221243 ] 

Robert Muir commented on LUCENE-3807:
-------------------------------------

Well I switched to a single file already, so ignore that implementation :)

But my concern is mostly about API confusion for implementing suggesters: if we require a single file then lets just have InputStream.
Otherwise, lets remove InputStream from Lookup, (of course a specific concrete Impl that only uses one file is still free to provide that)
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Dawid Weiss (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213477#comment-13213477 ] 

Dawid Weiss commented on LUCENE-3807:
-------------------------------------

Sorry for being late, work. I like the patch. Comments:

+    public Comparator<BytesRef> getComparator() {
+      return null;
+    }

This shows up in a number of places. I have mixed feelings about certain types having a comparator and others not having it, but it's minor.

BufferingTermFreqIteratorWrapper is a nuisance (buffers in memory). It would be nicer to have a sort on disk if something doesn't support sorted iteration order.

I also wonder float -> long = 4 -> 8 bytes... would this count as an incompatible API change (because what used to work for a given amount of RAM won't work anymore -- BufferingTermFreqIteratorWrapper again)?

+      if (l1 < l2) {
+        aStop = l1;
+      } else {
+        aStop = l2;
+      }

if I remember correctly Math.min/max are intrinsics, so you can afford to be explicit ;)

Why not a specific covariant here?

-  public Float get(String key) {
+  public Object get(CharSequence key) {

@@ -199,7 +199,7 @@ public class LookupBenchmarkTest extends LuceneTestCase {
         public Integer call() throws Exception {
           int v = 0;
           for (String term : input) {
-            v += lookup.lookup(term, onlyMorePopular, num).size();
+            v += lookup.lookup(new CharsRef(term), onlyMorePopular, num).size();

This doesn't seem necessary (lookup accepts a CharSequence?).

I like the rest, including the CharSequenceish evilness of bytesToCharSequence :)


                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212740#comment-13212740 ] 

Robert Muir commented on LUCENE-3807:
-------------------------------------

I like this latest patch!

I didnt test it, but i think its fine to move forward. 

I just mainly want to address the issues of whole suggesters that are not tested 
at all in this module (LUCENE-3813) before we did too much more.
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Simon Willnauer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224242#comment-13224242 ] 

Simon Willnauer commented on LUCENE-3807:
-----------------------------------------

backported committed patches to 3.x in rev 1297946
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807_3x.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3807) Cleanup suggester API

Posted by "Simon Willnauer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212720#comment-13212720 ] 

Simon Willnauer commented on LUCENE-3807:
-----------------------------------------

bq. so this really needs to be listed as a backwards break since it will directly affect users.
I will note this in the CHANGES once we are done
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (LUCENE-3807) Cleanup suggester API

Posted by "Simon Willnauer (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-3807.
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 3.6
         Assignee: Simon Willnauer

I committed the last patch to trunk and backported to 3x.

thanks guys
                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807_3x.patch
>
>
> Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org