Posted to dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2010/05/02 23:51:55 UTC

[jira] Created: (SOLR-1900) move Solr to flex APIs

move Solr to flex APIs
----------------------

                 Key: SOLR-1900
                 URL: https://issues.apache.org/jira/browse/SOLR-1900
             Project: Solr
          Issue Type: Improvement
    Affects Versions: 4.0
            Reporter: Yonik Seeley
             Fix For: 4.0


Solr should use flex APIs

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (SOLR-1900) move Solr to flex APIs

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863503#action_12863503 ] 

Michael McCandless commented on SOLR-1900:
------------------------------------------

Whoa that's great!

> move Solr to flex APIs
> ----------------------
>
>                 Key: SOLR-1900
>                 URL: https://issues.apache.org/jira/browse/SOLR-1900
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: SOLR-1900-facet_enum.patch, SOLR-1900-facet_enum.patch
>
>
> Solr should use flex APIs


[jira] Commented: (SOLR-1900) move Solr to flex APIs

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868748#action_12868748 ] 

Michael McCandless commented on SOLR-1900:
------------------------------------------

bq. and adds support to FieldType for converting to/from BytesRef.

Ooh!  This is one of my nocommits in LUCENE-2380 -- that will help.

> move Solr to flex APIs
> ----------------------
>
>                 Key: SOLR-1900
>                 URL: https://issues.apache.org/jira/browse/SOLR-1900
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: SOLR-1900-facet_enum.patch, SOLR-1900-facet_enum.patch, SOLR-1900_termsComponent.txt
>
>
> Solr should use flex APIs


[jira] Commented: (SOLR-1900) move Solr to flex APIs

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867233#action_12867233 ] 

Yonik Seeley commented on SOLR-1900:
------------------------------------

Just committed a fix (r943994) so that getDocSet skips deleted docs.
This didn't seem to cause any issues because the generated sets are always intersected with other sets (like a base doc set) that do exclude deleted docs.
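A minimal sketch of the skip, under stated assumptions: the postings for a term are modeled as an int array of doc ids and the deleted docs as a java.util.BitSet. The class and method names are hypothetical, not Solr's actual getDocSet or the flex DocsEnum API. The second half of main shows why the old behavior went unnoticed: intersecting with a base set that already excludes deletions removes the deleted doc anyway.

```java
import java.util.BitSet;

// Hypothetical model: not Solr's getDocSet or Lucene's flex API.
public class DocSetSketch {

    // Build the set of docs matching a term, skipping docs marked as deleted.
    // deletedDocs may be null when the index has no deletions.
    static BitSet getDocSet(int[] postings, BitSet deletedDocs, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        for (int doc : postings) {
            if (deletedDocs != null && deletedDocs.get(doc)) {
                continue; // deleted but not yet merged away: must not be included
            }
            result.set(doc);
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(3);
        int[] postings = {1, 3, 5};
        System.out.println(getDocSet(postings, deleted, 8)); // {1, 5}

        // Unfiltered set intersected with a base set that already excludes deletions.
        BitSet unfiltered = getDocSet(postings, null, 8);
        BitSet base = new BitSet();
        base.set(0, 8);
        base.andNot(deleted);
        unfiltered.and(base);
        System.out.println(unfiltered); // {1, 5}
    }
}
```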

> move Solr to flex APIs
> ----------------------
>
>                 Key: SOLR-1900
>                 URL: https://issues.apache.org/jira/browse/SOLR-1900
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: SOLR-1900-facet_enum.patch, SOLR-1900-facet_enum.patch
>
>
> Solr should use flex APIs


[jira] Updated: (SOLR-1900) move Solr to flex APIs

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-1900:
-------------------------------

    Attachment: SOLR-1900-facet_enum.patch

Here's an updated patch that fixes the issue (actually works around it) by overriding and delegating in SolrIndexSearcher itself.

I'm not going to commit this quite yet... the comparator used in BytesRef is not the same as the index order for code points outside the BMP, so people using those characters would see strange paging issues when sorting facet results by index order. It looks like Lucene should switch its index order to pure code point order (which is exactly the same as comparing UTF-8 encoded bytes treated as unsigned).
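The ordering difference can be reproduced in plain Java, as a sketch: String.compareTo compares UTF-16 code units and stands in here for a UTF-16 based order, while the two helpers (hypothetical names) compare by code point and by unsigned UTF-8 bytes. U+FFFD (inside the BMP) and U+10000 (the first code point outside it) are enough to show that UTF-16 order disagrees with the other two, which agree with each other.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helpers for illustration; not BytesRef's actual comparator.
public class CodePointOrderSketch {

    // Compare UTF-8 encodings byte by byte, treating each byte as unsigned (0..255).
    static int compareUtf8Unsigned(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xff) - (y[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length;
    }

    // Compare by Unicode code point, one code point at a time.
    static int compareCodePoints(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i), cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        String bmp = "\uFFFD";        // U+FFFD, inside the BMP
        String supp = "\uD800\uDC00"; // U+10000, encoded in UTF-16 as a surrogate pair
        System.out.println(Integer.signum(bmp.compareTo(supp)));            // 1: UTF-16 order puts U+FFFD last
        System.out.println(Integer.signum(compareCodePoints(bmp, supp)));   // -1
        System.out.println(Integer.signum(compareUtf8Unsigned(bmp, supp))); // -1
    }
}
```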

> move Solr to flex APIs
> ----------------------
>
>                 Key: SOLR-1900
>                 URL: https://issues.apache.org/jira/browse/SOLR-1900
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: SOLR-1900-facet_enum.patch, SOLR-1900-facet_enum.patch
>
>
> Solr should use flex APIs


[jira] Updated: (SOLR-1900) move Solr to flex APIs

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-1900:
-------------------------------

    Attachment: SOLR-1900_termsComponent.txt

This patch (SOLR-1900_termsComponent.txt) converts the terms component to use the flex API, and adds support to FieldType for converting to/from BytesRef.

When rewriting this code, I noticed an existing bug when sorting by count: the tiebreak is by external string label and hence isn't in index order. I'll fix this before committing.
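A sketch of a count sort whose tiebreak follows index order, under stated assumptions: TermCount and encodeInt are hypothetical, and the numeric encoding (big-endian int with the sign bit flipped) is just one example of an indexed form whose byte order differs from its external label's string order. With equal counts, "9" and "10" come out in index order, whereas a label tiebreak would put "10" first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; TermCount and encodeInt are illustrative, not Solr classes.
public class CountSortSketch {

    static final class TermCount {
        final String label;   // external (human-readable) form
        final byte[] indexed; // internal form as stored in the index
        final int count;

        TermCount(String label, byte[] indexed, int count) {
            this.label = label;
            this.indexed = indexed;
            this.count = count;
        }
    }

    // Big-endian int with the sign bit flipped, so unsigned byte order == numeric order.
    static byte[] encodeInt(int v) {
        int u = v ^ 0x80000000;
        return new byte[] {(byte) (u >>> 24), (byte) (u >>> 16), (byte) (u >>> 8), (byte) u};
    }

    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Highest count first; ties broken by index order of the indexed bytes,
    // not by the external string label.
    static final Comparator<TermCount> BY_COUNT_THEN_INDEX =
        Comparator.comparingInt((TermCount t) -> t.count).reversed()
            .thenComparing((TermCount x, TermCount y) -> compareUnsigned(x.indexed, y.indexed));

    public static void main(String[] args) {
        List<TermCount> terms = new ArrayList<>(Arrays.asList(
            new TermCount("10", encodeInt(10), 5),
            new TermCount("9", encodeInt(9), 5),
            new TermCount("7", encodeInt(7), 8)));
        terms.sort(BY_COUNT_THEN_INDEX);
        List<String> labels = new ArrayList<>();
        for (TermCount t : terms) {
            labels.add(t.label);
        }
        System.out.println(labels); // [7, 9, 10]; a label tiebreak would give [7, 10, 9]
    }
}
```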

> move Solr to flex APIs
> ----------------------
>
>                 Key: SOLR-1900
>                 URL: https://issues.apache.org/jira/browse/SOLR-1900
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: SOLR-1900-facet_enum.patch, SOLR-1900-facet_enum.patch, SOLR-1900_termsComponent.txt
>
>
> Solr should use flex APIs


[jira] Commented: (SOLR-1900) move Solr to flex APIs

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863206#action_12863206 ] 

Yonik Seeley commented on SOLR-1900:
------------------------------------

Yep, it was FilterIndexReader... I randomly overrode a bunch of the methods and delegated to the inner reader, and everything started working.

> move Solr to flex APIs
> ----------------------
>
>                 Key: SOLR-1900
>                 URL: https://issues.apache.org/jira/browse/SOLR-1900
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: SOLR-1900-facet_enum.patch
>
>
> Solr should use flex APIs


[jira] Commented: (SOLR-1900) move Solr to flex APIs

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863854#action_12863854 ] 

Yonik Seeley commented on SOLR-1900:
------------------------------------

The DocSet generation in SolrIndexReader was also upgraded to flex (I accidentally committed it yesterday along with the fix that caused some tests to hang). Anyway, a facet.method=enum test without the minDf and with a too-small filterCache (so every term goes through the filterCache but misses, meaning the changed code in SolrIndexSearcher generates a new filter for every term) had 53% higher throughput after the change.

> move Solr to flex APIs
> ----------------------
>
>                 Key: SOLR-1900
>                 URL: https://issues.apache.org/jira/browse/SOLR-1900
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: SOLR-1900-facet_enum.patch, SOLR-1900-facet_enum.patch
>
>
> Solr should use flex APIs


[jira] Resolved: (SOLR-1900) move Solr to flex APIs

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley resolved SOLR-1900.
--------------------------------

    Fix Version/s: 4.0
                       (was: Next)
       Resolution: Fixed

Closing. LUCENE-2378 did the rest.

> move Solr to flex APIs
> ----------------------
>
>                 Key: SOLR-1900
>                 URL: https://issues.apache.org/jira/browse/SOLR-1900
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: SOLR-1900-facet_enum.patch, SOLR-1900-facet_enum.patch, SOLR-1900_termsComponent.txt
>
>
> Solr should use flex APIs


[jira] Commented: (SOLR-1900) move Solr to flex APIs

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896894#action_12896894 ] 

Yonik Seeley commented on SOLR-1900:
------------------------------------

Now that flex term enumerators can seek, it looks like all of the related logic in FileFloatSource is redundant (keeping track of whether keys are sorted, trying next() a few times before seeking, etc.).

> move Solr to flex APIs
> ----------------------
>
>                 Key: SOLR-1900
>                 URL: https://issues.apache.org/jira/browse/SOLR-1900
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: SOLR-1900-facet_enum.patch, SOLR-1900-facet_enum.patch, SOLR-1900_termsComponent.txt
>
>
> Solr should use flex APIs


[jira] Commented: (SOLR-1900) move Solr to flex APIs

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909160#action_12909160 ] 

Michael McCandless commented on SOLR-1900:
------------------------------------------

I think it makes sense to move append to BytesRef, though I wonder if it should over-allocate (ArrayUtil.oversize) when it grows? I realize the current calls to append don't need that (you just append bigTerm, once), but if someone uses this like a StringBuffer... though this isn't really the intention of BytesRef, so maybe it's OK not to oversize.
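To make the over-allocation trade-off concrete, here is a sketch of an append on a BytesRef-like slice. Everything is hypothetical: AppendSketch is not Lucene's BytesRef, and the growth rule (about 1/8 extra, at least 3 bytes) is an assumption loosely in the spirit of ArrayUtil.oversize rather than its exact formula. Because the capacity grows geometrically, StringBuffer-style repeated appends need only a modest number of reallocations.

```java
import java.util.Arrays;

// Hypothetical BytesRef-like slice (bytes[offset, offset + length)); not Lucene's BytesRef.
public class AppendSketch {
    byte[] bytes = new byte[0];
    int offset = 0;
    int length = 0;
    int grows = 0; // number of reallocations, for illustration only

    static AppendSketch of(byte[] b) {
        AppendSketch s = new AppendSketch();
        s.bytes = b.clone();
        s.length = b.length;
        return s;
    }

    // Append other's bytes. When the array is too small, over-allocate (~1/8 extra,
    // at least 3 bytes) so repeated appends stay amortized O(1).
    // The growth rule is an assumption, not Lucene's exact ArrayUtil.oversize formula.
    void append(AppendSketch other) {
        int otherLength = other.length;
        int needed = length + otherLength;
        if (bytes.length - offset < needed) {
            int newCapacity = needed + Math.max(3, needed >> 3);
            byte[] grown = new byte[newCapacity];
            System.arraycopy(bytes, offset, grown, 0, length);
            bytes = grown;
            offset = 0;
            grows++;
        }
        System.arraycopy(other.bytes, other.offset, bytes, offset + length, otherLength);
        length = needed;
    }

    byte[] toByteArray() {
        return Arrays.copyOfRange(bytes, offset, offset + length);
    }

    public static void main(String[] args) {
        AppendSketch s = AppendSketch.of(new byte[] {1, 2});
        s.append(AppendSketch.of(new byte[] {3}));
        System.out.println(Arrays.toString(s.toByteArray())); // [1, 2, 3]

        AppendSketch buf = new AppendSketch();
        AppendSketch one = AppendSketch.of(new byte[] {7});
        for (int i = 0; i < 1000; i++) {
            buf.append(one);
        }
        System.out.println(buf.length + " bytes, " + buf.grows + " reallocations");
    }
}
```

Without the extra slack, each one-byte append would reallocate and copy the whole array, which is the StringBuffer-style case the comment is worried about.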

> move Solr to flex APIs
> ----------------------
>
>                 Key: SOLR-1900
>                 URL: https://issues.apache.org/jira/browse/SOLR-1900
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: SOLR-1900-facet_enum.patch, SOLR-1900-facet_enum.patch, SOLR-1900_bigTerm.txt, SOLR-1900_FileFloatSource.patch, SOLR-1900_termsComponent.txt
>
>
> Solr should use flex APIs


[jira] Updated: (SOLR-1900) move Solr to flex APIs

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-1900:
-------------------------------

    Attachment: SOLR-1900-facet_enum.patch

Attaching draft flex use for the facet.enum method.

This currently fails with:
{code}
SEVERE: Exception during facet counts:java.lang.IllegalStateException: external IndexReader requires skipDocs == MultiFields.getDeletedDocs()
	at org.apache.lucene.index.LegacyFieldsEnum$LegacyDocsEnum.reset(LegacyFieldsEnum.java:206)
	at org.apache.lucene.index.LegacyFieldsEnum$LegacyTermsEnum.docs(LegacyFieldsEnum.java:164)
	at org.apache.lucene.index.MultiTermsEnum.docs(MultiTermsEnum.java:276)
	at org.apache.solr.request.SimpleFacets.getFacetTermEnumCounts(SimpleFacets.java:566)
{code}

Not sure if that's an issue with all compound readers (top-level readers as opposed to segment readers) or if it's an issue with SolrIndexReader.

> move Solr to flex APIs
> ----------------------
>
>                 Key: SOLR-1900
>                 URL: https://issues.apache.org/jira/browse/SOLR-1900
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: SOLR-1900-facet_enum.patch
>
>
> Solr should use flex APIs


[jira] Updated: (SOLR-1900) move Solr to flex APIs

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-1900:
-------------------------------

    Attachment: SOLR-1900_bigTerm.txt

Attaching patch that moves bigTerm into ByteUtils, adds BytesRef.append(BytesRef), and uses those in the faceting code when a prefix is specified (instead of a String with \uffff chars).
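A sketch of how a byte-level "big term" can bound a prefix range, under stated assumptions: BIG_TERM here is a hypothetical run of 0xFF bytes (its length is arbitrary), and the helpers are illustrative, not Solr's ByteUtils or Lucene's BytesRef. Since 0xFF never occurs in valid UTF-8, prefix + BIG_TERM sorts after every valid term starting with prefix under unsigned byte order. The last line of main also shows a property of the byte form: compared as UTF-8 bytes, a String bound ending in \uffff does not cover a term whose next character lies outside the BMP.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a byte-based prefix upper bound.
public class PrefixRangeSketch {

    // 0xFF never occurs in valid UTF-8, so prefix + BIG_TERM sorts (unsigned) after
    // every valid UTF-8 term that starts with prefix.
    static final byte[] BIG_TERM = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = new byte[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // True if term lies in [prefix, prefix + BIG_TERM) under unsigned byte order.
    static boolean inPrefixRange(String term, String prefix) {
        byte[] lower = utf8(prefix);
        byte[] upper = concat(lower, BIG_TERM);
        byte[] t = utf8(term);
        return compareUnsigned(t, lower) >= 0 && compareUnsigned(t, upper) < 0;
    }

    public static void main(String[] args) {
        String emojiTerm = "ap\uD83D\uDE00"; // "ap" followed by U+1F600, outside the BMP
        System.out.println(inPrefixRange("apple", "ap"));   // true
        System.out.println(inPrefixRange("aq", "ap"));      // false
        System.out.println(inPrefixRange(emojiTerm, "ap")); // true
        // As UTF-8 bytes, the term sorts after a String bound of "ap\uFFFF":
        System.out.println(compareUnsigned(utf8(emojiTerm), utf8("ap\uFFFF")) > 0); // true
    }
}
```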

If people think that the append() is more Solr-specific (i.e. not likely to be used in Lucene) I can move it to Solr's ByteUtils.

> move Solr to flex APIs
> ----------------------
>
>                 Key: SOLR-1900
>                 URL: https://issues.apache.org/jira/browse/SOLR-1900
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: SOLR-1900-facet_enum.patch, SOLR-1900-facet_enum.patch, SOLR-1900_bigTerm.txt, SOLR-1900_FileFloatSource.patch, SOLR-1900_termsComponent.txt
>
>
> Solr should use flex APIs


[jira] Commented: (SOLR-1900) move Solr to flex APIs

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863499#action_12863499 ] 

Yonik Seeley commented on SOLR-1900:
------------------------------------

I did a facet.method=enum test with a large number of unique terms and a large minDf (so the filterCache isn't used and terms are just enumerated).
This patch increased throughput by 50%!


> move Solr to flex APIs
> ----------------------
>
>                 Key: SOLR-1900
>                 URL: https://issues.apache.org/jira/browse/SOLR-1900
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: SOLR-1900-facet_enum.patch, SOLR-1900-facet_enum.patch
>
>
> Solr should use flex APIs


[jira] Updated: (SOLR-1900) move Solr to flex APIs

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-1900:
-------------------------------

    Attachment: SOLR-1900_FileFloatSource.patch

Here's a patch that simplifies FileFloatSource.

> move Solr to flex APIs
> ----------------------
>
>                 Key: SOLR-1900
>                 URL: https://issues.apache.org/jira/browse/SOLR-1900
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: SOLR-1900-facet_enum.patch, SOLR-1900-facet_enum.patch, SOLR-1900_FileFloatSource.patch, SOLR-1900_termsComponent.txt
>
>
> Solr should use flex APIs


[jira] Commented: (SOLR-1900) move Solr to flex APIs

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863201#action_12863201 ] 

Yonik Seeley commented on SOLR-1900:
------------------------------------

Could this perhaps be an issue with FilterIndexReader?  It doesn't look like it delegates everything that it should?  (like fields() for example)
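To illustrate the suspected failure mode in general terms, here is a sketch with hypothetical classes (not Lucene's FilterIndexReader or IndexReader): a wrapper that extends a base class with a concrete default method silently inherits that default unless it explicitly overrides and delegates. The compiler cannot flag the omission, which matches "overrode a bunch of the methods and delegated to the inner reader" being the fix.

```java
// Hypothetical model of the wrapper pitfall; class and method names are illustrative only.
public class DelegationSketch {

    static abstract class BaseReader {
        abstract int maxDoc();

        // Concrete default that subclasses are expected to override.
        String fields() {
            return "default fields() from BaseReader";
        }
    }

    static class InnerReader extends BaseReader {
        @Override int maxDoc() { return 10; }
        @Override String fields() { return "fields() of the wrapped reader"; }
    }

    // Delegates maxDoc() but forgets fields(); this compiles fine because
    // BaseReader already provides a concrete fields().
    static class PartialFilterReader extends BaseReader {
        protected final BaseReader in;
        PartialFilterReader(BaseReader in) { this.in = in; }
        @Override int maxDoc() { return in.maxDoc(); }
    }

    // Overriding and delegating the missing method restores the wrapped behavior.
    static class FullFilterReader extends PartialFilterReader {
        FullFilterReader(BaseReader in) { super(in); }
        @Override String fields() { return in.fields(); }
    }

    public static void main(String[] args) {
        BaseReader inner = new InnerReader();
        System.out.println(new PartialFilterReader(inner).fields()); // default fields() from BaseReader
        System.out.println(new FullFilterReader(inner).fields());    // fields() of the wrapped reader
    }
}
```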

> move Solr to flex APIs
> ----------------------
>
>                 Key: SOLR-1900
>                 URL: https://issues.apache.org/jira/browse/SOLR-1900
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: SOLR-1900-facet_enum.patch
>
>
> Solr should use flex APIs


[jira] Issue Comment Edited: (SOLR-1900) move Solr to flex APIs

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863200#action_12863200 ] 

Yonik Seeley edited comment on SOLR-1900 at 5/2/10 5:58 PM:
------------------------------------------------------------

Attaching draft flex use for the facet.enum method.

This currently fails with:
{code}
SEVERE: Exception during facet counts:java.lang.IllegalStateException: external IndexReader requires skipDocs == MultiFields.getDeletedDocs()
	at org.apache.lucene.index.LegacyFieldsEnum$LegacyDocsEnum.reset(LegacyFieldsEnum.java:206)
	at org.apache.lucene.index.LegacyFieldsEnum$LegacyTermsEnum.docs(LegacyFieldsEnum.java:164)
	at org.apache.lucene.index.MultiTermsEnum.docs(MultiTermsEnum.java:276)
	at org.apache.solr.request.SimpleFacets.getFacetTermEnumCounts(SimpleFacets.java:566)
{code}

Not sure if that's an issue with all compound readers (top-level readers as opposed to segment readers) or if it's an issue with SolrIndexReader.

edit: this patch also makes BytesRef comparable.

> move Solr to flex APIs
> ----------------------
>
>                 Key: SOLR-1900
>                 URL: https://issues.apache.org/jira/browse/SOLR-1900
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: SOLR-1900-facet_enum.patch
>
>
> Solr should use flex APIs
