Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2013/08/22 01:46:52 UTC

[jira] [Commented] (LUCENE-5123) invert the codec postings API

    [ https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747034#comment-13747034 ] 

Robert Muir commented on LUCENE-5123:
-------------------------------------

This is exciting! 

One idea is to just keep the old API (at least for now)?
Then we don't have to cut over tons of code at once, and we just have a new low-level API (and back compat by accident).

I think it would be good if we wrote or converted a 'demo' codec (SimpleText would be OK, for example, or a new simple one) to the new API first, just to see if we are happy with it.

Maybe it's just fine that, if you are implementing the new API, you have to compute the stats in your codec yourself (maybe it's simple), or maybe we just plan on keeping the higher-level API and not deprecating it.
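
To make "compute the stats in your codec yourself" concrete, here is a rough sketch in plain Java. It is not the Lucene API and not the attached patch: Posting, TermStats and computeStats are hypothetical names, and the only assumption is that the codec pulls a term's postings through an iterator and counts docFreq and totalTermFreq as it goes.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class PullStatsSketch {
    /** Hypothetical posting: one document and the term's frequency in it. */
    static final class Posting {
        final int docId;
        final int freq;
        Posting(int docId, int freq) { this.docId = docId; this.freq = freq; }
    }

    /** Hypothetical per-term stats a codec would want to record. */
    static final class TermStats {
        final int docFreq;
        final long totalTermFreq;
        TermStats(int docFreq, long totalTermFreq) {
            this.docFreq = docFreq;
            this.totalTermFreq = totalTermFreq;
        }
    }

    /** The codec pulls the postings itself, so it also counts the stats itself. */
    static TermStats computeStats(Iterator<Posting> postings) {
        int docFreq = 0;
        long totalTermFreq = 0;
        while (postings.hasNext()) {
            Posting p = postings.next();
            docFreq++;                 // one more document contains the term
            totalTermFreq += p.freq;   // occurrences across all documents
        }
        return new TermStats(docFreq, totalTermFreq);
    }

    public static void main(String[] args) {
        List<Posting> postings = Arrays.asList(
            new Posting(0, 2), new Posting(3, 1), new Posting(7, 4));
        TermStats stats = computeStats(postings.iterator());
        System.out.println(stats.docFreq + " " + stats.totalTermFreq); // prints "3 7"
    }
}
```

If it really is about this small in a real codec, that would be an argument that pushing the stats work onto the implementor is acceptable.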

                
> invert the codec postings API
> -----------------------------
>
>                 Key: LUCENE-5123
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5123
>             Project: Lucene - Core
>          Issue Type: Wish
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>         Attachments: LUCENE-5123.patch
>
>
> Currently FieldsConsumer/PostingsConsumer/etc. is a "push"-oriented API, e.g. FreqProxTermsWriter streams the postings at flush, and the default merge() takes the incoming codec API, filters out deleted docs, and "pushes" via the same API (but that can be overridden).
> It could be cleaner if we allowed for a "pull" model instead (like DocValues). For example, maybe FreqProxTermsWriter could expose a Terms of itself and just pass this to the codec consumer.
> This would give the codec more flexibility to e.g. do multiple passes if it wanted to do things like encode high-frequency terms more efficiently with a bitset-like encoding or other things...
> A codec can try to do things like this to some extent today, but it's very difficult (look at buffering in Pulsing). We made this change with DV and it made a lot of interesting optimizations easy to implement...
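
As a rough illustration of the multiple-passes idea in the description, here is a toy sketch in plain Java. It is not the Lucene API: encode, the 1/4 density threshold, and the string output format are all made up for the example. The only assumption is that a pull-style source can hand the consumer a fresh iterator over a term's doc IDs whenever it asks, which is modelled here with a plain Iterable.

```java
import java.util.Arrays;
import java.util.BitSet;

public class PullPostingsSketch {
    /**
     * Hypothetical pull-style consumer. Because it controls iteration, it can
     * walk the same postings twice: once to count, once to encode.
     * With a push API the postings stream by once and would need buffering.
     */
    static String encode(Iterable<Integer> docs, int maxDoc) {
        // Pass 1: count how many documents contain the term.
        int docFreq = 0;
        for (Integer ignored : docs) {
            docFreq++;
        }

        // Made-up threshold: treat the term as dense if it hits >= 1/4 of docs.
        if (docFreq * 4 >= maxDoc) {
            // Pass 2 (dense): one bit per document.
            BitSet bits = new BitSet(maxDoc);
            for (int doc : docs) {
                bits.set(doc);
            }
            return "bitset" + bits;
        }

        // Pass 2 (sparse): gaps between consecutive sorted doc IDs.
        int[] deltas = new int[docFreq];
        int prev = 0;
        int i = 0;
        for (int doc : docs) {
            deltas[i++] = doc - prev;
            prev = doc;
        }
        return "deltas" + Arrays.toString(deltas);
    }

    public static void main(String[] args) {
        System.out.println(encode(Arrays.asList(0, 1, 2, 3, 8), 16)); // bitset{0, 1, 2, 3, 8}
        System.out.println(encode(Arrays.asList(2, 9), 16));          // deltas[2, 7]
    }
}
```

The encoding decision depends on a count that is only known after a full pass, which is why the second pass over the same postings matters here.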
