Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2009/06/14 13:04:07 UTC

[jira] Issue Comment Edited: (LUCENE-1673) Move TrieRange to core

    [ https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719246#action_12719246 ] 

Uwe Schindler edited comment on LUCENE-1673 at 6/14/09 4:03 AM:
----------------------------------------------------------------

Here are my own thoughts:

bq. But the latter can be refactored, to SortField.TRIE_XXX (not good name, as TRIE no longer used) and the parser instances can be added to FieldCache.

From the point of view of the FieldCache and sorting API, in my opinion it was never a good idea to tie the encoding used in the index to the implementation everywhere.

- deprecate SortField.INT and use SortField.PLAIN_TEXT_INT instead, and so on for the other types
- use SortField.PREFIX_ENCODED_INT for the trie ones (a better name; this is the internal encoding name from TrieUtils)
- rename the default parsers in FieldCache (currently private) to PlainText* as well, and make them accessible
- add TrieUtils.XxxParser to FieldCache (also accessible)
- re-use INT (and so on) in Sort and cache code where the data type is meant (we already have this in a lot of code), but use PLAIN_TEXT_ vs. PREFIX_ENCODED_ where the encoding is meant. That way the Comparators have the native type names, while the impl (where the underlying encoding is used) has the encoding names.
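
To make the split between data type and encoding concrete, here is a minimal standalone sketch. It does not use the Lucene API: the Encoding enum, the IntParser interface and parserFor() are hypothetical names for illustration only, and the "prefix encoded" form is a simplified stand-in (sign bit flipped, 8 hex digits), not the real TrieUtils format.

```java
import java.util.Locale;

final class EncodingSketch {
    /** Hypothetical: how an int value is stored as a term in the index. */
    enum Encoding { PLAIN_TEXT, PREFIX_ENCODED }

    /** Hypothetical stand-in for a FieldCache-style parser of int terms. */
    interface IntParser {
        int parseInt(String term);
    }

    /** Plain text terms such as "42" or "-7". */
    static final IntParser PLAIN_TEXT_INT = Integer::parseInt;

    /** Parser for the simplified sortable form produced by toSortableTerm(). */
    static final IntParser PREFIX_ENCODED_INT =
            term -> Integer.parseUnsignedInt(term, 16) ^ 0x80000000;

    /**
     * Simplified stand-in encoding (NOT the TrieUtils format): flip the sign
     * bit and print 8 hex digits, so that string order equals numeric order.
     */
    static String toSortableTerm(int value) {
        return String.format(Locale.ROOT, "%08x", value ^ 0x80000000);
    }

    /** The data type (int) is fixed here; only the encoding selects the parser. */
    static IntParser parserFor(Encoding encoding) {
        return encoding == Encoding.PLAIN_TEXT ? PLAIN_TEXT_INT : PREFIX_ENCODED_INT;
    }

    public static void main(String[] args) {
        int fromPlain = parserFor(Encoding.PLAIN_TEXT).parseInt("-7");
        int fromPrefix = parserFor(Encoding.PREFIX_ENCODED).parseInt(toSortableTerm(-7));
        // A comparator only ever sees the native ints, whatever the encoding was.
        System.out.println(Integer.compare(fromPlain, fromPrefix));
    }
}
```

In this sketch the comparator side only deals with int, and the encoding name shows up solely where a term is turned into a value, which is the separation the list above argues for.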

I know these are big changes, but we have been very productive here recently (thanks Shai, Jason, Michael), so there are a lot of new APIs that are much more decoupled from the underlying encoding. This would again rename a lot of internal parts, but because of the deprecations it could be done in line with Shai's, Michael's and Jason's changes here.

      was (Author: thetaphi):
    Here my own thoughts:

bq. But the latter can be refactored, to SortField.TRIE_XXX (not good name, as TRIE no longer used) and the parser instances can be added to FieldCache.

- deprecate SortField.INT and use SortField.PLAIN_TEXT_INT instead and so on
- use SortField.PREFIX_ENCODED_INT for the trie ones (better name, this is the internal encoding name from TrieUtils)
- the default parsers (private) in FieldCache renaming to also PlainText* (but accessible)
- add TrieUtils.XxxParser to FieldCache (but accessible)
  
> Move TrieRange to core
> ----------------------
>
>                 Key: LUCENE-1673
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1673
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>
> TrieRange was iterated many times and seems stable now (LUCENE-1470, LUCENE-1582, LUCENE-1602). There is lots of user interest, Solr added it to its default FieldTypes (SOLR-940) and if possible I want to move it to core before release of 2.9.
> Before this can be done, there are some things to think about:
> # There are now classes called LongTrieRangeQuery and IntTrieRangeQuery; what should they be called in core? I would suggest leaving them as they are. On the other hand, if this stays our only numeric query implementation, we could call it LongRangeQuery, IntRangeQuery or NumericRangeQuery (see below; there are problems here). Same for the TokenStreams and Filters.
> # Maybe the pairs of classes for indexing and searching should be merged into one class each: NumericTokenStream, NumericRangeQuery, NumericRangeFilter. The problem here: the ctors must be able to take int, long, double and float as range parameters. For the end user, mixing these 4 types in one class is hard to handle. If somebody forgets to add an L to a long, it suddenly instantiates an int version of the range query, which returns no results, and so on. Same with the other types. Maybe accept java.lang.Number as the parameter (because it is nullable, for half-open bounds) plus one enum for the type.
> # TrieUtils move into o.a.l.util? or document or?
> # Move TokenStreams into o.a.l.analysis, ShiftAttribute into o.a.l.analysis.tokenattributes? Somewhere else?
> # If we rename the classes, should Solr stay with Trie (because there are different impls)?
> # Maybe add a subclass of AbstractField, that automatically creates these TokenStreams and omits norms/tf per default for easier addition to Document instances?
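
The overload pitfall from item 2 of the issue description can be reproduced in plain Java. The sketch below is only an illustration under assumptions: newRange() and NumericType are hypothetical names, not existing or proposed Lucene API, and the methods return strings instead of building queries.

```java
final class RangeOverloadSketch {
    /** Hypothetical enum naming the numeric type explicitly. */
    enum NumericType { INT, LONG, FLOAT, DOUBLE }

    // Overloaded factories: the literal's type silently picks the variant.
    static String newRange(int min, int max) { return "int"; }
    static String newRange(long min, long max) { return "long"; }

    /**
     * Alternative from the issue text: Number bounds plus an explicit type.
     * A null bound stands for an open end of the range.
     */
    static String newRange(NumericType type, Number min, Number max) {
        return type + "[" + format(type, min) + " TO " + format(type, max) + "]";
    }

    private static String format(NumericType type, Number bound) {
        if (bound == null) {
            return "*"; // open bound
        }
        switch (type) {
            case INT:   return Integer.toString(bound.intValue());
            case LONG:  return Long.toString(bound.longValue());
            case FLOAT: return Float.toString(bound.floatValue());
            default:    return Double.toString(bound.doubleValue());
        }
    }

    public static void main(String[] args) {
        System.out.println(newRange(1, 100));   // both literals are int
        System.out.println(newRange(1L, 100L)); // the L suffix selects long
        System.out.println(newRange(NumericType.LONG, 1, null));
    }
}
```

With the overloads, a field indexed as long but queried with plain int literals would go through the int variant without any compiler warning. With the enum variant the caller has to state the type, at the cost of boxing and a less type-safe signature.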
