You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2009/06/19 15:29:07 UTC

[jira] Created: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
----------------------------------------------------------------------------------------------------------------------------

                 Key: LUCENE-1701
                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
             Project: Lucene - Java
          Issue Type: New Feature
          Components: Index, Search
    Affects Versions: 2.9
            Reporter: Uwe Schindler
            Assignee: Uwe Schindler
             Fix For: 2.9


In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.

I and Yonik tend to use the factory for both, Mike tends to create the new classes.

Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).

Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722271#action_12722271 ] 

Michael McCandless commented on LUCENE-1701:
--------------------------------------------

bq.  there is a possibility to also support Trie fields with standard SortField.XXX constants using autodetection.

Meaning we wouldn't be forced to specify FieldCache.DEFAULT_INT_PARSER/FieldCache.NUMERIC_UTILS_INT_PARSER when creating SortField or calling FieldCache.getInts?  And we'd make the core parsers package private again?

So users could simply do {{SortField("price", SortField.FLOAT)}} and it'd just work?

I think this is is compelling!  Why not take this approach?  There would then be no user visible changes to how you sort by numeric fields...

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722775#action_12722775 ] 

Earwin Burrfoot commented on LUCENE-1701:
-----------------------------------------

>>> Design for today.
>> And spend two years deprecating and supporting today's designs after you get a better thing tomorrow. Back-compat Lucene-style and agile design aren't something that marries well.
>> donating something to Lucene means casting it in concrete.
> We can't let fear of back-compat prevent us from making progress.
My point was that strict back-compat prevents people from donating work which is not yet finalized. They either lose comfortable volatility of private code, or have to maintain two versions of it - private and Lucene.

>> NRT seems to tread the same path, and I'm not sure it's going to win that much turnaround time after newly-introduced per-segment collection.
> I agree, per-segment collection was the bulk of the gains needed for
> NRT. This was a big change and a huge step forward in simple reopen
> turnaround.
I vote it for the most frustrating (in terms of adopting your custom code) and most useful change of 2.9 :)

> But, not having to write & read deletes to disk, not commit (fsync)
> from writer in order to see those changes in reader should also give
> us decent gains. fsync is surprisingly and intermittently costly.
I'm not sure this can't be achieved without messing with IR/W guts so much. Guys from LinkedIn that drive this feature (if i'm not mistaken), they had a prior solution with separate indexes, one on disk, one in RAM. Per-segment collection adds superfast reopens and MultiReader that is way greater than MultiSearcher - you can finally do adequate fast searches across separate indexes. Do we still need to add complexity for minor performance gains?

> And this integration lets us take it a step further with LUCENE-1313,
> where recently created segments can remain in RAM and be shared with
> the reader.
RAMDirectory?

>> Some time ago I finished a first version of IR plugins, and enjoy pretty low reopen times (field/facet/filter cache warmups included). (Yes, I'm going to open an issue for plugins once they stabilize enough)
> I'm confused: I thought that effort was to make SegmentReader's
> components fully pluggable? (Not to actually change what components
> SegmentReader is creating). EG does this modularization alter the
> approach to NRT? I thought they were orthogonal.
Yes, they are orthonogal. This was yet another praise to per-segment collection and an example of how this approach can be extended on your custom stuff (like filtercache).


> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722862#action_12722862 ] 

Michael McCandless commented on LUCENE-1701:
--------------------------------------------

bq. My point was that strict back-compat prevents people from donating work which is not yet finalized. They either lose comfortable volatility of private code, or have to maintain two versions of it - private and Lucene.

That's a good point, though if it's contrib and you're a contrib committer it lessens the challenge, but the challenge is still there....

bq. I vote it for the most frustrating (in terms of adopting your custom code) and most useful change of 2.9 

No pain no gain?

bq. Do we still need to add complexity for minor performance gains?

The problem is I've seen fsync take a ridiculous amount of time; it's not very predictable.  So I think we do need some way to not put fsync between the changes to the index and the ability to search those changes.

{quote}
> And this integration lets us take it a step further with LUCENE-1313,
> where recently created segments can remain in RAM and be shared with
> the reader.

RAMDirectory?
{quote}

Exactly; that's what LUCENE-1313 is doing (flush new segments to a RAMDir).

bq. This was yet another praise to per-segment collection and an example of how this approach can be extended on your custom stuff (like filtercache).

OK.

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721827#action_12721827 ] 

Uwe Schindler edited comment on LUCENE-1701 at 6/19/09 8:50 AM:
----------------------------------------------------------------

But the same problem like with NumericTokenStream affects also NumericField, because of type safety it will only work with a setXxxValue (if not factory), e.g.
{code}
doc.add(new NumericField("price", precisionStep).setFloatValue(15.50f));
{code}

This code is not shorter than:
{code}
doc.add(NumericUtils.newFloatField("price", precisionStep, 15.50f));
{code}

Additionally with LUCENE-1699, we could also add Field.Store.XXX to the factory/ctor. 

OK, the factory solution has the problem, that you cannot reuse the field for effectiveness, so this is an argument *for* the extra class, that has setXxXValue().

For SortField: The factory code inside NumericUtils is only one Line, you only create a conventional SortField with a specific parser. If we do not want to have the factory in NumericUtils, I could also add an additional ctor option to the normal sortfield (which is still there: it takes the parser, LUCENE-1478). When all parsers are central in the FieldCache, one can create a SortField with one line of code (the current factory demonstrates this).

      was (Author: thetaphi):
    But the same problem like with NumericTokenStream affects also NumericField, because of type safety it will only work with a setXxxValue (if not factory), e.g.
{code}
doc.add(new NumericField("price", precisionStep).setFloatValue(15.50f));
{code}

This code is not shorter than:
{code}
doc.add(NumericUtils.newFloatField("price", precisionStep, 15.50f));
{code}

Additionally with LUCENE-1699, we could also add Field.Store.XXX to the factory/ctor.

For SortField: The factory code inside NumericUtils is only one Line, you only create a conventional SortField with a specific parser. If we do not want to have the factory in NumericUtils, I could also add an additional ctor option to the normal sortfield (which is still there: it takes the parser, LUCENE-1478). When all parsers are central in the FieldCache, one can create a SortField with one line of code (the current factory demonstrates this).
  
> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722769#action_12722769 ] 

Earwin Burrfoot edited comment on LUCENE-1701 at 6/22/09 12:18 PM:
-------------------------------------------------------------------

Using 4 for int, 6 for long. Dates-as-longs look a bit sad on 8.

Though, if you want really fast dates, chosing hour/day/month/year as precision steps is vastly superior, plus it also clicks well with user-selected ranges. Still, I dumped this approach for uniformity and clarity.

      was (Author: earwin):
    Using 4 for int, 6 for long. Dates-as-longs look a bit sad on 8.
  
> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1701:
----------------------------------

    Attachment: LUCENE-1701.patch

Here a new patch:
# deprecated the AUTO cache parts
# the test TestFieldCache tests the following algorithm, too
# changed autodetection for triefields to work like the auto cache: 
- in createValue, it is checked if parser==null, if this is so, the method calls the FieldCache.getInts() again, specifying each of the two possible parsers (for byte and short currently only one).
- After this, the cache will then contain the same array reference for both key variants ("finally used parser" and "null"). Somebody coming later and asking for a specific parser array will get the already cached one, the same for later consumers asking with null parser.
- To optimize the array allocation, it was delayed until the first value was successfully decoded. In case of an error, the array stays null and GC is happy :-)


> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721979#action_12721979 ] 

Michael McCandless commented on LUCENE-1701:
--------------------------------------------

I still think we should make NumericSortField strongly typed (not a
factory method that returns a SortField w/ the right parser).

I think it's far more consumable, from the user's standpoint. I made
a NumericField when indexing so making a NumericSortField to sort
makes it "obvious".

Ie this:
{code}
new NumericSortField("price");
{code}

is better than this:
{code}
new SortField("price", FieldCache.NUMERIC_FLOAT_PARSER);
{code}

or this:
{code}
SortField.getNumericSortField("price", SortField.FLOAT);
{code}

Uwe, I agree that if we take the developer's (us) standpoint, the
implementation of NumericSortField is so trivial (just pick the right
parser) that it's tempting to not name the class.  But from the user's
standpoint it's less consumable.

bq. Regardless of the fact that plain_int parser is on FieldCache, it still doesn't seem like we should add parsers to FieldCache for every field type.

{Extended,}FieldCache already is the central place that holds parsers
for all of Lucene's core types; why change that?  Leaving Numeric* out
is dangerous because then people will naturally assume getFloats() is
the method to call.


> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1701:
----------------------------------

    Attachment: LUCENE-1701.patch

Here an updated patch:
- fix some copy/paste duplicates in assignments
- add extra check to fail early when autodetection fails

The patch (and the one before) checks the autodetection two times (so you see in the trie tests, how one would use SortField with trie), what do you think about all this?

bq. But, it is technically a (one bit) change to the index format, which people seriously objected to.  So net/net I'm OK going forward without it.

I agree, and for the reasons noted before, I would like to see NumericField as only a helper for indexing. Storing the number as plain-text string (as done in the patch) does not justify a NumericField on getting stored fields.

When Yonik committed LUCENE-1699, I would do some additional NumericField fine tuning, but the patch is finished now.

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722820#action_12722820 ] 

Michael McCandless commented on LUCENE-1701:
--------------------------------------------

bq. Using 4 for int, 6 for long. 

Unfortunately we can't easily conditionalize the default by int vs long.  Ie use you NumericField like this:

{code}
NumericField f = new NumericField("price", 4);
f.setFloatValue(15.50);
{code}

bq. Mike: As you see, the precision step is a good config approach, so an default is should be choosen carefully.

Agreed!  But, it need not be "perfect".  Advanced users can test & iterate to find the best tradeoff for their particular field's value distribution.  For slow ranges now with RangeQuery (because of many unique terms), NumericRangeQuery will be a massive speedup with eg a default of 4.

New users shouldn't have to understand what precisionStep means, or anything about "what's under the hood", in order to use NumericField.  I should be able to simply:
{code}
new NumericField("price", 15.50);
{code}

Erring towards more terms (and faster searches) is fine, I think, because in a "typical" index the text fields with dwarf any small added disk space (hence my proposal of 4 as the default precisionStep).

bq. Can you open an issue?

OK I'll open a new issue.

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721956#action_12721956 ] 

Michael McCandless commented on LUCENE-1701:
--------------------------------------------

bq. Here is a first draft of NumericField with the same handling as NumericTokenStream.

Looks good Uwe!  Is there an OK default for precisionStep so we don't have
to make that a required arg?  4?


> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721946#action_12721946 ] 

Uwe Schindler commented on LUCENE-1701:
---------------------------------------

bq. ut any new field cache API should be powerful enough to handle trie through the normal APIs that anyone else would have to go through when implementing their own field type

This is true, the LUCENE-831 patch contains a TrieValueSource for that (but it does not apply anymore, as contrib/search/trie is no longer available). Because of this, it can be implemented very cleanly.

For easy usage, I would simply suggest to have the parsers for the current plain text and trie field cache implementation public available as singletons, very simple and hurts nobody. I will post a patch for the whole case soon. NumericField will be tested in the NumericRangeQuery index creation (which is currently very ugly, like meikes comments about the code and reusing Fields/TokenStreams, with NumericField it looks like any other index code).

For Solr (SOLR-940), there is not need to use NumericField (which is just a helper), the code of TrieField can stay as it is, only some renamings and so on. It just presents a TokenStream to the underlying Solr indexer, as before.

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721827#action_12721827 ] 

Uwe Schindler commented on LUCENE-1701:
---------------------------------------

But the same problem like with NumericTokenStream affects also NumericField, because of type safety it will only work with a setXxxValue (if not factory), e.g.
{code}
doc.add(new NumericField("price", precisionStep).setFloatValue(15.50f));
{code}

This code is not shorter than:
{code}
doc.add(NumericUtils.newFloatField("price", precisionStep, 15.50f));
{code}

Additionally with LUCENE-1699, we could also add Field.Store.XXX to the factory/ctor.

For SortField: The factory code inside NumericUtils is only one Line, you only create a conventional SortField with a specific parser. If we do not want to have the factory in NumericUtils, I could also add an additional ctor option to the normal sortfield (which is still there: it takes the parser, LUCENE-1478). When all parsers are central in the FieldCache, one can create a SortField with one line of code (the current factory demonstrates this).

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721787#action_12721787 ] 

Earwin Burrfoot commented on LUCENE-1701:
-----------------------------------------

I vote for factories - escaping back-compat woes by exposing minimum interface.

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721983#action_12721983 ] 

Yonik Seeley commented on LUCENE-1701:
--------------------------------------

bq. new NumericSortField("price");

Magic.
How is this supposed to work?
How will the sort field know the exact encoding?  How many bits are stored in a payload or in the position?
Could I make my own field type that had the same privileges?


> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722551#action_12722551 ] 

Michael McCandless commented on LUCENE-1701:
--------------------------------------------

The last patch looks great Uwe!  I think we're nearly done here:

  * I like how FieldComparator, and function/*FieldSource, are now
    simplified to longer have duplicated code for picking the default
    parser (and simply pass null instead).  Default selection is now
    single source.

  * Could you add the "<b>NOTE:</b> This API is experimental and might
    change in incompatible ways in the next release." caveat to the
    javadocs?

  * Can you change this:
{code}
if (parser == null) try {
  return getLongs(reader, field, DEFAULT_LONG_PARSER);
} catch (NumberFormatException ne) {
  return getLongs(reader, field, NUMERIC_UTILS_LONG_PARSER);      
}
{code}
to this:
{code}
if (parser == null) {
  try {
    return getLongs(reader, field, DEFAULT_LONG_PARSER);
  } catch (NumberFormatException ne) {
    return getLongs(reader, field, NUMERIC_UTILS_LONG_PARSER);      
  }
}
{code}
?
  * I think we can't actually deprecate NumberTools until we can call
    FieldCache.getShorts/getBytes on a NumericField?  Ie, people
    relying on short/byte (to consume much less memory in FieldCache)
    cannot switch to numeric, and so must continue to zero-pad if they
    need to use RangeQuery/Filter?

  * You need a CHANGES entry.


> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1701:
----------------------------------

    Attachment: LUCENE-1701.patch

Here a patch with the auto-detection. All tests pass, also backwards-tests. The parsers are still public for total control on the conversion.

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722166#action_12722166 ] 

Michael McCandless commented on LUCENE-1701:
--------------------------------------------

{quote}
bq. Someday maybe I'll convince you to donate this "schema" layer on top of Lucene

It's not generic enough to be of use for every user of Lucene, and it doesn't aim to be such.
{quote}

Oh, OK.

bq. Solr has its own schema approach, and it has its merits and downfalls compared to mine. That's what is nice, we're able to use the same library in differing ways, and it doesn't force its sense of 'best practices' on us.

There's no forcing going on, here.  Even had we added the bit into the
index, there's still no "forcing".  We're not preventing advanced uses
of Lucene by providing strong Numeric* support in Lucene.  Simple
things should be simple; complex things should be possible...


{quote}
bq. But I hope there are SOME named classes in there and not all static factory methods returning anonymous untyped impls.

SOME of them aren't static :-D
{quote}

Heh.

{quote}
bq. We shouldn't weaken trie's integration to core just because others have private implementations.

You shouldn't integrate into core something that is not core functionality. Think microkernels.
It's strange seeing you drive CSFs, custom indexing chains, pluggability everywhere on one side, and trying to add some weird custom properties into index that are tightly interwoven with only one of possible numeric implementations on the other side.
{quote}

I agree: if Lucene had all extension points that'd make it possible
for a good integration of Numeric* without being in "core", we should
use that.  But we're just not there yet.  We want to get there, and we
will, but we can't hold up progress just because we think someday
we'll get there.  That's like saying we can't improve the terms dict
format because it's not pluggable yet.

{quote}
bq. Design for today.

And spend two years deprecating and supporting today's designs after you get a better thing tomorrow. Back-compat Lucene-style and agile design aren't something that marries well.
{quote}

bq. donating something to Lucene means casting it in concrete.

We can't let fear of back-compat prevent us from making progress.

bq. IndexReader/Writer pair is a good example of what I'm arguing against. A dusty closet of microfeatures that are tightly interwoven into a complex hard-to-maintain mess with zillions of (possibly broken) control paths - remember mutable deletes/norms+clone/reopen permutations? It could be avoided if IR/W were kept to the bare minimum (which most people are going to use), and more advanced features were built on top of it, not in the same place.

Sure, our approach today isn't perfect ("progress not perfection").
There are always improvements to be done.  If you see concrete steps
to simplify the current approach without losing functionality, please
post a patch.  I too would love to see such simplifications...

bq. NRT seems to tread the same path, and I'm not sure it's going to win that much turnaround time after newly-introduced per-segment collection.

I agree, per-segment collection was the bulk of the gains needed for
NRT.  This was a big change and a huge step forward in simple reopen
turnaround.

But, not having to write & read deletes to disk, not commit (fsync)
from writer in order to see those changes in reader should also give
us decent gains.  fsync is surprisingly and intermittently costly.

And this integration lets us take it a step further with LUCENE-1313,
where recently created segments can remain in RAM and be shared with
the reader.

If you have good simplifications/improvements on the approach here,
please post them.

bq. Some time ago I finished a first version of IR plugins, and enjoy pretty low reopen times (field/facet/filter cache warmups included). (Yes, I'm going to open an issue for plugins once they stabilize enough)

I'm confused: I thought that effort was to make SegmentReader's
components fully pluggable?  (Not to actually change what components
SegmentReader is creating).  EG does this modularization alter the
approach to NRT?  I thought they were orthogonal.

{quote}
bq. If we add some generic storable flags for Lucene fields, this is cool (probably), NumericField can then capitalize on it, as well as users writing their own NNNFields.
bq. +1 Wanna make a patch?

No, I'd like to continue IR cleanup and play with positionIncrement companion value that could enable true multiword synonyms. 
{quote}

Well I'm looking forward to seeing your approach on these two!


> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722774#action_12722774 ] 

Uwe Schindler commented on LUCENE-1701:
---------------------------------------

bq. Using 4 for int, 6 for long. Dates-as-longs look a bit sad on 8.

I think 4 for ints is a good start, better as 4 for longs (which produces 16 different precision terms and upto 31 term enums [= precision changes] per range). 6 is a good idea, it brings a little bit more than 8 but does not produce too much precision changes. I tested that also with my 2 M numeric-only index here.

Mike: As you see, the precision step is a good config approach, so an default is shpould be choosen carefully.
It may even be different for different data types, when e.g. you have longs, but all longs in your index are only in a very limited range -- ok. You could use an int, too. But e.g. if you index dates as long and your dates are only between two years or something like that, 4 may still good. This is because on a smaller range, the algorith does not need to to up to the lowest precision.

bq. Though, if you want really fast dates, chosing hour/day/month/year as precision steps is vastly superior, plus it also clicks well with user-selected ranges. Still, I dumped this approach for uniformity and clarity.

That is clear. Because these precisions are fitting exact to users queries in case of dates (often users take full days when selecting the range).

Nice to hear, that you use TrieRange? What is your index spec and measured query speeds (if it does not go too far into company internals)?

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721830#action_12721830 ] 

Earwin Burrfoot commented on LUCENE-1701:
-----------------------------------------

Mike, I very much agree with everything you said, except "factory is less consumable than constructor" and "add stuff to index to handle NumericField".

Out of your three examples the second one is bad, no questions. But first and last are absolutely equal in terms of consumability.
Static factories are cool (they allow to switch implementations and instantiation logic without changing API) and are as easy to use (probably even easier with generics in Java5) as constructors.

If we add some generic storable flags for Lucene fields, this is cool (probably), NumericField can then capitalize on it, as well as users writing their own NNNFields.
Tying index format to some particular implementation of numerics is bad design. Why on earth can't my own split-field (vs single-field as in current Lucene) trie-encoded number enjoy the same benefits as NumericField from Lucene core?

bq. By this same logic, should we remove NumericRangeFilter/Query and use
static factories instead?
I do use factory methods for all my queries and filters, and it makes me feel warm and fuzzy! :) Under the hood some of them consult FieldInfo to instantiate custom-tailored query variants, so I just use range(CREATION_TIME, from, to) and don't think if this field is trie-encoded or raw.

"Simple things should be simple", okay. Complex things should be simple too, argh! :)

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722305#action_12722305 ] 

Uwe Schindler edited comment on LUCENE-1701 at 6/20/09 11:15 PM:
-----------------------------------------------------------------

 Yes, it would work this way. This only would violate Yoniks complaints about not miximg Trie too much into the other code, but this is already done because of this StopFillCacheException usage. When we do LUCENE-831, this should be thought about, too.

SortField.AUTO is deprecated and will not be changed (only detect text numbers). There should be a note, that it would not work with the "new" NumericFields.

I would make the core parsers public to enable users to have full control (on the other hand I could now hide also the trie parsers). But this is a bad approach, wherever automatisms are envolved, oneshould always have the possibility to fix to one parser. And why do we have the SortField/FieldCache accessors with parser parameter, when you cannot even use the default ones?

P.S.: About payloads & positions and the need for extra parameters to the parser: After adding support for positions or payloads to encode the highest precision, there is still no need for an extra SortField/Parser class or factory. The future "ValueSource" starts to decode the values until a change in shift occurs. This first shift is for sure the highest precision (because of term ordering), if it is 0, its like now (no payloads/prositions); if the first visible shift>0, payloads/positions were used and the numbero of bits there is also known.

      was (Author: thetaphi):
     Yes, it would work this way. This only would violate Yoniks complaints about not miximg Trie too much into the other code, but this is already done because of this StopFillCacheException usage. When we do LUCENE-831, this should be thought about, too.

I would make the core parsers public to enable users to have full control (on the other hand I could now hide also the trie parsers). But this is a bad approach, wherever automatisms are envolved, oneshould always have the possibility to fix to one parser. And why do we have the SortField/FieldCache accessors with parser parameter, when you cannot even use the default ones?

P.S.: About payloads & positions and the need for extra parameters to the parser: After adding support for positions or payloads to encode the highest precision, there is still no need for an extra SortField/Parser class or factory. The future "ValueSource" starts to decode the values until a change in shift occurs. This first shift is for sure the highest precision (because of term ordering), if it is 0, its like now (no payloads/prositions); if the first visible shift>0, payloads/positions were used and the numbero of bits there is also known.
  
> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722330#action_12722330 ] 

Michael McCandless commented on LUCENE-1701:
--------------------------------------------

Uwe, with your patch, it looks like if I ask for eg doubles w/ parser=null, and then ask again w/ parser=FieldCache.DEFAULT_DOUBLE_PARSER, I get double entries stored?  Is there some way to take the auto-detected parser and use it (not null) in the cache key?

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722769#action_12722769 ] 

Earwin Burrfoot commented on LUCENE-1701:
-----------------------------------------

Using 4 for int, 6 for long. Dates-as-longs look a bit sad on 8.

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722306#action_12722306 ] 

Uwe Schindler commented on LUCENE-1701:
---------------------------------------

{quote}
I'll quote myself, and then attempt to not repeat myself further after this point (the back and forth is silly).
bq. The next step after adding NumericField seems to be "it's a bug if getDocument() doesn't return a NumericField, so we must encode it in the index". If that's the case, I'm -1 on adding NumericField in the first place.
{quote}

It is mentioned in the docs, that this class is for indexing only:

{code}
* <p><b>Please note:</b> This class is only used during indexing. You can also create
* numeric stored fields with it, but when retrieving the stored field value
* from a {@link Document} instance after search, you will get a conventional
* {@link Fieldable} instance where the numeric values are returned as {@link String}s
* (according to <code>toString(value)</code> of the used data type).
{code}

In my opinion: Storing this info in the segments is not doable without pitfalls: If somebody indexes a normal field name in one IndexWriter session and starts to index using NumericFiled in the next session, he would have two segments with different encoding and two different "flags". When these two segments are merged later, what do with the flag?

If we want to have such Schemas, they must be index wide and we have no possibility in Lucene for that at the moment.

If somebody creates a schema, that can do this (by storing the schema in a separate file next to the segments file), we can think about it again (with all problems, like: MultiReader on top of two indexes with different schemas - forbid that because schema different?). All this says me, we should not do this, it is the task of Solr, my own project panFMP, or Earwin's own schema, to enforce it.

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Closed: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler closed LUCENE-1701.
---------------------------------

    Resolution: Fixed

Commited backwards tests: revision 787714
Committed patch: revision 787723

Thanks Mike!

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722829#action_12722829 ] 

Michael McCandless commented on LUCENE-1701:
--------------------------------------------

OK I opened spinoff issues LUCENE-1712 (default for precisionStep) and LUCENE-1713 (rename RangeQuery -> TextRangeQuery, or something).

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1701:
----------------------------------

    Attachment: LUCENE-1701.patch

Final patch.

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722639#action_12722639 ] 

Michael McCandless commented on LUCENE-1701:
--------------------------------------------


bq. You cannot do this with NumberTools.

Whoa! Sorry, you are right.  Nothing in FieldCache can handle decoding
this encoding into short/byte; so you'd need something outside of
Lucene's core, today, to do that.

Though it does allow you to do RangeQuery, with short/byte, and you
could sort as SortField.STRING, (quite hideously memory inefficient).

So I now agree: we should deprecate NumberTools entirely, now.

If people ask how to handle short/byte beforew we resolve LUCENE-1710,
the answer is to upgrade everything to int.  The resulting wasteful
int[] that'd be in the FieldCache is not nearly as wasteful as the
String[] you'd need to use to do sorting, today.  And you could always
make a custom parser using NumericUtils that downcasts to byte/short,
anyway, since the needed APIs are public.


> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722334#action_12722334 ] 

Uwe Schindler commented on LUCENE-1701:
---------------------------------------

bq. Uwe, with your patch, it looks like if I ask for eg doubles w/ parser=null, and then ask again w/ parser=FieldCache.DEFAULT_DOUBLE_PARSER, I get double entries stored? Is there some way to take the auto-detected parser and use it (not null) in the cache key?

This is correct, the cache key is different for NULL vs. explicit parser, because the result may be different (which is unlikely the case) but they are two different things. When you ask for an auto-cache (we should deprecate the AUTO-part in field cache, too! It is not used anymore with new sorting code, even when SortField.AUTO is enabled), you also get different cache keys!

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721988#action_12721988 ] 

Uwe Schindler commented on LUCENE-1701:
---------------------------------------

I aggree with Yonik, this is too much magic and would not work. There must be at least the type, which would be SortField.FLOAT or something like that). If you keep that in mind, there is really no difference between:

{code}
new SortField("price", FieldCache.NUMERIC_FLOAT_PARSER)
{code}

and

{code}
new SortField("price", FieldCache.PLAIN_TEXT_FLOAT_PARSER)
{code}

equivalent to:

{code}
new SortField("price", SortField.FLOAT)
{code}

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721880#action_12721880 ] 

Yonik Seeley commented on LUCENE-1701:
--------------------------------------

Having the trie parsers public is good (or public factory method(s) to get the right parser given a set of trie params), but shouldn't they stay with the trie classes?  Or am I misunderstanding where you are proposing to move the parsers?

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722257#action_12722257 ] 

Yonik Seeley commented on LUCENE-1701:
--------------------------------------

I'll quote myself, and then attempt to not repeat myself further after this point (the back and forth is silly).
bq. The next step after adding NumericField seems to be "it's a bug if getDocument() doesn't return a NumericField, so we must encode it in the index". If that's the case, I'm -1 on adding NumericField in the first place.

Everyone thinks good APIs, good architecture, and good performance is important.  It imply otherwise is also silly.

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722324#action_12722324 ] 

Michael McCandless commented on LUCENE-1701:
--------------------------------------------

bq. In my opinion: Storing this info in the segments is not doable without pitfalls:

The proposal was not to store it into the segments file (which I agree it has serious problems, since it's global).  I had considered FieldInfos (which is "roughly" Lucene's "schema", per segment), but that too has clear problems.

My proposal was the flags per-field stored in the fdt file.  In that file, we are already writing one byte's worth of flags (only 3 of the bits are used now), for every stored field instance.  This is in FieldsWriter.java ~ line 181.  The flags now record whether each specific field instance was tokenized, compressed, binary.  FieldsReader then uses these flags to reconstruct the Field instances when building the document. This bits are never merged; they are copied (because they apply to that one field instance, in that one document).

My proposal was to add another flag bit (numeric) and make use of that to return a NumericField instance when you get your document back.  It would have no impact to the index size, since we still have 5 free bits to use.

But, it is technically a (one bit) change to the index format, which people seriously objected to.  So net/net I'm OK going forward without it.

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721914#action_12721914 ] 

Uwe Schindler commented on LUCENE-1701:
---------------------------------------

Yonik, I will explain my intention:
The real numeric parsing is always done in NumericUtils, as for conventional FieldCache the real parsing is done in Integer.parseInt() and so on.
Specific to the FieldCache is the Parser interface. This parser interface is a java interface specific to FieldCache. FieldCache currently has (private) static parser instances for Number.toString()-type fields. These parser are simple 5-liners and singletons. For trie fields, this is the same, the static parser instances (also singletons) should be moved also to FieldCache, after that both as public constants (like Mike said: PLAIN_TEXT_INT_PARSER and NUMERIC_INT_PARSER).
Currently for Trie fields there is a ugly hack in FieldCache, that stops parsing, when a term with lower precision is reached (as trie terms are ordered with highest precision first, the cache for a field is filled, when the first lower-precision term comes). Because of this, the trie parsers throw a unchecked StopFillCacheException to stop the iteration of TermEnum/TermDocs in the Uninverter. This is just a hack and because of package differences this FieldCache-internal exception is made public (see Javadocs). When moving the parser interfaces to FieldCace, this Exception can be hidden again and made private to the FieldCache implementation (until we have the better univerters some time in future, see LUCENE-831). NumericUtils then will only just export a method to get the shift value out of a encoded string (which I forgot to add in LUCENE-1673 when removing ShiftAttribute).
Trie field parsing does not depend on Trie-specific flags, precisionStep is not needed, so the parsers are real singletons.

SortFields can then simply created in the Following way: new SortField(field, FieldCache.PLAIN_TEXT_INT_PARSER) for a conventional int field (like with SortField.INT) or new SortField(field, FieldCache.NUMERIC_INT_PARSER). giving a NULL parser still does the same as before, it uses FieldCache.PLAIN_TEXT_INT_PARSER implicitely.

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721993#action_12721993 ] 

Michael McCandless commented on LUCENE-1701:
--------------------------------------------

{quote}

bq. new NumericSortField("price");

Magic.
How is this supposed to work?
How will the sort field know the exact encoding? How many bits are stored in a payload or in the position?
Could I make my own field type that had the same privileges?
{quote}

Woops, sorry, you're right: you'd need to specify the type, at least.

So.... how about for SortField we make the parser a required arg (as
Uwe suggested) for numeric types?  Allow null to mean "give me the
default parser" for all other types.  And we consolidate all parsers
in FieldCache.


> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722218#action_12722218 ] 

Uwe Schindler commented on LUCENE-1701:
---------------------------------------

I know you will kill me, Yonik, and Mike will love me :-) but there is a possibility to also support Trie fields with standard SortField.XXX constants using autodetection. Trie fields always start with a shift-prefix defining the type and so for sure contain non-digits. So FieldCache could simply test and catch NumberFormatException.

So maybe this would be an option, to make the default (parser== null in FieldCaches getInts(),...) detect this automatically.

Users then could use SortField/FieldCache as before, ignoring the real encoding. If I would implement this, I could remove the enforcing to parser==null in SortField again and make FieldCache do the detection in this case.

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721830#action_12721830 ] 

Earwin Burrfoot edited comment on LUCENE-1701 at 6/19/09 8:50 AM:
------------------------------------------------------------------

Mike, I very much agree with everything you said, except "factory is less consumable than constructor" and "add stuff to index to handle NumericField".

Out of your three examples the second one is bad, no questions. But first and last are absolutely equal in terms of consumability.
Static factories are cool (they allow to switch implementations and instantiation logic without changing API) and are as easy to use (probably even easier with generics in Java5) as constructors.

If we add some generic storable flags for Lucene fields, this is cool (probably), NumericField can then capitalize on it, as well as users writing their own NNNFields.
Tying index format to some particular implementation of numerics is bad design. Why on earth can't my own split-field (vs single-field as in current Lucene) trie-encoded number enjoy the same benefits as NumericField from Lucene core?

bq. By this same logic, should we remove NumericRangeFilter/Query and use static factories instead?
I do use factory methods for all my queries and filters, and it makes me feel warm and fuzzy! :) Under the hood some of them consult FieldInfo to instantiate custom-tailored query variants, so I just use range(CREATION_TIME, from, to) and don't think if this field is trie-encoded or raw.

"Simple things should be simple", okay. Complex things should be simple too, argh! :)

      was (Author: earwin):
    Mike, I very much agree with everything you said, except "factory is less consumable than constructor" and "add stuff to index to handle NumericField".

Out of your three examples the second one is bad, no questions. But first and last are absolutely equal in terms of consumability.
Static factories are cool (they allow to switch implementations and instantiation logic without changing API) and are as easy to use (probably even easier with generics in Java5) as constructors.

If we add some generic storable flags for Lucene fields, this is cool (probably), NumericField can then capitalize on it, as well as users writing their own NNNFields.
Tying index format to some particular implementation of numerics is bad design. Why on earth can't my own split-field (vs single-field as in current Lucene) trie-encoded number enjoy the same benefits as NumericField from Lucene core?

bq. By this same logic, should we remove NumericRangeFilter/Query and use
static factories instead?
I do use factory methods for all my queries and filters, and it makes me feel warm and fuzzy! :) Under the hood some of them consult FieldInfo to instantiate custom-tailored query variants, so I just use range(CREATION_TIME, from, to) and don't think if this field is trie-encoded or raw.

"Simple things should be simple", okay. Complex things should be simple too, argh! :)
  
> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721989#action_12721989 ] 

Michael McCandless commented on LUCENE-1701:
--------------------------------------------

bq. It's not sufficient for everyone, there will be (many) enhancements to trie, and there will be other numeric field types. Trie is not, and should not be the only numeric field, and should not be baked into the index format.

But trie is the best we have today?

And it's sooooo much better than we had before?  Prior to trie, with
Lucene directly if you did a RangeQuery on a float/double field, it
was a nightmare.  You had to get Solr's NumberUtils (or, roll your
own) to get something that even returned the correct results.  Then,
you'll discover that performance was basically unusable.

The addition of numeric indexing to Lucene is a *major* step forward.
Why not make it work well with Lucene, today, and as these future
improvements arrive, we take them as they come?  Design for today.

Anyway, if push comes to shove (which it *seems* to be doing!), I can
accept just returning a Field (not NumericField) from the document at
search time.  I think it's a worse user experience, and will be seen
as buggy, but it would in fact mean zero change to the index format.


> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721920#action_12721920 ] 

Uwe Schindler commented on LUCENE-1701:
---------------------------------------

When this comes (payloads, CSF,...) we will have the new LUCENE-831 field cache, where we will have a ValueSource-like thing. Until then, a static parser is enough, and the static parser is still in NumericUtils! I want to move it because of this hack with the unchecked exception. When the new FieldCache is alive this is all nonsense.

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722341#action_12722341 ] 

Uwe Schindler edited comment on LUCENE-1701 at 6/21/09 3:45 AM:
----------------------------------------------------------------

Not sure! And it is a trap for the BYTE and SHORT caches, too!

But for sure, the old, never used auto-cache should be deprecated, too!

      was (Author: thetaphi):
    Not sure!

But for sure, the old, never used auto-cache should be deprecated, too!
  
> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1701:
----------------------------------

    Attachment: LUCENE-1701-test-tag-special.patch
                LUCENE-1701.patch

The last patch was still not 100% backwards compatible, now it is. The modified test-tag TestExtendedFieldCache shows it, it will be committed to backwards-compatibility branch

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721915#action_12721915 ] 

Yonik Seeley commented on LUCENE-1701:
--------------------------------------

Regardless of the fact that plain_int parser is on FieldCache, it still doesn't seem like we should add parsers to FieldCache for every field type.  It's also the case that a single static parser won't be able to handle all the cases... consider future functionality of using positions or payloads to fill out the full value.  A factory allows you to return the correct implementation given the parameters (number of bits to store as position, etc).

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1701) Add NumericField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1701:
----------------------------------

    Summary: Add NumericField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache  (was: Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache)

> Add NumericField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> -------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722017#action_12722017 ] 

Michael McCandless commented on LUCENE-1701:
--------------------------------------------

bq. That is what I was talking about all the time!

Eh, yeah... somethings things just need a good hashing out!

2.9 is really shaping up to be an awesome release...

bq. But this is all not really the best solution. It is too bad, that LUCENE-831 (not in current form, it is just not discussed to the end) is not to be included into 2.9.

Progress not perfection!

bq. I know we will have to stick with parsers until end of days, because the new ValueSource staff and Uninverters was not introduces early enough to be removed with 3.0.

Well we are discussing relaxing the back-compat policy... so maybe
earlier than "end of days".

bq. And Trie fields with positions/payloads would never work with current FieldCache, so I need no factories.

We'll cross that bridge when we get there...

bq. So this would be postponed until this is done. Then we could have a ValueSource for trie fields, where you could add these magic stuff like payloads and so on. But until this comes to reality (including CSF), the static parsers is all we have until now and is best placed in FieldCache (because of the strong linkage with this ugly exception to be hidden to the outside).

I agree.

bq. Earwin, Yonik: I know TrieRange is only one implementation of the whole numeric problem, but none of you ever presented your implementation to the public. This is the best we have now, its included into core and everybody is happy. If you have a better private implementation, you can still use it!

We shouldn't weaken trie's integration to core just because others
have private implementations.  What's important is that we don't
weaken those private implementations with trie's addition, and I don't
think our approach here has done that.


> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722558#action_12722558 ] 

Uwe Schindler commented on LUCENE-1701:
---------------------------------------

bq. Could you add the "<b>NOTE:</b> This API is experimental and might change in incompatible ways in the next release." caveat to the javadocs?

For the whole TrieAPI or only NumericUtils? If the first, I would do this with an general JavaDoc commit after this issue. If only NumericField, I could do it now.

bq. Can you change this: 

Sure, we had this the last time, too (I like my variant more, so I always automatically write it in that way)

bq. I think we can't actually deprecate NumberTools until we can call FieldCache.getShorts/getBytes on a NumericField? Ie, people relying on short/byte (to consume much less memory in FieldCache) cannot switch to numeric, and so must continue to zero-pad if they need to use RangeQuery/Filter?

I will open an issue because of this byte/short trie fields.

NumberTools does not handle zero-padding, so it could stay deprecated. Numbers encoded with NumberTools cannot be natively sorted on at all (only using StringIndex) and can only handle longs.

You may mean NumberUtils from Solr in contrib/spatial, but this class is not yet released and is only used for spatial.

bq. You need a CHANGES entry. 

Will come.

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1701:
----------------------------------

    Attachment: LUCENE-1701.patch

Attached is a patch wil the latest changes:
- Experimental-warning to all Numeric* classes
- Formatting fixes
- CHANGES.txt

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722579#action_12722579 ] 

Uwe Schindler commented on LUCENE-1701:
---------------------------------------

{quote}
bq. Sure, we had this the last time, too (I like my variant more, so I always automatically write it in that way)

Which "last time"?  Is there somewhere in the code base now where
we do this?

We generally try (though, don't always succeed) to follow Sun's coding
guidelines (http://java.sun.com/docs/codeconv/) except 2-space indent
not 4.
{quote}

This was not against the change. With "last time" I meant that some time ago you mentioned the same in a different patch from me. I will change it.

My note was only, that I "automatically" create such code, because I for myself find its better readable. That was all :-)

{quote}
bq. NumberTools does not handle zero-padding, so it could stay deprecated. Numbers encoded with NumberTools cannot be natively sorted on at all (only using StringIndex) and can only handle longs.

Actually, I believe it does do 0 padding and handles negative numbers
correctly (NumberTools.longToString)?

Ie, I can take a short now, call longToString, index with that, do
[possibly inefficient] RangeQuery against it, and sort against it
using only 2 bytes per doc, today?
{quote}

You cannot do this with NumberTools. NumberTools uses an special radix 36 encoding (and not radix 10 like normal numbers). The encoding is just like NumericUtils not human-readable and so cannot be parsed with Number.toString(). To convert back, you need the method from the same class.

Because of this you have two possilities: Write your own parser and pass it to SortField/FieldCache or sort using StringIndex (because it is sortable according to String.compareTo).

So it can be deprecated.

If sombody want to do encoding and parsing with FieldCache.getShorts() there is no way around a DecimalFormat with zero-padding and the problem with negative numbers.

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1701:
----------------------------------

    Attachment: NumericField.java

Here is a first draft of NumericField with the same handling as NumericTokenStream. It is for indexing only, on retrieving stored fields, one would get the numeric field value as a string (according to Number.toString()). Because of this, this class returns in stringValue() the string representation of the numeric value and with tokenStreamValue the NumericTokenStream is returned.

For SortField I still have the very strong opinion, that here a extra class is not needed. A factory is enough (and even too much, supplying the Parser to SortField would be enough).

I will later post a patch with this file and the moved/made public Parsers.

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721972#action_12721972 ] 

Yonik Seeley commented on LUCENE-1701:
--------------------------------------

bq. Because.... we've decided that this is our core approach to numerics?

We decided to move trie from contrib to core because it was the most stable and usable numeric implementation.  We did not decide to rewrite it, or make it "special".  It's not sufficient for everyone, there will be (many) enhancements to trie, and there will be other numeric field types.  Trie is not, and should not be the only numeric field, and should not be baked into the index format.

The next step after adding NumericField seems to be "it's a bug if getDocument() doesn't return a NumericField, so we must encode it in the index".  If that's the case, I'm -1 on adding NumericField in the first place.

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722305#action_12722305 ] 

Uwe Schindler commented on LUCENE-1701:
---------------------------------------

 Yes, it would work this way. This only would violate Yoniks complaints about not miximg Trie too much into the other code, but this is already done because of this StopFillCacheException usage. When we do LUCENE-831, this should be thought about, too.

I would make the core parsers public to enable users to have full control (on the other hand I could now hide also the trie parsers). But this is a bad approach, wherever automatisms are envolved, oneshould always have the possibility to fix to one parser. And why do we have the SortField/FieldCache accessors with parser parameter, when you cannot even use the default ones?

P.S.: About payloads & positions and the need for extra parameters to the parser: After adding support for positions or payloads to encode the highest precision, there is still no need for an extra SortField/Parser class or factory. The future "ValueSource" starts to decode the values until a change in shift occurs. This first shift is for sure the highest precision (because of term ordering), if it is 0, its like now (no payloads/prositions); if the first visible shift>0, payloads/positions were used and the numbero of bits there is also known.

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722335#action_12722335 ] 

Michael McCandless commented on LUCENE-1701:
--------------------------------------------

But, with the new public exposure of the field cache parsers, this is a newly added trap?  You could accidentally consume 2X the RAM.  Since auto-detection of the parser simply means a specific parser was chosen, why can't we then cache using that chosen parser?  Then you would not risk 2X memory usage.  Or, maybe we should leave the parsers private to not have this risk.

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722577#action_12722577 ] 

Michael McCandless commented on LUCENE-1701:
--------------------------------------------

{quote}
bq. Could you add the "<b>NOTE:</b> This API is experimental and might change in incompatible ways in the next release." caveat to the javadocs?

For the whole TrieAPI or only NumericUtils? If the first, I would do this with an general JavaDoc commit after this issue. If only NumericField, I could do it now.
{quote}

For the whole thing; I think an added NOTE at each class level
javadoc is what we need.  A separate javadoc issue is good, but
it needs to be done for 2.9.

{quote}
bq. Can you change this:

Sure, we had this the last time, too (I like my variant more, so I always automatically write it in that way)
{quote}

Which "last time"?  Is there somewhere in the code base now where
we do this?

We generally try (though, don't always succeed) to follow Sun's coding
guidelines (http://java.sun.com/docs/codeconv/) except 2-space indent
not 4.

{quote}
bq. I think we can't actually deprecate NumberTools until we can call FieldCache.getShorts/getBytes on a NumericField? Ie, people relying on short/byte (to consume much less memory in FieldCache) cannot switch to numeric, and so must continue to zero-pad if they need to use RangeQuery/Filter?

NumberTools does not handle zero-padding, so it could stay deprecated. Numbers encoded with NumberTools cannot be natively sorted on at all (only using StringIndex) and can only handle longs.
{quote}

Actually, I believe it does do 0 padding and handles negative numbers
correctly (NumberTools.longToString)?

Ie, I can take a short now, call longToString, index with that, do
[possibly inefficient] RangeQuery against it, and sort against it
using only 2 bytes per doc, today?

bq. I will open an issue because of this byte/short trie fields (LUCENE-1710)

OK but since we've marked it 3.1 (which I think is OK; though in
CHANGES lets document the limitation?), we should un-deprecate
NumberTools, now, and deprecate it again along with LUCENE-1710?


> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722774#action_12722774 ] 

Uwe Schindler edited comment on LUCENE-1701 at 6/22/09 12:38 PM:
-----------------------------------------------------------------

bq. Using 4 for int, 6 for long. Dates-as-longs look a bit sad on 8.

I think 4 for ints is a good start, better as 4 for longs (which produces 16 different precision terms and upto 31 term enums [= precision changes] per range). 6 is a good idea, it brings a little bit more than 8 but does not produce too much precision changes. I tested that also with my 2 M numeric-only index here.

Mike: As you see, the precision step is a good config approach, so an default is should be choosen carefully.
It may even be different for the same data type, when e.g. you have longs, but all longs in your index are only in a very limited range -- ok. You could use an int, too. But e.g. if you index dates as long and your dates are only between two years or something like that, 4 may still good. This is because on a smaller range, the algorith does not need to to up to the lowest precision.

bq. Though, if you want really fast dates, chosing hour/day/month/year as precision steps is vastly superior, plus it also clicks well with user-selected ranges. Still, I dumped this approach for uniformity and clarity.

That is clear. Because these precisions are fitting exact to users queries in case of dates (often users take full days when selecting the range).

Nice to hear, that you use TrieRange? What is your index spec and measured query speeds (if it does not go too far into company internals)?

      was (Author: thetaphi):
    bq. Using 4 for int, 6 for long. Dates-as-longs look a bit sad on 8.

I think 4 for ints is a good start, better as 4 for longs (which produces 16 different precision terms and upto 31 term enums [= precision changes] per range). 6 is a good idea, it brings a little bit more than 8 but does not produce too much precision changes. I tested that also with my 2 M numeric-only index here.

Mike: As you see, the precision step is a good config approach, so an default is shpould be choosen carefully.
It may even be different for different data types, when e.g. you have longs, but all longs in your index are only in a very limited range -- ok. You could use an int, too. But e.g. if you index dates as long and your dates are only between two years or something like that, 4 may still good. This is because on a smaller range, the algorith does not need to to up to the lowest precision.

bq. Though, if you want really fast dates, chosing hour/day/month/year as precision steps is vastly superior, plus it also clicks well with user-selected ranges. Still, I dumped this approach for uniformity and clarity.

That is clear. Because these precisions are fitting exact to users queries in case of dates (often users take full days when selecting the range).

Nice to hear, that you use TrieRange? What is your index spec and measured query speeds (if it does not go too far into company internals)?
  
> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1701:
----------------------------------

    Attachment: LUCENE-1701.patch

Patch with all changes, including LUCENE-1687 (it is easier to do this together):
- Add NumericField, change JavaDocs to prefer this where possible.
- Change range tests to use NumericField during build of test index, this also tests stored fields with NumericField (much cleaner now)
- Remove SortField factory from NumericUtils, this class is now almost only for expert users; creating a SortField is possible with SortField ctor using parser instance.
- Make all parsers in FieldCache public (DEFAULT_XXX_PARSER)
- Add trie parsers to FieldCache, too (NUMERIC_UTILS_XXX_PARSER)
- Hide StopFillCacheException-hack
- Change SortField to automatically initialize the correct parser according to the type (defaults to text-only parsers) -- there is still some good javadocs missing to tell the user, that it is better to use SortField(String, Parser) instead of SortField(String, type-int) for numeric values, especially when indexed using NumericField.
- Because SortField is serializable, all parsers were made singletons and serializable, too (superinterface Parser extends Serializable, default parsers define readResolve() to enforce singletons, which are important for FieldCache to work correctly)
- Remove now unneeded (parser==null) checks in sorting code, as SortField enforces a non-null parser now.
- Remove all code from ExtendedFieldCache and move to FieldCache (see LUCENE-1687), keep a stub for binary backwards-compatibility. The only implementation is now FieldCacheImpl referred to by DEFAULT (and EXT_DEFAULT for bw).

A short note: SortField is only serializable, if all custom comparators used are also serializable, maybe we should also note this in the docs. Parsers are automatically serializable (because superinterface), but not automatically real singletons (but this is not Lucenes problem).

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722765#action_12722765 ] 

Uwe Schindler commented on LUCENE-1701:
---------------------------------------

bq. Uwe, can we give a default (4?) for precisionStep, when creating a NumericField, NumericRangeFilter/Query? If we find a good default, it can simplier applied in one issue (just some additional ctors and factories).

Can you open an issue? There are some problems with defining a good default. In my environment, 8 makes the best results, 4 is only little faster.
Problems are described in JavaDocs: smaller precisionStep -> more different precisions:
- more seek operations and new TermEnums
- but less terms
For my index with 8 is in good relation to each other, with 4 seeking gets costly and with 2, I see no difference, only a much larger index (600,000 docs on PANGAEA, 4 Mio doc index locally on Laptop with only trie numbers).

So I do not want to set a default with enough tests from different people/scenarios, and this comes when 2.9 is out and everybody tries out :(

I will commit this patch in a day or two after applying LUCENE-1701-test-tag-special.patch to 2.4-test-branch.

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722341#action_12722341 ] 

Uwe Schindler commented on LUCENE-1701:
---------------------------------------

Not sure!

But for sure, the old, never used auto-cache should be deprecated, too!

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721810#action_12721810 ] 

Michael McCandless commented on LUCENE-1701:
--------------------------------------------

Uwe can you also open an issue for handling byte/short/Date with
Numeric*?

bq. I vote for factories - escaping back-compat woes by exposing minimum interface.

By this same logic, should we remove NumericRangeFilter/Query and use
static factories instead?

We can't let fear of our back-compat policies prevent progress.

I seem to be the only one [who's speaking up, at least] who feels
consumability of Lucene's APIs is important...

Here's my reasoning: numeric fields are common; many apps need them.
But, it's painful to use them today; it's a trap for users because
Lucene acts like it can handle them (eg SortField.INT exists) but then
RangeQuery is buggy unless you encode the numbers (zero pad ints, use
Solr's or your own NumberUtils for floats/doubles).  And once you
figured out the encoding, you discovered RangeQuery can have horrific
performance.

For the longest time Lucene could not provide good ootb handling of
numerics, but now finally an awesome step forward (thank you Uwe!)
comes along... and Lucene can provide correct & performant handling of
numerics.

Such an important & useful functionality deserves a consumable API.
It should be obvious to people playing with Lucene how to use numeric
fields.  I should be able to do this:

{code}
Document doc = new Document();
doc.add(new NumericField("price", 15.50f));
{code}

not this:

{code}
Document doc = new Document();
Field f = new Field("price", new NumericTokenStream(4).setFloatValue(15.50f));
f.setOmitNorms(true);
f.setOmitTermFreqAndPositions(true);
doc.add(field);
{code}

nor, this:

{code}
Document doc = new Document();
doc.add(NumericUtils.createFloatField("price", 15.50f));
{code}

When I want to reuse, I should be able to call
{{NumericField.setFloatValue()}}, not ask the TokenStream to set the
value.

In fact, as a user of this API, I shouldn't even have to know that a
powerful TokenStream was created to index my NumericField.  I
shouldn't have to know to set those advanced flags on Field.  These
are implementation details.  In fact with time we may make
improvements to these "implemenation details", so we don't want such
implementation details out in the user's code.

NumericUtils should be utility methods used only by the current
implemention.  Ideally it would not even be public, but Java doesn't
give us the ability to be package private to "org.apache.lucene.*".

Here's what I propose:

  * Add NumericField and NumericSortField, and 
    rename RangeQuery -> TermRangeQuery  (TextRangeQuery?).

  * Move the Numeric FieldCache parsers into FieldCache,
    and make them (PLAIN_TEXT_INT_PARSER vs NUMERIC_INT_PARSER) public.

  * I would also really like to have NumericField come back when you
    retrieve the doc; this only requires 1 bit added to the flags
    stored in each doc's entry in the .fdt file.

Why should we make such an excellent addition to Lucene, only to make
it hard to use?


> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722002#action_12722002 ] 

Uwe Schindler commented on LUCENE-1701:
---------------------------------------

That is what I was talking about all the time!

But this is all not really the best solution. It is too bad, that LUCENE-831 (not in current form, it is just not discussed to the end) is not to be included into 2.9. I know we will have to stick with parsers until end of days, because the new ValueSource staff and Uninverters was not introduces early enough to be removed with 3.0. And Trie fields with positions/payloads would never work with current FieldCache, so I need no factories.

So this would be postponed until this is done. Then we could have a ValueSource for trie fields, where you could add these magic stuff like payloads and so on. But until this comes to reality (including CSF), the static parsers is all we have until now and is best placed in FieldCache (because of the strong linkage with this ugly exception to be hidden to the outside).

Earwin, Yonik: I know TrieRange is only *one* implementation of the whole numeric problem, but none of you ever presented your implementation to the public. This is the best we have now, its included into core and everybody is happy. If you have a better private implementation, you can still use it!

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722730#action_12722730 ] 

Michael McCandless commented on LUCENE-1701:
--------------------------------------------

Uwe, can we give a default (4?) for precisionStep, when creating a NumericField, NumericRangeFilter/Query?

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721961#action_12721961 ] 

Michael McCandless commented on LUCENE-1701:
--------------------------------------------

bq. Static factories are cool (they allow to switch implementations and instantiation logic without changing API) and are as easy to use (probably even easier with generics in Java5) as constructors.

Coolness is in the eye of the beholder?

Yes, they are cool in that they give the *developer* (us) future
freedom (to change the actual class returned, re-use instances, use
singletons, etc.), but not cool (in my eyes) for consumability.

Static factory classes are a good fit when the impls really should
remain anonymous because there are trivial differences.  EG the 12
different impls that can be returned by TopFieldCollector.create are a
good example.

But NumericField vs Field, and SortField vs NumericSortField, are
different and should be seen as different to consumer of Lucene's API.

bq. If we add some generic storable flags for Lucene fields, this is cool (probably), NumericField can then capitalize on it, as well as users writing their own NNNFields.

+1  Wanna make a patch?

Then NumericField would just tap in to this extensibility... and,
somehow, in our future improved search time document() API, have the
ability to make a NumericField.

bq. Why on earth can't my own split-field (vs single-field as in current Lucene) trie-encoded number enjoy the same benefits as NumericField from Lucene core?

Because.... we've decided that this is our core approach to numerics?

Seriously, I don't see that as unfair.  Trie works well.  We have
chosen it as our way (for now, until something better comes along) of
handling numerics.  Just like we've picked a certain format for the
terms dict and prx file.

Sure, we should make it easy (add extensibility) so external fields
could store stuff in the index, but that doesn't mean we should hold
back on Numeric* consumability until we get that extensibility.

bq. I do use factory methods for all my queries and filters, and it makes me feel warm and fuzzy!  Under the hood some of them consult FieldInfo to instantiate custom-tailored query variants, so I just use range(CREATION_TIME, from, to) and don't think if this field is trie-encoded or raw.

Someday maybe I'll convince you to donate this "schema" layer on top
of Lucene ;) But I hope there are SOME named classes in there and not
all static factory methods returning anonymous untyped impls.

bq. "Simple things should be simple", okay. Complex things should be simple too, argh! 

Whoa, this is all simple stuff?  What should be complex about using
numeric fields in Lucene?  This whole issue is "simple things should
be simple".


> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722558#action_12722558 ] 

Uwe Schindler edited comment on LUCENE-1701 at 6/22/09 4:01 AM:
----------------------------------------------------------------

bq. Could you add the "<b>NOTE:</b> This API is experimental and might change in incompatible ways in the next release." caveat to the javadocs?

For the whole TrieAPI or only NumericUtils? If the first, I would do this with an general JavaDoc commit after this issue. If only NumericField, I could do it now.

bq. Can you change this: 

Sure, we had this the last time, too (I like my variant more, so I always automatically write it in that way)

bq. I think we can't actually deprecate NumberTools until we can call FieldCache.getShorts/getBytes on a NumericField? Ie, people relying on short/byte (to consume much less memory in FieldCache) cannot switch to numeric, and so must continue to zero-pad if they need to use RangeQuery/Filter?

I will open an issue because of this byte/short trie fields (LUCENE-1710)

NumberTools does not handle zero-padding, so it could stay deprecated. Numbers encoded with NumberTools cannot be natively sorted on at all (only using StringIndex) and can only handle longs.

You may mean NumberUtils from Solr in contrib/spatial, but this class is not yet released and is only used for spatial.

bq. You need a CHANGES entry. 

Will come.

      was (Author: thetaphi):
    bq. Could you add the "<b>NOTE:</b> This API is experimental and might change in incompatible ways in the next release." caveat to the javadocs?

For the whole TrieAPI or only NumericUtils? If the first, I would do this with an general JavaDoc commit after this issue. If only NumericField, I could do it now.

bq. Can you change this: 

Sure, we had this the last time, too (I like my variant more, so I always automatically write it in that way)

bq. I think we can't actually deprecate NumberTools until we can call FieldCache.getShorts/getBytes on a NumericField? Ie, people relying on short/byte (to consume much less memory in FieldCache) cannot switch to numeric, and so must continue to zero-pad if they need to use RangeQuery/Filter?

I will open an issue because of this byte/short trie fields.

NumberTools does not handle zero-padding, so it could stay deprecated. Numbers encoded with NumberTools cannot be natively sorted on at all (only using StringIndex) and can only handle longs.

You may mean NumberUtils from Solr in contrib/spatial, but this class is not yet released and is only used for spatial.

bq. You need a CHANGES entry. 

Will come.
  
> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722060#action_12722060 ] 

Earwin Burrfoot commented on LUCENE-1701:
-----------------------------------------

bq. Someday maybe I'll convince you to donate this "schema" layer on top of Lucene
It's not generic enough to be of use for every user of Lucene, and it doesn't aim to be such. It also evolves, and donating something to Lucene means casting it in concrete.
So that's not me being greedy or lazy (okay, maybe a little bit of the latter), it's simply not public-quality (as I understand it) code.
I can share the design if anybody's interested, but everyone's coping with it themselves it seems.

Solr has its own schema approach, and it has its merits and downfalls compared to mine. That's what is nice, we're able to use the same library in differing ways, and it doesn't force its sense of 'best practices' on us. 

bq. But I hope there are SOME named classes in there and not all static factory methods returning anonymous untyped impls.
SOME of them aren't static :-D

bq. We shouldn't weaken trie's integration to core just because others have private implementations.
You shouldn't integrate into core something that is not core functionality. Think microkernels.
It's strange seeing you drive CSFs, custom indexing chains, pluggability everywhere on one side, and trying to add some weird custom properties into index that are tightly interwoven with only one of possible numeric implementations on the other side.

bq. Design for today.
And spend two years deprecating and supporting today's designs after you get a better thing tomorrow. Back-compat Lucene-style and agile design aren't something that marries well.

bq. What's important is that we don't weaken those private implementations with trie's addition, and I don't think our approach here has done that.
You're weakening Lucene itself by introducing too much coupling between its components.

IndexReader/Writer pair is a good example of what I'm arguing against. A dusty closet of microfeatures that are tightly interwoven into a complex hard-to-maintain mess with zillions of (possibly broken) control paths - remember mutable deletes/norms+clone/reopen permutations? It could be avoided if IR/W were kept to the bare minimum (which most people are going to use), and more advanced features were built on top of it, not in the same place.

NRT seems to tread the same path, and I'm not sure it's going to win that much turnaround time after newly-introduced per-segment collection. Some time ago I finished a first version of IR plugins, and enjoy pretty low reopen times (field/facet/filter cache warmups included). (Yes, I'm going to open an issue for plugins once they stabilize enough)

{quote}
> If we add some generic storable flags for Lucene fields, this is cool (probably), NumericField can then capitalize on it, as well as users writing their own NNNFields.
+1 Wanna make a patch?
{quote}

No, I'd like to continue IR cleanup and play with positionIncrement companion value that could enable true multiword synonyms. 
I know, I know, it's do-a-cracy. But it's not an excuse for hacks.

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721929#action_12721929 ] 

Yonik Seeley commented on LUCENE-1701:
--------------------------------------

The exception certainly is a hack - but any new field cache API should be powerful enough to handle trie through the normal APIs that anyone else would have to go through when implementing their own field type.  If Trie *needs* to be part of the new field cache implementation, that would be a big red flag.

> Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: NumericField.java
>
>
> In discussions about LUCENE-1673, Mike & me wanted to add a new NumericField to o.a.l.document specific for easy indexing. An alternative would be to add a NumericUtils.newXxxField() factory, that creates a preconfigured Field instance with norms and tf off, optionally a stored text (LUCENE-1699) and the TokenStream already initialized. On the other hand NumericUtils.newXxxSortField could be moved to NumericSortField.
> I and Yonik tend to use the factory for both, Mike tends to create the new classes.
> Also the parsers for string-formatted numerics are not public in FieldCache. As the new SortField API (LUCENE-1478) makes it possible to support a parser in SortField instantiation, it would be good to have the static parsers in FieldCache public available. SortField would init its member variable to them (instead of NULL), so making code a lot easier (FieldComparator has this ugly null checks when retrieving values from the cache).
> Moving the Trie parsers also as static instances into FieldCache would make the code cleaner and we would be able to hide the "hack" StopFillCacheException by making it private to FieldCache (currently its public because NumericUtils is in o.a.l.util).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org