You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucene.apache.org by Grant Ingersoll <gs...@apache.org> on 2009/06/14 14:17:10 UTC

New Token API was Re: Payloads and TrieRangeQuery

Agreed.  I've been bringing it up for a while now and made the same  
comments when it was first introduced, but felt like the lone voice in  
the wilderness on it and gave way [1], [2], [3].  Now that others are  
writing/converting, I think it is worth revisiting.

That being said, I did just write my first TokenFilter with it, and  
didn't think it was that hard.  There are some gains in it and the API  
can be simpler if you just need one or two attributes (see  
DelimitedPayloadTokenFilter), although, just like the move to using  
char [] in Token, as soon as you do something like store a Token, you  
lose most of the benefit, I think (for the char [] case, as soon as  
you need a String in one of your filters, you lose the perf. gain).   
The annoying parts are that you still have to implement the deprecated  
next() part, otherwise chances are the thing is unusable by everyone  
at this point anyway.

Add on top of it, that the whole point of customizing the chain is to  
use it in search and, frankly speaking, somehow I think that part of  
the patch was held back.

I personally would vote for reverting until a complete patch that  
addresses both sides of the problem is submitted and a better solution  
to cloning is put forth.

-Grant

[1] http://issues.apache.org/jira/browse/LUCENE-1422,
[2] http://www.lucidimagination.com/search/document/5daf6d7b8027b4d3/tokenstream_and_token_apis#9e2d0d2b5dc118d4 
, and the rest of the discussion on that thread.
[3] http://www.lucidimagination.com/search/document/4274335abcf31926/new_tokenstream_api_usage

On Jun 13, 2009, at 10:32 PM, Mark Miller wrote:

> Yonik Seeley wrote:
>> Even non-API changes have tradeoffs... the indexing improvements  
>> (err,
>> total rewrite) made that code *much* harder to understand and debug.
>> It's a net win since the indexing performance improvements were so
>> fantastic.
>>
> I agree - very hard to follow, worth the improvements.
>
> Just to throw something out, the new Token API is not very  
> consumable in my experience. The old one was very intuitive and very  
> easy to follow the code.
>
> I've had to refigure out what the heck was going on with the new one  
> more than once now. Writing some example code with it is hard to  
> follow or justify to a new user.
>
> What was the big improvement with it again? Advanced, expert custom  
> indexing chains require less casting or something right?
>
> I dunno - anyone else have any thoughts now that the new API has  
> been in circulation for some time?
>
> -- 
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Mark Miller <ma...@gmail.com>.

Some great points - especially the decision between a deprecated API, 
and a new experimental one subject to change. Bit of a rock and a hard 
place for a new user.

Perhaps we should add a little note with some guidance.


- Mark

Robert Muir wrote:
> let me try some slightly more constructive feedback:
>
> new user looks at TokenStream javadocs:
> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/analysis/TokenStream.html
> immediately they see deprecated, text in red with the words
> "experimental", warnings in bold, the whole thing is scary!
> due to the use of 'e.g.' the javadoc for .incrementToken() is cut off
> in a bad way, and its probably the most important method to a new
> user!
> there's also a stray bold tag gone haywire somewhere, possibly .incrementToken()
>
> from a technical perspective, the documentation is excellent! but for
> a new user unfamiliar with lucene, its unclear exactly what steps to
> take: use the scary red experimental api or the old deprecated one?
>
> theres also some fairly advanced stuff such as .captureState and
> .restoreState that might be better in a different place.
>
> finally, the .setUseNewAPI() and .setUseNewAPIDefault() are confusing
> [one is static, one is not], especially because it states all streams
> and filters in one chain must use the same API, is there a way to
> simplify this?
>
> i'm really terrible with javadocs myself, but perhaps we can come up
> with a way to improve the presentation... maybe that will make the
> difference.
>
> On Mon, Jun 15, 2009 at 3:45 PM, Robert Muir<rc...@gmail.com> wrote:
>   
>> Mark, I'll see if I can get tests produced for some of those analyzers.
>>
>> as a new user of the new api myself, I think I can safely say the most
>> confusing thing about it is having the old deprecated API mixed in the
>> javadocs with it :)
>>
>> On Mon, Jun 15, 2009 at 2:53 PM, Mark Miller<ma...@gmail.com> wrote:
>>     
>>> Robert Muir wrote:
>>>       
>>>> Mark, I created an issue for this.
>>>>
>>>>         
>>> Thanks Robert, great idea.
>>>       
>>>> I just think you know, converting an analyzer to the new api is really
>>>> not that bad.
>>>>
>>>>         
>>> I don't either. I'm really just complaining about the initial readability.
>>> Once you know whats up, its not too much different. I just have found myself
>>> having to refigure out whats up (a short task to be sure) over again after I
>>> leave it for a while. With the old one, everything was just kind of
>>> immediately self evident.
>>>
>>> That makes me think new users might be a little more confused when they
>>> first meet again. I'm not a new user though, so its only a guess really.
>>>       
>>>> reverse engineering what one of them does is not necessarily obvious,
>>>> and is completely unrelated but necessary if they are to be migrated.
>>>>
>>>> I'd be willing to assist with some of this but I don't want to really
>>>> work the issue if its gonna be a waste of time at the end of the
>>>> day...
>>>>
>>>>         
>>> The chances of this issue being fully reverted are so remote that I really
>>> wouldnt let that stop you ...
>>>       
>>>> On Mon, Jun 15, 2009 at 1:55 PM, Mark Miller<ma...@gmail.com> wrote:
>>>>
>>>>         
>>>>> Robert Muir wrote:
>>>>>
>>>>>           
>>>>>>> As Lucene's contrib hasn't been fully converted either (and its been
>>>>>>> quite
>>>>>>> some time now), someone has probably heard that groan before.
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> hope this doesn't sound like a complaint,
>>>>>>
>>>>>>             
>>>>> Complaints are fine in any case. Every now and then, it might cause a
>>>>> little
>>>>> rant from me or something, but please don't let that dissuade you :)
>>>>> Who doesnt like to rant and rave now and then. As long as thoughts and
>>>>> opinions are coming out in a non negative way (which certainly includes
>>>>> complaints),
>>>>> I think its all good.
>>>>>
>>>>>           
>>>>>>  but in my opinion this is
>>>>>> because many do not have any tests.
>>>>>> I converted a few of these and its just grunt work but if there are no
>>>>>> tests, its impossible to verify the conversion is correct.
>>>>>>
>>>>>>
>>>>>>             
>>>>> Thanks for pointing that out. We probably get lazy with tests, especially
>>>>> in
>>>>> contrib, and this brings up a good point - we should probably push
>>>>> for tests or write them before committing more often. Sometimes I'm sure
>>>>> it
>>>>> just comes downto a tradeoff though - no resources at the time,
>>>>> the class looked clear cut, and it was just contrib anyway. But then here
>>>>> we
>>>>> are ... a healthy dose of grunt work is bad enough when you have tests to
>>>>> check it.
>>>>>
>>>>> --
>>>>> - Mark
>>>>>
>>>>> http://www.lucidimagination.com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>
>>>>
>>>>         
>>> --
>>> - Mark
>>>
>>> http://www.lucidimagination.com
>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>>
>>>       
>>
>> --
>> Robert Muir
>> rcmuir@gmail.com
>>
>>     
>
>
>
>   


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Some SVN cleanup, was: New Token API was Re: Payloads and TrieRangeQuery

Posted by Uwe Schindler <uw...@thetaphi.de>.

Done, tests pass. 

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Monday, June 15, 2009 10:40 PM
> To: java-dev@lucene.apache.org
> Subject: Re: New Token API was Re: Payloads and TrieRangeQuery
> 
> On Mon, Jun 15, 2009 at 4:21 PM, Uwe Schindler<uw...@thetaphi.de> wrote:
> 
> > And, in tests: test/o/a/l/index/store is somehow wrong placed. The class
> > inside should be in test/o/a/l/store. Should I move?
> 
> Please do!
> 
> Mike
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Mon, Jun 15, 2009 at 4:21 PM, Uwe Schindler<uw...@thetaphi.de> wrote:

> And, in tests: test/o/a/l/index/store is somehow wrong placed. The class
> inside should be in test/o/a/l/store. Should I move?

Please do!

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

RE: New Token API was Re: Payloads and TrieRangeQuery

Posted by Uwe Schindler <uw...@thetaphi.de>.

By the way, there is an empty "de" subdir in SVN inside analysis. Can this
be removed?

And, in tests: test/o/a/l/index/store is somehow wrong placed. The class
inside should be in test/o/a/l/store. Should I move?

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Uwe Schindler [mailto:uwe@thetaphi.de]
> Sent: Monday, June 15, 2009 10:18 PM
> To: java-dev@lucene.apache.org
> Subject: RE: New Token API was Re: Payloads and TrieRangeQuery
> 
> > there's also a stray bold tag gone haywire somewhere, possibly
> > .incrementToken()
> 
> I fixed this. This was going me on my nerves the whole day when I wrote
> javadocs for NumericTokenStream...
> 
> Uwe
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

RE: New Token API was Re: Payloads and TrieRangeQuery

Posted by Uwe Schindler <uw...@thetaphi.de>.

> there's also a stray bold tag gone haywire somewhere, possibly
> .incrementToken()

I fixed this. This was going me on my nerves the whole day when I wrote
javadocs for NumericTokenStream...

Uwe


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Michael Busch <bu...@gmail.com>.

Sounds promising, but I have to think about if there are not 
side-effects of this change other than a slowdown for people who create 
multiple tokens (which would be acceptable as you said, because it's not 
recommended anyway and should be rare).

On 6/15/09 1:46 PM, Uwe Schindler wrote:
>
> Maybe change the deprecation wrapper around next() and next(Token) 
> [the default impl of incrementToken()] to check, if the retrieved 
> token is not identical to the attribute and then just copy the 
> contents to the instance-Token? This would be a slowdown, but only be 
> the case for very rare TokenStreams that did not reuse token before 
> (and were slow before, too).
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> ------------------------------------------------------------------------
>
> *From:* Michael Busch [mailto:buschmic@gmail.com]
> *Sent:* Monday, June 15, 2009 10:39 PM
> *To:* java-dev@lucene.apache.org
> *Subject:* Re: New Token API was Re: Payloads and TrieRangeQuery
>
> I have implemented most of that actually (the interface part and Token 
> implementing all of them).
>
> The problem is a paradigm change with the new API: the assumption is 
> that there is always only one single instance of an Attribute. With 
> the old API, it is recommended to reuse the passed-in token, but you 
> don't have to, you can also return a new one with every call of next().
>
> Now with this change the indexer classes should only know about the 
> interfaces, if shouldn't know Token anymore, which seems fine when 
> Token implements all those interfaces. BUT, since there can be more 
> than once instance of Token, the indexer would have to call 
> getAttribute() for all Attributes it needs after each call of next(). 
> I haven't measured how expensive that is, but it seems like a severe 
> performance hit.
>
> That's basically the main reason why the backwards compatibility is 
> ensured in such a goofy way right now.
>
>  Michael
>
> On 6/15/09 1:28 PM, Uwe Schindler wrote:
>
>> And I don't like the *useNewAPI*() methods either. I spent a lot of time
>> thinking about backwards compatibility for this API. It's tricky to do
>> without sacrificing performance. In API patches I find myself spending
>> more time for backwards-compatibility than for the actual new feature! :(
>>   
>> I'll try to think about how to simplify this confusing old/new API mix.
>>      
>   
> One solution to fix this useNewAPI problem would be to change the
> AttributeSource in a way, to return classes that implement interfaces (as
> you proposed some weeks ago). The good old Token class then do not need to
> be deprecated, it could simply implement all 4 interfaces. AttributeSource
> then must implement a registry, which classes implement which interfaces. So
> if somebody wants a TermAttribute, he always gets the Token. In all other
> cases, the default could be a *Impl default class.
>   
> In this case, next() could simply return this Token (which is the all 4
> attributes). The reuseableToken is simply thrown away in the deprecated API,
> the reuseable Token comes from the AttributeSource and is per-instance.
>   
> Is this an idea?
>   
> Uwe
>   
>   
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:java-dev-unsubscribe@lucene.apache.org  <ma...@lucene.apache.org>
> For additional commands, e-mail:java-dev-help@lucene.apache.org  <ma...@lucene.apache.org>
>   
>   
>    
>

RE: New Token API was Re: Payloads and TrieRangeQuery

Posted by Uwe Schindler <uw...@thetaphi.de>.

Maybe change the deprecation wrapper around next() and next(Token) [the
default impl of incrementToken()] to check, if the retrieved token is not
identical to the attribute and then just copy the contents to the
instance-Token? This would be a slowdown, but only be the case for very rare
TokenStreams that did not reuse token before (and were slow before, too).

 

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

  _____  

From: Michael Busch [mailto:buschmic@gmail.com] 
Sent: Monday, June 15, 2009 10:39 PM
To: java-dev@lucene.apache.org
Subject: Re: New Token API was Re: Payloads and TrieRangeQuery

 

I have implemented most of that actually (the interface part and Token
implementing all of them).

The problem is a paradigm change with the new API: the assumption is that
there is always only one single instance of an Attribute. With the old API,
it is recommended to reuse the passed-in token, but you don't have to, you
can also return a new one with every call of next().

Now with this change the indexer classes should only know about the
interfaces, if shouldn't know Token anymore, which seems fine when Token
implements all those interfaces. BUT, since there can be more than once
instance of Token, the indexer would have to call getAttribute() for all
Attributes it needs after each call of next(). I haven't measured how
expensive that is, but it seems like a severe performance hit.

That's basically the main reason why the backwards compatibility is ensured
in such a goofy way right now.

 Michael

On 6/15/09 1:28 PM, Uwe Schindler wrote: 

And I don't like the *useNewAPI*() methods either. I spent a lot of time 
thinking about backwards compatibility for this API. It's tricky to do 
without sacrificing performance. In API patches I find myself spending 
more time for backwards-compatibility than for the actual new feature! :(
 
I'll try to think about how to simplify this confusing old/new API mix.
    

 
One solution to fix this useNewAPI problem would be to change the
AttributeSource in a way, to return classes that implement interfaces (as
you proposed some weeks ago). The good old Token class then do not need to
be deprecated, it could simply implement all 4 interfaces. AttributeSource
then must implement a registry, which classes implement which interfaces. So
if somebody wants a TermAttribute, he always gets the Token. In all other
cases, the default could be a *Impl default class.
 
In this case, next() could simply return this Token (which is the all 4
attributes). The reuseableToken is simply thrown away in the deprecated API,
the reuseable Token comes from the AttributeSource and is per-instance.
 
Is this an idea?
 
Uwe
 
 
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Michael Busch <bu...@gmail.com>.

I have implemented most of that actually (the interface part and Token 
implementing all of them).

The problem is a paradigm change with the new API: the assumption is 
that there is always only one single instance of an Attribute. With the 
old API, it is recommended to reuse the passed-in token, but you don't 
have to, you can also return a new one with every call of next().

Now with this change the indexer classes should only know about the 
interfaces, if shouldn't know Token anymore, which seems fine when Token 
implements all those interfaces. BUT, since there can be more than once 
instance of Token, the indexer would have to call getAttribute() for all 
Attributes it needs after each call of next(). I haven't measured how 
expensive that is, but it seems like a severe performance hit.

That's basically the main reason why the backwards compatibility is 
ensured in such a goofy way right now.

  Michael

On 6/15/09 1:28 PM, Uwe Schindler wrote:
>> And I don't like the *useNewAPI*() methods either. I spent a lot of time
>> thinking about backwards compatibility for this API. It's tricky to do
>> without sacrificing performance. In API patches I find myself spending
>> more time for backwards-compatibility than for the actual new feature! :(
>>
>> I'll try to think about how to simplify this confusing old/new API mix.
>>      
>
> One solution to fix this useNewAPI problem would be to change the
> AttributeSource in a way, to return classes that implement interfaces (as
> you proposed some weeks ago). The good old Token class then do not need to
> be deprecated, it could simply implement all 4 interfaces. AttributeSource
> then must implement a registry, which classes implement which interfaces. So
> if somebody wants a TermAttribute, he always gets the Token. In all other
> cases, the default could be a *Impl default class.
>
> In this case, next() could simply return this Token (which is the all 4
> attributes). The reuseableToken is simply thrown away in the deprecated API,
> the reuseable Token comes from the AttributeSource and is per-instance.
>
> Is this an idea?
>
> Uwe
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
>

RE: New Token API was Re: Payloads and TrieRangeQuery

Posted by Uwe Schindler <uw...@thetaphi.de>.

> And I don't like the *useNewAPI*() methods either. I spent a lot of time 
> thinking about backwards compatibility for this API. It's tricky to do 
> without sacrificing performance. In API patches I find myself spending 
> more time for backwards-compatibility than for the actual new feature! :(
>
> I'll try to think about how to simplify this confusing old/new API mix.

One solution to fix this useNewAPI problem would be to change the
AttributeSource in a way, to return classes that implement interfaces (as
you proposed some weeks ago). The good old Token class then do not need to
be deprecated, it could simply implement all 4 interfaces. AttributeSource
then must implement a registry, which classes implement which interfaces. So
if somebody wants a TermAttribute, he always gets the Token. In all other
cases, the default could be a *Impl default class.

In this case, next() could simply return this Token (which is the all 4
attributes). The reuseableToken is simply thrown away in the deprecated API,
the reuseable Token comes from the AttributeSource and is per-instance.

Is this an idea?

Uwe


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Robert Muir <rc...@gmail.com>.

yeah about 5 seconds in I saw that and decided to stick with what I know :)

On Mon, Jun 15, 2009 at 5:10 PM, Mark Miller<ma...@gmail.com> wrote:
> I may do the Highlighter. Its annoying though - I'll have to break back
> compat because Token is part of the public API (Fragmenter, etc).
>
> Robert Muir wrote:
>>
>> Michael OK, I plan on adding some tests for the analyzers that don't have
>> any.
>>
>> I didn't try to migrate things such as highlighter, which are
>> definitely just as important, only because I'm not familiar with that
>> territory.
>>
>> But I think I can figure out what the various language analyzers are
>> trying to do and add tests / convert the remaining ones.
>>
>> On Mon, Jun 15, 2009 at 4:42 PM, Michael Busch<bu...@gmail.com> wrote:
>>
>>>
>>> I agree. It's my fault, the task of changing the contribs (LUCENE-1460)
>>> is
>>> assigned to me for a while now - I just haven't found the time to do it
>>> yet.
>>>
>>> It's great that you started the work on that! I'll try to review the
>>> patch
>>> in the next couple of days and help with fixing the remaining ones. I'd
>>> like
>>> to get the AttributeSource improvements patch out first. I'll try that
>>> tonight.
>>>
>>>  Michael
>>>
>>> On 6/15/09 1:35 PM, Robert Muir wrote:
>>>
>>> Michael, again I am terrible with such things myself...
>>>
>>> Personally I am impressed that you have the back compat, even if you
>>> don't change any code at all I think some reformatting of javadocs
>>> might make the situation a lot friendlier. I just listed everything
>>> that came to my mind immediately.
>>>
>>> I guess I will also mention that one of the reasons I might not use
>>> the new API is that since all filters, etc on the same chain must use
>>> the same API, its discouraging if all the contrib stuff doesn't
>>> support the new API, it makes me want to just stick with the old so
>>> everything will work. So I think contribs being on the new API is
>>> really important otherwise no one will want to use it.
>>>
>>> On Mon, Jun 15, 2009 at 4:21 PM, Michael Busch<bu...@gmail.com> wrote:
>>>
>>>
>>> This is excellent feedback, Robert!
>>>
>>> I agree this is confusing; especially having a deprecated API and only a
>>> experimental one that replaces the old one. We need to change that.
>>> And I don't like the *useNewAPI*() methods either. I spent a lot of time
>>> thinking about backwards compatibility for this API. It's tricky to do
>>> without sacrificing performance. In API patches I find myself spending
>>> more
>>> time for backwards-compatibility than for the actual new feature! :(
>>>
>>> I'll try to think about how to simplify this confusing old/new API mix.
>>>
>>> However, we need to make the decisions:
>>> a) if we want to release this new API with 2.9,
>>> b) if yes to a), if we want to remove the old API in 3.0?
>>>
>>> If yes to a) and no to b), then we'll have to support both APIs for a
>>> presumably very long time, so we then need to have a better solution for
>>> the
>>> backwards-compatibility here.
>>>
>>> -Michael
>>>
>>> On 6/15/09 1:09 PM, Robert Muir wrote:
>>>
>>> let me try some slightly more constructive feedback:
>>>
>>> new user looks at TokenStream javadocs:
>>>
>>> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/analysis/TokenStream.html
>>> immediately they see deprecated, text in red with the words
>>> "experimental", warnings in bold, the whole thing is scary!
>>> due to the use of 'e.g.' the javadoc for .incrementToken() is cut off
>>> in a bad way, and its probably the most important method to a new
>>> user!
>>> there's also a stray bold tag gone haywire somewhere, possibly
>>> .incrementToken()
>>>
>>> from a technical perspective, the documentation is excellent! but for
>>> a new user unfamiliar with lucene, its unclear exactly what steps to
>>> take: use the scary red experimental api or the old deprecated one?
>>>
>>> theres also some fairly advanced stuff such as .captureState and
>>> .restoreState that might be better in a different place.
>>>
>>> finally, the .setUseNewAPI() and .setUseNewAPIDefault() are confusing
>>> [one is static, one is not], especially because it states all streams
>>> and filters in one chain must use the same API, is there a way to
>>> simplify this?
>>>
>>> i'm really terrible with javadocs myself, but perhaps we can come up
>>> with a way to improve the presentation... maybe that will make the
>>> difference.
>>>
>>> On Mon, Jun 15, 2009 at 3:45 PM, Robert Muir<rc...@gmail.com> wrote:
>>>
>>>
>>> Mark, I'll see if I can get tests produced for some of those analyzers.
>>>
>>> as a new user of the new api myself, I think I can safely say the most
>>> confusing thing about it is having the old deprecated API mixed in the
>>> javadocs with it :)
>>>
>>> On Mon, Jun 15, 2009 at 2:53 PM, Mark Miller<ma...@gmail.com>
>>> wrote:
>>>
>>>
>>> Robert Muir wrote:
>>>
>>>
>>> Mark, I created an issue for this.
>>>
>>>
>>>
>>> Thanks Robert, great idea.
>>>
>>>
>>> I just think you know, converting an analyzer to the new api is really
>>> not that bad.
>>>
>>>
>>>
>>> I don't either. I'm really just complaining about the initial
>>> readability.
>>> Once you know whats up, its not too much different. I just have found
>>> myself
>>> having to refigure out whats up (a short task to be sure) over again
>>> after I
>>> leave it for a while. With the old one, everything was just kind of
>>> immediately self evident.
>>>
>>> That makes me think new users might be a little more confused when they
>>> first meet again. I'm not a new user though, so its only a guess really.
>>>
>>>
>>> reverse engineering what one of them does is not necessarily obvious,
>>> and is completely unrelated but necessary if they are to be migrated.
>>>
>>> I'd be willing to assist with some of this but I don't want to really
>>> work the issue if its gonna be a waste of time at the end of the
>>> day...
>>>
>>>
>>>
>>> The chances of this issue being fully reverted are so remote that I
>>> really
>>> wouldnt let that stop you ...
>>>
>>>
>>> On Mon, Jun 15, 2009 at 1:55 PM, Mark Miller<ma...@gmail.com>
>>> wrote:
>>>
>>>
>>>
>>> Robert Muir wrote:
>>>
>>>
>>>
>>> As Lucene's contrib hasn't been fully converted either (and its been
>>> quite
>>> some time now), someone has probably heard that groan before.
>>>
>>>
>>>
>>>
>>> hope this doesn't sound like a complaint,
>>>
>>>
>>>
>>> Complaints are fine in any case. Every now and then, it might cause a
>>> little
>>> rant from me or something, but please don't let that dissuade you :)
>>> Who doesnt like to rant and rave now and then. As long as thoughts and
>>> opinions are coming out in a non negative way (which certainly includes
>>> complaints),
>>> I think its all good.
>>>
>>>
>>>
>>>  but in my opinion this is
>>> because many do not have any tests.
>>> I converted a few of these and its just grunt work but if there are no
>>> tests, its impossible to verify the conversion is correct.
>>>
>>>
>>>
>>>
>>> Thanks for pointing that out. We probably get lazy with tests, especially
>>> in
>>> contrib, and this brings up a good point - we should probably push
>>> for tests or write them before committing more often. Sometimes I'm sure
>>> it
>>> just comes downto a tradeoff though - no resources at the time,
>>> the class looked clear cut, and it was just contrib anyway. But then here
>>> we
>>> are ... a healthy dose of grunt work is bad enough when you have tests to
>>> check it.
>>>
>>> --
>>> - Mark
>>>
>>> http://www.lucidimagination.com
>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> - Mark
>>>
>>> http://www.lucidimagination.com
>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>>
>>>
>>>
>>> --
>>> Robert Muir
>>> rcmuir@gmail.com
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>



-- 
Robert Muir
rcmuir@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Mark Miller <ma...@gmail.com>.

I may do the Highlighter. Its annoying though - I'll have to break back 
compat because Token is part of the public API (Fragmenter, etc).

Robert Muir wrote:
> Michael OK, I plan on adding some tests for the analyzers that don't have any.
>
> I didn't try to migrate things such as highlighter, which are
> definitely just as important, only because I'm not familiar with that
> territory.
>
> But I think I can figure out what the various language analyzers are
> trying to do and add tests / convert the remaining ones.
>
> On Mon, Jun 15, 2009 at 4:42 PM, Michael Busch<bu...@gmail.com> wrote:
>   
>> I agree. It's my fault, the task of changing the contribs (LUCENE-1460) is
>> assigned to me for a while now - I just haven't found the time to do it yet.
>>
>> It's great that you started the work on that! I'll try to review the patch
>> in the next couple of days and help with fixing the remaining ones. I'd like
>> to get the AttributeSource improvements patch out first. I'll try that
>> tonight.
>>
>>  Michael
>>
>> On 6/15/09 1:35 PM, Robert Muir wrote:
>>
>> Michael, again I am terrible with such things myself...
>>
>> Personally I am impressed that you have the back compat, even if you
>> don't change any code at all I think some reformatting of javadocs
>> might make the situation a lot friendlier. I just listed everything
>> that came to my mind immediately.
>>
>> I guess I will also mention that one of the reasons I might not use
>> the new API is that since all filters, etc on the same chain must use
>> the same API, its discouraging if all the contrib stuff doesn't
>> support the new API, it makes me want to just stick with the old so
>> everything will work. So I think contribs being on the new API is
>> really important otherwise no one will want to use it.
>>
>> On Mon, Jun 15, 2009 at 4:21 PM, Michael Busch<bu...@gmail.com> wrote:
>>
>>
>> This is excellent feedback, Robert!
>>
>> I agree this is confusing; especially having a deprecated API and only a
>> experimental one that replaces the old one. We need to change that.
>> And I don't like the *useNewAPI*() methods either. I spent a lot of time
>> thinking about backwards compatibility for this API. It's tricky to do
>> without sacrificing performance. In API patches I find myself spending more
>> time for backwards-compatibility than for the actual new feature! :(
>>
>> I'll try to think about how to simplify this confusing old/new API mix.
>>
>> However, we need to make the decisions:
>> a) if we want to release this new API with 2.9,
>> b) if yes to a), if we want to remove the old API in 3.0?
>>
>> If yes to a) and no to b), then we'll have to support both APIs for a
>> presumably very long time, so we then need to have a better solution for the
>> backwards-compatibility here.
>>
>> -Michael
>>
>> On 6/15/09 1:09 PM, Robert Muir wrote:
>>
>> let me try some slightly more constructive feedback:
>>
>> new user looks at TokenStream javadocs:
>> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/analysis/TokenStream.html
>> immediately they see deprecated, text in red with the words
>> "experimental", warnings in bold, the whole thing is scary!
>> due to the use of 'e.g.' the javadoc for .incrementToken() is cut off
>> in a bad way, and its probably the most important method to a new
>> user!
>> there's also a stray bold tag gone haywire somewhere, possibly
>> .incrementToken()
>>
>> from a technical perspective, the documentation is excellent! but for
>> a new user unfamiliar with lucene, its unclear exactly what steps to
>> take: use the scary red experimental api or the old deprecated one?
>>
>> theres also some fairly advanced stuff such as .captureState and
>> .restoreState that might be better in a different place.
>>
>> finally, the .setUseNewAPI() and .setUseNewAPIDefault() are confusing
>> [one is static, one is not], especially because it states all streams
>> and filters in one chain must use the same API, is there a way to
>> simplify this?
>>
>> i'm really terrible with javadocs myself, but perhaps we can come up
>> with a way to improve the presentation... maybe that will make the
>> difference.
>>
>> On Mon, Jun 15, 2009 at 3:45 PM, Robert Muir<rc...@gmail.com> wrote:
>>
>>
>> Mark, I'll see if I can get tests produced for some of those analyzers.
>>
>> as a new user of the new api myself, I think I can safely say the most
>> confusing thing about it is having the old deprecated API mixed in the
>> javadocs with it :)
>>
>> On Mon, Jun 15, 2009 at 2:53 PM, Mark Miller<ma...@gmail.com> wrote:
>>
>>
>> Robert Muir wrote:
>>
>>
>> Mark, I created an issue for this.
>>
>>
>>
>> Thanks Robert, great idea.
>>
>>
>> I just think you know, converting an analyzer to the new api is really
>> not that bad.
>>
>>
>>
>> I don't either. I'm really just complaining about the initial readability.
>> Once you know whats up, its not too much different. I just have found myself
>> having to refigure out whats up (a short task to be sure) over again after I
>> leave it for a while. With the old one, everything was just kind of
>> immediately self evident.
>>
>> That makes me think new users might be a little more confused when they
>> first meet again. I'm not a new user though, so its only a guess really.
>>
>>
>> reverse engineering what one of them does is not necessarily obvious,
>> and is completely unrelated but necessary if they are to be migrated.
>>
>> I'd be willing to assist with some of this but I don't want to really
>> work the issue if its gonna be a waste of time at the end of the
>> day...
>>
>>
>>
>> The chances of this issue being fully reverted are so remote that I really
>> wouldnt let that stop you ...
>>
>>
>> On Mon, Jun 15, 2009 at 1:55 PM, Mark Miller<ma...@gmail.com> wrote:
>>
>>
>>
>> Robert Muir wrote:
>>
>>
>>
>> As Lucene's contrib hasn't been fully converted either (and its been
>> quite
>> some time now), someone has probably heard that groan before.
>>
>>
>>
>>
>> hope this doesn't sound like a complaint,
>>
>>
>>
>> Complaints are fine in any case. Every now and then, it might cause a
>> little
>> rant from me or something, but please don't let that dissuade you :)
>> Who doesnt like to rant and rave now and then. As long as thoughts and
>> opinions are coming out in a non negative way (which certainly includes
>> complaints),
>> I think its all good.
>>
>>
>>
>>  but in my opinion this is
>> because many do not have any tests.
>> I converted a few of these and its just grunt work but if there are no
>> tests, its impossible to verify the conversion is correct.
>>
>>
>>
>>
>> Thanks for pointing that out. We probably get lazy with tests, especially
>> in
>> contrib, and this brings up a good point - we should probably push
>> for tests or write them before committing more often. Sometimes I'm sure
>> it
>> just comes downto a tradeoff though - no resources at the time,
>> the class looked clear cut, and it was just contrib anyway. But then here
>> we
>> are ... a healthy dose of grunt work is bad enough when you have tests to
>> check it.
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>>
>>
>>
>>
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>>
>>
>> --
>> Robert Muir
>> rcmuir@gmail.com
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>     
>
>
>
>   


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Robert Muir <rc...@gmail.com>.

Michael OK, I plan on adding some tests for the analyzers that don't have any.

I didn't try to migrate things such as highlighter, which are
definitely just as important, only because I'm not familiar with that
territory.

But I think I can figure out what the various language analyzers are
trying to do and add tests / convert the remaining ones.

On Mon, Jun 15, 2009 at 4:42 PM, Michael Busch<bu...@gmail.com> wrote:
> I agree. It's my fault, the task of changing the contribs (LUCENE-1460) is
> assigned to me for a while now - I just haven't found the time to do it yet.
>
> It's great that you started the work on that! I'll try to review the patch
> in the next couple of days and help with fixing the remaining ones. I'd like
> to get the AttributeSource improvements patch out first. I'll try that
> tonight.
>
>  Michael
>
> On 6/15/09 1:35 PM, Robert Muir wrote:
>
> Michael, again I am terrible with such things myself...
>
> Personally I am impressed that you have the back compat, even if you
> don't change any code at all I think some reformatting of javadocs
> might make the situation a lot friendlier. I just listed everything
> that came to my mind immediately.
>
> I guess I will also mention that one of the reasons I might not use
> the new API is that since all filters, etc on the same chain must use
> the same API, its discouraging if all the contrib stuff doesn't
> support the new API, it makes me want to just stick with the old so
> everything will work. So I think contribs being on the new API is
> really important otherwise no one will want to use it.
>
> On Mon, Jun 15, 2009 at 4:21 PM, Michael Busch<bu...@gmail.com> wrote:
>
>
> This is excellent feedback, Robert!
>
> I agree this is confusing; especially having a deprecated API and only a
> experimental one that replaces the old one. We need to change that.
> And I don't like the *useNewAPI*() methods either. I spent a lot of time
> thinking about backwards compatibility for this API. It's tricky to do
> without sacrificing performance. In API patches I find myself spending more
> time for backwards-compatibility than for the actual new feature! :(
>
> I'll try to think about how to simplify this confusing old/new API mix.
>
> However, we need to make the decisions:
> a) if we want to release this new API with 2.9,
> b) if yes to a), if we want to remove the old API in 3.0?
>
> If yes to a) and no to b), then we'll have to support both APIs for a
> presumably very long time, so we then need to have a better solution for the
> backwards-compatibility here.
>
> -Michael
>
> On 6/15/09 1:09 PM, Robert Muir wrote:
>
> let me try some slightly more constructive feedback:
>
> new user looks at TokenStream javadocs:
> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/analysis/TokenStream.html
> immediately they see deprecated, text in red with the words
> "experimental", warnings in bold, the whole thing is scary!
> due to the use of 'e.g.' the javadoc for .incrementToken() is cut off
> in a bad way, and its probably the most important method to a new
> user!
> there's also a stray bold tag gone haywire somewhere, possibly
> .incrementToken()
>
> from a technical perspective, the documentation is excellent! but for
> a new user unfamiliar with lucene, its unclear exactly what steps to
> take: use the scary red experimental api or the old deprecated one?
>
> theres also some fairly advanced stuff such as .captureState and
> .restoreState that might be better in a different place.
>
> finally, the .setUseNewAPI() and .setUseNewAPIDefault() are confusing
> [one is static, one is not], especially because it states all streams
> and filters in one chain must use the same API, is there a way to
> simplify this?
>
> i'm really terrible with javadocs myself, but perhaps we can come up
> with a way to improve the presentation... maybe that will make the
> difference.
>
> On Mon, Jun 15, 2009 at 3:45 PM, Robert Muir<rc...@gmail.com> wrote:
>
>
> Mark, I'll see if I can get tests produced for some of those analyzers.
>
> as a new user of the new api myself, I think I can safely say the most
> confusing thing about it is having the old deprecated API mixed in the
> javadocs with it :)
>
> On Mon, Jun 15, 2009 at 2:53 PM, Mark Miller<ma...@gmail.com> wrote:
>
>
> Robert Muir wrote:
>
>
> Mark, I created an issue for this.
>
>
>
> Thanks Robert, great idea.
>
>
> I just think you know, converting an analyzer to the new api is really
> not that bad.
>
>
>
> I don't either. I'm really just complaining about the initial readability.
> Once you know whats up, its not too much different. I just have found myself
> having to refigure out whats up (a short task to be sure) over again after I
> leave it for a while. With the old one, everything was just kind of
> immediately self evident.
>
> That makes me think new users might be a little more confused when they
> first meet again. I'm not a new user though, so its only a guess really.
>
>
> reverse engineering what one of them does is not necessarily obvious,
> and is completely unrelated but necessary if they are to be migrated.
>
> I'd be willing to assist with some of this but I don't want to really
> work the issue if its gonna be a waste of time at the end of the
> day...
>
>
>
> The chances of this issue being fully reverted are so remote that I really
> wouldnt let that stop you ...
>
>
> On Mon, Jun 15, 2009 at 1:55 PM, Mark Miller<ma...@gmail.com> wrote:
>
>
>
> Robert Muir wrote:
>
>
>
> As Lucene's contrib hasn't been fully converted either (and its been
> quite
> some time now), someone has probably heard that groan before.
>
>
>
>
> hope this doesn't sound like a complaint,
>
>
>
> Complaints are fine in any case. Every now and then, it might cause a
> little
> rant from me or something, but please don't let that dissuade you :)
> Who doesnt like to rant and rave now and then. As long as thoughts and
> opinions are coming out in a non negative way (which certainly includes
> complaints),
> I think its all good.
>
>
>
>  but in my opinion this is
> because many do not have any tests.
> I converted a few of these and its just grunt work but if there are no
> tests, its impossible to verify the conversion is correct.
>
>
>
>
> Thanks for pointing that out. We probably get lazy with tests, especially
> in
> contrib, and this brings up a good point - we should probably push
> for tests or write them before committing more often. Sometimes I'm sure
> it
> just comes downto a tradeoff though - no resources at the time,
> the class looked clear cut, and it was just contrib anyway. But then here
> we
> are ... a healthy dose of grunt work is bad enough when you have tests to
> check it.
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
>
>
>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
>
>
> --
> Robert Muir
> rcmuir@gmail.com
>
>
>
>
>
>
>
>
>



-- 
Robert Muir
rcmuir@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Michael Busch <bu...@gmail.com>.

I agree. It's my fault, the task of changing the contribs (LUCENE-1460) 
is assigned to me for a while now - I just haven't found the time to do 
it yet.

It's great that you started the work on that! I'll try to review the 
patch in the next couple of days and help with fixing the remaining 
ones. I'd like to get the AttributeSource improvements patch out first. 
I'll try that tonight.

  Michael

On 6/15/09 1:35 PM, Robert Muir wrote:
> Michael, again I am terrible with such things myself...
>
> Personally I am impressed that you have the back compat, even if you
> don't change any code at all I think some reformatting of javadocs
> might make the situation a lot friendlier. I just listed everything
> that came to my mind immediately.
>
> I guess I will also mention that one of the reasons I might not use
> the new API is that since all filters, etc on the same chain must use
> the same API, its discouraging if all the contrib stuff doesn't
> support the new API, it makes me want to just stick with the old so
> everything will work. So I think contribs being on the new API is
> really important otherwise no one will want to use it.
>
> On Mon, Jun 15, 2009 at 4:21 PM, Michael Busch<bu...@gmail.com>  wrote:
>    
>> This is excellent feedback, Robert!
>>
>> I agree this is confusing; especially having a deprecated API and only a
>> experimental one that replaces the old one. We need to change that.
>> And I don't like the *useNewAPI*() methods either. I spent a lot of time
>> thinking about backwards compatibility for this API. It's tricky to do
>> without sacrificing performance. In API patches I find myself spending more
>> time for backwards-compatibility than for the actual new feature! :(
>>
>> I'll try to think about how to simplify this confusing old/new API mix.
>>
>> However, we need to make the decisions:
>> a) if we want to release this new API with 2.9,
>> b) if yes to a), if we want to remove the old API in 3.0?
>>
>> If yes to a) and no to b), then we'll have to support both APIs for a
>> presumably very long time, so we then need to have a better solution for the
>> backwards-compatibility here.
>>
>> -Michael
>>
>> On 6/15/09 1:09 PM, Robert Muir wrote:
>>
>> let me try some slightly more constructive feedback:
>>
>> new user looks at TokenStream javadocs:
>> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/analysis/TokenStream.html
>> immediately they see deprecated, text in red with the words
>> "experimental", warnings in bold, the whole thing is scary!
>> due to the use of 'e.g.' the javadoc for .incrementToken() is cut off
>> in a bad way, and its probably the most important method to a new
>> user!
>> there's also a stray bold tag gone haywire somewhere, possibly
>> .incrementToken()
>>
>> from a technical perspective, the documentation is excellent! but for
>> a new user unfamiliar with lucene, its unclear exactly what steps to
>> take: use the scary red experimental api or the old deprecated one?
>>
>> theres also some fairly advanced stuff such as .captureState and
>> .restoreState that might be better in a different place.
>>
>> finally, the .setUseNewAPI() and .setUseNewAPIDefault() are confusing
>> [one is static, one is not], especially because it states all streams
>> and filters in one chain must use the same API, is there a way to
>> simplify this?
>>
>> i'm really terrible with javadocs myself, but perhaps we can come up
>> with a way to improve the presentation... maybe that will make the
>> difference.
>>
>> On Mon, Jun 15, 2009 at 3:45 PM, Robert Muir<rc...@gmail.com>  wrote:
>>
>>
>> Mark, I'll see if I can get tests produced for some of those analyzers.
>>
>> as a new user of the new api myself, I think I can safely say the most
>> confusing thing about it is having the old deprecated API mixed in the
>> javadocs with it :)
>>
>> On Mon, Jun 15, 2009 at 2:53 PM, Mark Miller<ma...@gmail.com>  wrote:
>>
>>
>> Robert Muir wrote:
>>
>>
>> Mark, I created an issue for this.
>>
>>
>>
>> Thanks Robert, great idea.
>>
>>
>> I just think you know, converting an analyzer to the new api is really
>> not that bad.
>>
>>
>>
>> I don't either. I'm really just complaining about the initial readability.
>> Once you know whats up, its not too much different. I just have found myself
>> having to refigure out whats up (a short task to be sure) over again after I
>> leave it for a while. With the old one, everything was just kind of
>> immediately self evident.
>>
>> That makes me think new users might be a little more confused when they
>> first meet again. I'm not a new user though, so its only a guess really.
>>
>>
>> reverse engineering what one of them does is not necessarily obvious,
>> and is completely unrelated but necessary if they are to be migrated.
>>
>> I'd be willing to assist with some of this but I don't want to really
>> work the issue if its gonna be a waste of time at the end of the
>> day...
>>
>>
>>
>> The chances of this issue being fully reverted are so remote that I really
>> wouldnt let that stop you ...
>>
>>
>> On Mon, Jun 15, 2009 at 1:55 PM, Mark Miller<ma...@gmail.com>  wrote:
>>
>>
>>
>> Robert Muir wrote:
>>
>>
>>
>> As Lucene's contrib hasn't been fully converted either (and its been
>> quite
>> some time now), someone has probably heard that groan before.
>>
>>
>>
>>
>> hope this doesn't sound like a complaint,
>>
>>
>>
>> Complaints are fine in any case. Every now and then, it might cause a
>> little
>> rant from me or something, but please don't let that dissuade you :)
>> Who doesnt like to rant and rave now and then. As long as thoughts and
>> opinions are coming out in a non negative way (which certainly includes
>> complaints),
>> I think its all good.
>>
>>
>>
>>   but in my opinion this is
>> because many do not have any tests.
>> I converted a few of these and its just grunt work but if there are no
>> tests, its impossible to verify the conversion is correct.
>>
>>
>>
>>
>> Thanks for pointing that out. We probably get lazy with tests, especially
>> in
>> contrib, and this brings up a good point - we should probably push
>> for tests or write them before committing more often. Sometimes I'm sure
>> it
>> just comes downto a tradeoff though - no resources at the time,
>> the class looked clear cut, and it was just contrib anyway. But then here
>> we
>> are ... a healthy dose of grunt work is bad enough when you have tests to
>> check it.
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>>
>>
>>
>>
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>>
>>
>> --
>> Robert Muir
>> rcmuir@gmail.com
>>
>>
>>
>>
>>
>>      
>
>
>
>

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Robert Muir <rc...@gmail.com>.

Michael, again I am terrible with such things myself...

Personally I am impressed that you have the back compat, even if you
don't change any code at all I think some reformatting of javadocs
might make the situation a lot friendlier. I just listed everything
that came to my mind immediately.

I guess I will also mention that one of the reasons I might not use
the new API is that since all filters, etc on the same chain must use
the same API, its discouraging if all the contrib stuff doesn't
support the new API, it makes me want to just stick with the old so
everything will work. So I think contribs being on the new API is
really important otherwise no one will want to use it.

On Mon, Jun 15, 2009 at 4:21 PM, Michael Busch<bu...@gmail.com> wrote:
> This is excellent feedback, Robert!
>
> I agree this is confusing; especially having a deprecated API and only a
> experimental one that replaces the old one. We need to change that.
> And I don't like the *useNewAPI*() methods either. I spent a lot of time
> thinking about backwards compatibility for this API. It's tricky to do
> without sacrificing performance. In API patches I find myself spending more
> time for backwards-compatibility than for the actual new feature! :(
>
> I'll try to think about how to simplify this confusing old/new API mix.
>
> However, we need to make the decisions:
> a) if we want to release this new API with 2.9,
> b) if yes to a), if we want to remove the old API in 3.0?
>
> If yes to a) and no to b), then we'll have to support both APIs for a
> presumably very long time, so we then need to have a better solution for the
> backwards-compatibility here.
>
> -Michael
>
> On 6/15/09 1:09 PM, Robert Muir wrote:
>
> let me try some slightly more constructive feedback:
>
> new user looks at TokenStream javadocs:
> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/analysis/TokenStream.html
> immediately they see deprecated, text in red with the words
> "experimental", warnings in bold, the whole thing is scary!
> due to the use of 'e.g.' the javadoc for .incrementToken() is cut off
> in a bad way, and its probably the most important method to a new
> user!
> there's also a stray bold tag gone haywire somewhere, possibly
> .incrementToken()
>
> from a technical perspective, the documentation is excellent! but for
> a new user unfamiliar with lucene, its unclear exactly what steps to
> take: use the scary red experimental api or the old deprecated one?
>
> theres also some fairly advanced stuff such as .captureState and
> .restoreState that might be better in a different place.
>
> finally, the .setUseNewAPI() and .setUseNewAPIDefault() are confusing
> [one is static, one is not], especially because it states all streams
> and filters in one chain must use the same API, is there a way to
> simplify this?
>
> i'm really terrible with javadocs myself, but perhaps we can come up
> with a way to improve the presentation... maybe that will make the
> difference.
>
> On Mon, Jun 15, 2009 at 3:45 PM, Robert Muir<rc...@gmail.com> wrote:
>
>
> Mark, I'll see if I can get tests produced for some of those analyzers.
>
> as a new user of the new api myself, I think I can safely say the most
> confusing thing about it is having the old deprecated API mixed in the
> javadocs with it :)
>
> On Mon, Jun 15, 2009 at 2:53 PM, Mark Miller<ma...@gmail.com> wrote:
>
>
> Robert Muir wrote:
>
>
> Mark, I created an issue for this.
>
>
>
> Thanks Robert, great idea.
>
>
> I just think you know, converting an analyzer to the new api is really
> not that bad.
>
>
>
> I don't either. I'm really just complaining about the initial readability.
> Once you know whats up, its not too much different. I just have found myself
> having to refigure out whats up (a short task to be sure) over again after I
> leave it for a while. With the old one, everything was just kind of
> immediately self evident.
>
> That makes me think new users might be a little more confused when they
> first meet again. I'm not a new user though, so its only a guess really.
>
>
> reverse engineering what one of them does is not necessarily obvious,
> and is completely unrelated but necessary if they are to be migrated.
>
> I'd be willing to assist with some of this but I don't want to really
> work the issue if its gonna be a waste of time at the end of the
> day...
>
>
>
> The chances of this issue being fully reverted are so remote that I really
> wouldnt let that stop you ...
>
>
> On Mon, Jun 15, 2009 at 1:55 PM, Mark Miller<ma...@gmail.com> wrote:
>
>
>
> Robert Muir wrote:
>
>
>
> As Lucene's contrib hasn't been fully converted either (and its been
> quite
> some time now), someone has probably heard that groan before.
>
>
>
>
> hope this doesn't sound like a complaint,
>
>
>
> Complaints are fine in any case. Every now and then, it might cause a
> little
> rant from me or something, but please don't let that dissuade you :)
> Who doesnt like to rant and rave now and then. As long as thoughts and
> opinions are coming out in a non negative way (which certainly includes
> complaints),
> I think its all good.
>
>
>
>  but in my opinion this is
> because many do not have any tests.
> I converted a few of these and its just grunt work but if there are no
> tests, its impossible to verify the conversion is correct.
>
>
>
>
> Thanks for pointing that out. We probably get lazy with tests, especially
> in
> contrib, and this brings up a good point - we should probably push
> for tests or write them before committing more often. Sometimes I'm sure
> it
> just comes downto a tradeoff though - no resources at the time,
> the class looked clear cut, and it was just contrib anyway. But then here
> we
> are ... a healthy dose of grunt work is bad enough when you have tests to
> check it.
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
>
>
>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
>
>
> --
> Robert Muir
> rcmuir@gmail.com
>
>
>
>
>



-- 
Robert Muir
rcmuir@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Michael Busch <bu...@gmail.com>.

This is excellent feedback, Robert!

I agree this is confusing; especially having a deprecated API and only a 
experimental one that replaces the old one. We need to change that.
And I don't like the *useNewAPI*() methods either. I spent a lot of time 
thinking about backwards compatibility for this API. It's tricky to do 
without sacrificing performance. In API patches I find myself spending 
more time for backwards-compatibility than for the actual new feature! :(

I'll try to think about how to simplify this confusing old/new API mix.

However, we need to make the decisions:
a) if we want to release this new API with 2.9,
b) if yes to a), if we want to remove the old API in 3.0?

If yes to a) and no to b), then we'll have to support both APIs for a 
presumably very long time, so we then need to have a better solution for 
the backwards-compatibility here.

-Michael

On 6/15/09 1:09 PM, Robert Muir wrote:
> let me try some slightly more constructive feedback:
>
> new user looks at TokenStream javadocs:
> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/analysis/TokenStream.html
> immediately they see deprecated, text in red with the words
> "experimental", warnings in bold, the whole thing is scary!
> due to the use of 'e.g.' the javadoc for .incrementToken() is cut off
> in a bad way, and its probably the most important method to a new
> user!
> there's also a stray bold tag gone haywire somewhere, possibly .incrementToken()
>
> from a technical perspective, the documentation is excellent! but for
> a new user unfamiliar with lucene, its unclear exactly what steps to
> take: use the scary red experimental api or the old deprecated one?
>
> theres also some fairly advanced stuff such as .captureState and
> .restoreState that might be better in a different place.
>
> finally, the .setUseNewAPI() and .setUseNewAPIDefault() are confusing
> [one is static, one is not], especially because it states all streams
> and filters in one chain must use the same API, is there a way to
> simplify this?
>
> i'm really terrible with javadocs myself, but perhaps we can come up
> with a way to improve the presentation... maybe that will make the
> difference.
>
> On Mon, Jun 15, 2009 at 3:45 PM, Robert Muir<rc...@gmail.com>  wrote:
>    
>> Mark, I'll see if I can get tests produced for some of those analyzers.
>>
>> as a new user of the new api myself, I think I can safely say the most
>> confusing thing about it is having the old deprecated API mixed in the
>> javadocs with it :)
>>
>> On Mon, Jun 15, 2009 at 2:53 PM, Mark Miller<ma...@gmail.com>  wrote:
>>      
>>> Robert Muir wrote:
>>>        
>>>> Mark, I created an issue for this.
>>>>
>>>>          
>>> Thanks Robert, great idea.
>>>        
>>>> I just think you know, converting an analyzer to the new api is really
>>>> not that bad.
>>>>
>>>>          
>>> I don't either. I'm really just complaining about the initial readability.
>>> Once you know whats up, its not too much different. I just have found myself
>>> having to refigure out whats up (a short task to be sure) over again after I
>>> leave it for a while. With the old one, everything was just kind of
>>> immediately self evident.
>>>
>>> That makes me think new users might be a little more confused when they
>>> first meet again. I'm not a new user though, so its only a guess really.
>>>        
>>>> reverse engineering what one of them does is not necessarily obvious,
>>>> and is completely unrelated but necessary if they are to be migrated.
>>>>
>>>> I'd be willing to assist with some of this but I don't want to really
>>>> work the issue if its gonna be a waste of time at the end of the
>>>> day...
>>>>
>>>>          
>>> The chances of this issue being fully reverted are so remote that I really
>>> wouldnt let that stop you ...
>>>        
>>>> On Mon, Jun 15, 2009 at 1:55 PM, Mark Miller<ma...@gmail.com>  wrote:
>>>>
>>>>          
>>>>> Robert Muir wrote:
>>>>>
>>>>>            
>>>>>>> As Lucene's contrib hasn't been fully converted either (and its been
>>>>>>> quite
>>>>>>> some time now), someone has probably heard that groan before.
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>> hope this doesn't sound like a complaint,
>>>>>>
>>>>>>              
>>>>> Complaints are fine in any case. Every now and then, it might cause a
>>>>> little
>>>>> rant from me or something, but please don't let that dissuade you :)
>>>>> Who doesnt like to rant and rave now and then. As long as thoughts and
>>>>> opinions are coming out in a non negative way (which certainly includes
>>>>> complaints),
>>>>> I think its all good.
>>>>>
>>>>>            
>>>>>>   but in my opinion this is
>>>>>> because many do not have any tests.
>>>>>> I converted a few of these and its just grunt work but if there are no
>>>>>> tests, its impossible to verify the conversion is correct.
>>>>>>
>>>>>>
>>>>>>              
>>>>> Thanks for pointing that out. We probably get lazy with tests, especially
>>>>> in
>>>>> contrib, and this brings up a good point - we should probably push
>>>>> for tests or write them before committing more often. Sometimes I'm sure
>>>>> it
>>>>> just comes downto a tradeoff though - no resources at the time,
>>>>> the class looked clear cut, and it was just contrib anyway. But then here
>>>>> we
>>>>> are ... a healthy dose of grunt work is bad enough when you have tests to
>>>>> check it.
>>>>>
>>>>> --
>>>>> - Mark
>>>>>
>>>>> http://www.lucidimagination.com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>
>>>>>
>>>>>
>>>>>            
>>>>
>>>>
>>>>          
>>> --
>>> - Mark
>>>
>>> http://www.lucidimagination.com
>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>>
>>>        
>>
>> --
>> Robert Muir
>> rcmuir@gmail.com
>>
>>      
>
>
>
>

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Robert Muir <rc...@gmail.com>.

let me try some slightly more constructive feedback:

new user looks at TokenStream javadocs:
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/analysis/TokenStream.html
immediately they see deprecated, text in red with the words
"experimental", warnings in bold, the whole thing is scary!
due to the use of 'e.g.' the javadoc for .incrementToken() is cut off
in a bad way, and its probably the most important method to a new
user!
there's also a stray bold tag gone haywire somewhere, possibly .incrementToken()

from a technical perspective, the documentation is excellent! but for
a new user unfamiliar with lucene, its unclear exactly what steps to
take: use the scary red experimental api or the old deprecated one?

theres also some fairly advanced stuff such as .captureState and
.restoreState that might be better in a different place.

finally, the .setUseNewAPI() and .setUseNewAPIDefault() are confusing
[one is static, one is not], especially because it states all streams
and filters in one chain must use the same API, is there a way to
simplify this?

i'm really terrible with javadocs myself, but perhaps we can come up
with a way to improve the presentation... maybe that will make the
difference.

On Mon, Jun 15, 2009 at 3:45 PM, Robert Muir<rc...@gmail.com> wrote:
> Mark, I'll see if I can get tests produced for some of those analyzers.
>
> as a new user of the new api myself, I think I can safely say the most
> confusing thing about it is having the old deprecated API mixed in the
> javadocs with it :)
>
> On Mon, Jun 15, 2009 at 2:53 PM, Mark Miller<ma...@gmail.com> wrote:
>> Robert Muir wrote:
>>>
>>> Mark, I created an issue for this.
>>>
>>
>> Thanks Robert, great idea.
>>>
>>> I just think you know, converting an analyzer to the new api is really
>>> not that bad.
>>>
>>
>> I don't either. I'm really just complaining about the initial readability.
>> Once you know whats up, its not too much different. I just have found myself
>> having to refigure out whats up (a short task to be sure) over again after I
>> leave it for a while. With the old one, everything was just kind of
>> immediately self evident.
>>
>> That makes me think new users might be a little more confused when they
>> first meet again. I'm not a new user though, so its only a guess really.
>>>
>>> reverse engineering what one of them does is not necessarily obvious,
>>> and is completely unrelated but necessary if they are to be migrated.
>>>
>>> I'd be willing to assist with some of this but I don't want to really
>>> work the issue if its gonna be a waste of time at the end of the
>>> day...
>>>
>>
>> The chances of this issue being fully reverted are so remote that I really
>> wouldnt let that stop you ...
>>>
>>> On Mon, Jun 15, 2009 at 1:55 PM, Mark Miller<ma...@gmail.com> wrote:
>>>
>>>>
>>>> Robert Muir wrote:
>>>>
>>>>>>
>>>>>> As Lucene's contrib hasn't been fully converted either (and its been
>>>>>> quite
>>>>>> some time now), someone has probably heard that groan before.
>>>>>>
>>>>>>
>>>>>
>>>>> hope this doesn't sound like a complaint,
>>>>>
>>>>
>>>> Complaints are fine in any case. Every now and then, it might cause a
>>>> little
>>>> rant from me or something, but please don't let that dissuade you :)
>>>> Who doesnt like to rant and rave now and then. As long as thoughts and
>>>> opinions are coming out in a non negative way (which certainly includes
>>>> complaints),
>>>> I think its all good.
>>>>
>>>>>
>>>>>  but in my opinion this is
>>>>> because many do not have any tests.
>>>>> I converted a few of these and its just grunt work but if there are no
>>>>> tests, its impossible to verify the conversion is correct.
>>>>>
>>>>>
>>>>
>>>> Thanks for pointing that out. We probably get lazy with tests, especially
>>>> in
>>>> contrib, and this brings up a good point - we should probably push
>>>> for tests or write them before committing more often. Sometimes I'm sure
>>>> it
>>>> just comes downto a tradeoff though - no resources at the time,
>>>> the class looked clear cut, and it was just contrib anyway. But then here
>>>> we
>>>> are ... a healthy dose of grunt work is bad enough when you have tests to
>>>> check it.
>>>>
>>>> --
>>>> - Mark
>>>>
>>>> http://www.lucidimagination.com
>>>>
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>
>
>
> --
> Robert Muir
> rcmuir@gmail.com
>



-- 
Robert Muir
rcmuir@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Robert Muir <rc...@gmail.com>.

Mark, I'll see if I can get tests produced for some of those analyzers.

as a new user of the new api myself, I think I can safely say the most
confusing thing about it is having the old deprecated API mixed in the
javadocs with it :)

On Mon, Jun 15, 2009 at 2:53 PM, Mark Miller<ma...@gmail.com> wrote:
> Robert Muir wrote:
>>
>> Mark, I created an issue for this.
>>
>
> Thanks Robert, great idea.
>>
>> I just think you know, converting an analyzer to the new api is really
>> not that bad.
>>
>
> I don't either. I'm really just complaining about the initial readability.
> Once you know whats up, its not too much different. I just have found myself
> having to refigure out whats up (a short task to be sure) over again after I
> leave it for a while. With the old one, everything was just kind of
> immediately self evident.
>
> That makes me think new users might be a little more confused when they
> first meet again. I'm not a new user though, so its only a guess really.
>>
>> reverse engineering what one of them does is not necessarily obvious,
>> and is completely unrelated but necessary if they are to be migrated.
>>
>> I'd be willing to assist with some of this but I don't want to really
>> work the issue if its gonna be a waste of time at the end of the
>> day...
>>
>
> The chances of this issue being fully reverted are so remote that I really
> wouldnt let that stop you ...
>>
>> On Mon, Jun 15, 2009 at 1:55 PM, Mark Miller<ma...@gmail.com> wrote:
>>
>>>
>>> Robert Muir wrote:
>>>
>>>>>
>>>>> As Lucene's contrib hasn't been fully converted either (and its been
>>>>> quite
>>>>> some time now), someone has probably heard that groan before.
>>>>>
>>>>>
>>>>
>>>> hope this doesn't sound like a complaint,
>>>>
>>>
>>> Complaints are fine in any case. Every now and then, it might cause a
>>> little
>>> rant from me or something, but please don't let that dissuade you :)
>>> Who doesnt like to rant and rave now and then. As long as thoughts and
>>> opinions are coming out in a non negative way (which certainly includes
>>> complaints),
>>> I think its all good.
>>>
>>>>
>>>>  but in my opinion this is
>>>> because many do not have any tests.
>>>> I converted a few of these and its just grunt work but if there are no
>>>> tests, its impossible to verify the conversion is correct.
>>>>
>>>>
>>>
>>> Thanks for pointing that out. We probably get lazy with tests, especially
>>> in
>>> contrib, and this brings up a good point - we should probably push
>>> for tests or write them before committing more often. Sometimes I'm sure
>>> it
>>> just comes downto a tradeoff though - no resources at the time,
>>> the class looked clear cut, and it was just contrib anyway. But then here
>>> we
>>> are ... a healthy dose of grunt work is bad enough when you have tests to
>>> check it.
>>>
>>> --
>>> - Mark
>>>
>>> http://www.lucidimagination.com
>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>>
>>>
>>
>>
>>
>>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>



-- 
Robert Muir
rcmuir@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Mark Miller <ma...@gmail.com>.

Robert Muir wrote:
> Mark, I created an issue for this.
>   
Thanks Robert, great idea.
> I just think you know, converting an analyzer to the new api is really
> not that bad.
>   
I don't either. I'm really just complaining about the initial 
readability. Once you know whats up, its not too much different. I just 
have found myself
having to refigure out whats up (a short task to be sure) over again 
after I leave it for a while. With the old one, everything was just kind 
of immediately self evident.

That makes me think new users might be a little more confused when they 
first meet again. I'm not a new user though, so its only a guess really.
> reverse engineering what one of them does is not necessarily obvious,
> and is completely unrelated but necessary if they are to be migrated.
>
> I'd be willing to assist with some of this but I don't want to really
> work the issue if its gonna be a waste of time at the end of the
> day...
>   
The chances of this issue being fully reverted are so remote that I 
really wouldnt let that stop you ...
> On Mon, Jun 15, 2009 at 1:55 PM, Mark Miller<ma...@gmail.com> wrote:
>   
>> Robert Muir wrote:
>>     
>>>> As Lucene's contrib hasn't been fully converted either (and its been
>>>> quite
>>>> some time now), someone has probably heard that groan before.
>>>>
>>>>         
>>> hope this doesn't sound like a complaint,
>>>       
>> Complaints are fine in any case. Every now and then, it might cause a little
>> rant from me or something, but please don't let that dissuade you :)
>> Who doesnt like to rant and rave now and then. As long as thoughts and
>> opinions are coming out in a non negative way (which certainly includes
>> complaints),
>> I think its all good.
>>     
>>>  but in my opinion this is
>>> because many do not have any tests.
>>> I converted a few of these and its just grunt work but if there are no
>>> tests, its impossible to verify the conversion is correct.
>>>
>>>       
>> Thanks for pointing that out. We probably get lazy with tests, especially in
>> contrib, and this brings up a good point - we should probably push
>> for tests or write them before committing more often. Sometimes I'm sure it
>> just comes downto a tradeoff though - no resources at the time,
>> the class looked clear cut, and it was just contrib anyway. But then here we
>> are ... a healthy dose of grunt work is bad enough when you have tests to
>> check it.
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>>     
>
>
>
>   


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Robert Muir <rc...@gmail.com>.

Mark, I created an issue for this.

I just think you know, converting an analyzer to the new api is really
not that bad.

reverse engineering what one of them does is not necessarily obvious,
and is completely unrelated but necessary if they are to be migrated.

I'd be willing to assist with some of this but I don't want to really
work the issue if its gonna be a waste of time at the end of the
day...

On Mon, Jun 15, 2009 at 1:55 PM, Mark Miller<ma...@gmail.com> wrote:
> Robert Muir wrote:
>>>
>>> As Lucene's contrib hasn't been fully converted either (and its been
>>> quite
>>> some time now), someone has probably heard that groan before.
>>>
>>
>> hope this doesn't sound like a complaint,
>
> Complaints are fine in any case. Every now and then, it might cause a little
> rant from me or something, but please don't let that dissuade you :)
> Who doesnt like to rant and rave now and then. As long as thoughts and
> opinions are coming out in a non negative way (which certainly includes
> complaints),
> I think its all good.
>>
>>  but in my opinion this is
>> because many do not have any tests.
>> I converted a few of these and its just grunt work but if there are no
>> tests, its impossible to verify the conversion is correct.
>>
>
> Thanks for pointing that out. We probably get lazy with tests, especially in
> contrib, and this brings up a good point - we should probably push
> for tests or write them before committing more often. Sometimes I'm sure it
> just comes downto a tradeoff though - no resources at the time,
> the class looked clear cut, and it was just contrib anyway. But then here we
> are ... a healthy dose of grunt work is bad enough when you have tests to
> check it.
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>



-- 
Robert Muir
rcmuir@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Mark Miller <ma...@gmail.com>.

Robert Muir wrote:
>> As Lucene's contrib hasn't been fully converted either (and its been quite
>> some time now), someone has probably heard that groan before.
>>     
>
> hope this doesn't sound like a complaint,
Complaints are fine in any case. Every now and then, it might cause a 
little rant from me or something, but please don't let that dissuade you :)
Who doesnt like to rant and rave now and then. As long as thoughts and 
opinions are coming out in a non negative way (which certainly includes 
complaints),
I think its all good.
>  but in my opinion this is
> because many do not have any tests.
> I converted a few of these and its just grunt work but if there are no
> tests, its impossible to verify the conversion is correct.
>   
Thanks for pointing that out. We probably get lazy with tests, 
especially in contrib, and this brings up a good point - we should 
probably push
for tests or write them before committing more often. Sometimes I'm sure 
it just comes downto a tradeoff though - no resources at the time,
the class looked clear cut, and it was just contrib anyway. But then 
here we are ... a healthy dose of grunt work is bad enough when you have 
tests to check it.

-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Robert Muir <rc...@gmail.com>.

>
> As Lucene's contrib hasn't been fully converted either (and its been quite
> some time now), someone has probably heard that groan before.

hope this doesn't sound like a complaint, but in my opinion this is
because many do not have any tests.
I converted a few of these and its just grunt work but if there are no
tests, its impossible to verify the conversion is correct.

-- 
Robert Muir
rcmuir@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Mark Miller <ma...@gmail.com>.

Yonik Seeley wrote:
> The high-level description of the new API looks good (being able to
> add arbitrary properties to tokens), unfortunately, I've never had the
> time to try and use it and give any constructive feedback.
>
> As far as difficulty of use, I assume this only applies to
> implementing your own TokenFilter? It seems like most standard users
> would be just stringing together existing TokenFilters to create
> custom Analyzers?
>
> -Yonik
> http://www.lucidimagination.com
>
>   
True - its the implementation. And just trying to understand whats going 
on the first time you see it.

Its not particularly difficult, but its also not obvious like the 
previous API was. As a user, I would ask why that is so, and frankly the 
answer wouldn't do much for me (as a user).

I don't know if most 'standard' users implement their own or not. I will 
say, and perhaps I was in a special situation, I was writing them and 
modifying them almost as soon
as I started playing with Lucene. And even when I wasnt, I needed to 
understand the code to understand some of the complexities that could 
occur, and thankfully, that was breezy to do.

Right now, if you told me to go convert all of Solr to the new API you 
would hear a mighty groan.

As Lucene's contrib hasn't been fully converted either (and its been 
quite some time now), someone has probably heard that groan before.

-- 
- Mark

http://www.lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Yonik Seeley <yo...@lucidimagination.com>.

The high-level description of the new API looks good (being able to
add arbitrary properties to tokens), unfortunately, I've never had the
time to try and use it and give any constructive feedback.

As far as difficulty of use, I assume this only applies to
implementing your own TokenFilter? It seems like most standard users
would be just stringing together existing TokenFilters to create
custom Analyzers?

-Yonik
http://www.lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Grant Ingersoll <gs...@apache.org>.

On Jun 15, 2009, at 12:19 PM, Michael McCandless wrote:

> I don't think anything was "held back" in this effort. Grant, are you
> referring to LUCENE-1458?  That's "held back" simply because the only
> person working on it (me) got distracted by other things to work on.

I'm sorry, I didn't mean to imply Michael B. was holding back on the  
work.  The patch has always felt half done to me because what's the  
point of having all of these attributes in the index if you don't have  
anyway of searching them, thus I was struck by the need to get it in  
prior to making it available in search.    I realize it's complex, but  
here we are forcing people to upgrade for some future, long term goal.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Michael McCandless <lu...@mikemccandless.com>.

I thought the primary goal of switching to AttributeSource (yes, the
name is very generic...) was to allow extensibility to what's created
per-Token, so that an app could add their own attrs without costly
subclassing/casting per Token, independent of other other "things"
adding their tokens, etc.

EG, trie* takes advantage of this extensibility by adding a
ShiftAttribute.

Subclassing Token in your app wasn't a good solution for various
reasons.

I do think the API is somewhat more cumbersome than before, and I
don't like that about it (consumability!).

But net/net I think the change is good, and it's one of the
baby steps for flexible indexing (bullet #11):

  http://wiki.apache.org/jakarta-lucene/Lucene2Whiteboard

Ie it addresses the flexibility during analysis.

I don't think anything was "held back" in this effort. Grant, are you
referring to LUCENE-1458?  That's "held back" simply because the only
person working on it (me) got distracted by other things to work on.

Flexible indexing (all of bullet #11) is a complex project, and we
need to break it into baby steps like this one.  We've already made
good progress on it: you can already make custom attrs and a custom
(but, package private) indexing chain if you want.

Next step is pluggable codecs for writing index files (LUCENE-1458),
and APIs for reading them (that generalize Terms/TermDoc/TermPositions
we have today).

Mike

On Sun, Jun 14, 2009 at 11:41 PM, Shai Erera<se...@gmail.com> wrote:
> The "old" API is deprecated, and therefore when we release 2.9 there might
> be some people who'd think they should move away from it, to better prepare
> for 3.0 (while in fact this many not be the case). Also, we should make sure
> that when we remove all the deprecations, this will still exist (and
> therefore, why deprecate it now?), if we think this should indeed be kept
> around for at least a while longer.
>
> I personally am all for keeping it around (it will save me a huge
> refactoring of an Analyzer package I wrote), but I have to admit it's only
> because I've got quite comfortable with the existing API, and did not have
> the time to try the new one yet.
>
> Shai
>
> On Mon, Jun 15, 2009 at 3:49 AM, Mark Miller <ma...@gmail.com> wrote:
>>
>> Mark Miller wrote:
>>>
>>> I don't know how I feel about rolling the new token api back.
>>>
>>> I will say that I originally had no issue with it because I am very
>>> excited about Lucene-1458.
>>>
>>> At the same time though, I'm thinking Lucene-1458 is a very advanced
>>> issue that will likely be for really expert usage (though I can see benefits
>>> falling to general users).
>>>
>>> I'm slightly iffy about making an intuitive api much less intuitive for
>>> an expert future feature that hasn't fully materialized in Lucene yet. It
>>> almost seems like that fight should weigh towards general usage and standard
>>> users.
>>>
>>> I don't have a better proposal though, nor the time to consider it at the
>>> moment. I was just more curious if anyone else had any thoughts. I hadn't
>>> realized Grant had asked a similar question not long ago
>>> with no response. Not sure how to take that, but I'd think that would
>>> indicate less problems with people than more. On the other hand, you don't
>>> have to switch yet (with trunk) and we have yet to release it. I wonder how
>>> many non dev, every day users have really had to tussle with the new API
>>> yet. Not many people complaining too loudly at the moment though.
>>>
>>> Asking for a roll back seems a bit extreme without a little more support
>>> behind it than we have seen.
>>>
>>> - Mark
>>
>> PS
>>
>> I know you didnt ask for a rollback Grant - just kind of talking in a
>> general manner. I see your point on getting the search side in, I'm just not
>> sure I agree that it really matters if one hits before the other. Like Mike
>> says, you don't
>> have to switch to the new API yet.
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Shai Erera <se...@gmail.com>.

The "old" API is deprecated, and therefore when we release 2.9 there might
be some people who'd think they should move away from it, to better prepare
for 3.0 (while in fact this many not be the case). Also, we should make sure
that when we remove all the deprecations, this will still exist (and
therefore, why deprecate it now?), if we think this should indeed be kept
around for at least a while longer.

I personally am all for keeping it around (it will save me a huge
refactoring of an Analyzer package I wrote), but I have to admit it's only
because I've got quite comfortable with the existing API, and did not have
the time to try the new one yet.

Shai

On Mon, Jun 15, 2009 at 3:49 AM, Mark Miller <ma...@gmail.com> wrote:

> Mark Miller wrote:
>
>> I don't know how I feel about rolling the new token api back.
>>
>> I will say that I originally had no issue with it because I am very
>> excited about Lucene-1458.
>>
>> At the same time though, I'm thinking Lucene-1458 is a very advanced issue
>> that will likely be for really expert usage (though I can see benefits
>> falling to general users).
>>
>> I'm slightly iffy about making an intuitive api much less intuitive for an
>> expert future feature that hasn't fully materialized in Lucene yet. It
>> almost seems like that fight should weigh towards general usage and standard
>> users.
>>
>> I don't have a better proposal though, nor the time to consider it at the
>> moment. I was just more curious if anyone else had any thoughts. I hadn't
>> realized Grant had asked a similar question not long ago
>> with no response. Not sure how to take that, but I'd think that would
>> indicate less problems with people than more. On the other hand, you don't
>> have to switch yet (with trunk) and we have yet to release it. I wonder how
>> many non dev, every day users have really had to tussle with the new API
>> yet. Not many people complaining too loudly at the moment though.
>>
>> Asking for a roll back seems a bit extreme without a little more support
>> behind it than we have seen.
>>
>> - Mark
>>
>
> PS
>
> I know you didnt ask for a rollback Grant - just kind of talking in a
> general manner. I see your point on getting the search side in, I'm just not
> sure I agree that it really matters if one hits before the other. Like Mike
> says, you don't
> have to switch to the new API yet.
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

RE: New Token API was Re: Payloads and TrieRangeQuery

Posted by Uwe Schindler <uw...@thetaphi.de>.

> On Mon, Jun 15, 2009 at 3:00 PM, Uwe Schindler<uw...@thetaphi.de> wrote:
> > There is a new Attribute called ShiftAttribute (or
> NumericShiftAttribute),
> > when trie range is moved to core. This attribute contains the shifted-
> away
> > bits from the prefix encoded value during trie indexing.
> 
> I was wondering about this....
> To make use of ShiftAttribute, you need to understand the trie
> encoding scheme itself.  If you understood that, you'd be able to look
> at the actual token value if you were interested in what shift was
> used.  So it's redundant, has a runtime cost, it's not currently used
> anywhere, and it's not useful to fields other than Trie.  Perhaps it
> shouldn't exist (yet)?

The idea was to make the indexing process controllable. You were the one,
who asked e.g. for the possibility to add payloads to trie fields and so on.
Using the shift attribute, you have full control of the token types. OK,
it's a little bit redundant; you could also use the TypeAttribute (which is
already used to mark highest precision and lower precision values).

One question about the whole TokenStream: In the original case we discussed
about Payloads/Position and TrieRange. If this would be implemented in
future versions, the question is, how should I set the
PositionIncrement/Offsets in the token stream to create a Position of 0 in
the index. I do not understand the indexing process here, especially this
deprecated boolean flag about something negative (not sure what the name
was). Should I set PositionIncrement to 0 for all Trie fields per default.
How about PositionIncrementGap, when indexing more than one field? All not
really clear. The position would be simplier to implement, but doing this
with an attribute, that is indexes together with the other attributes like a
payload would be the most ideal solution for future versions of TrieRange.

(Maybe we could also use the Offset attribute for the highest precision
bits)

Uwe

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

RE: New Token API was Re: Payloads and TrieRangeQuery

Posted by Uwe Schindler <uw...@thetaphi.de>.

> If you understood that, you'd be able to look
> at the actual token value if you were interested in what shift was
> used.  So it's redundant, has a runtime cost, it's not currently used
> anywhere, and it's not useful to fields other than Trie.  Perhaps it
> shouldn't exist (yet)?

You are right, you could also decode the shift value from the first char of
the token... I think, I will remove the ShiftAttribute and only set the
TermType to highest, lower precisions. By this, one could easily add a
payload to the real numeric value using a TokenFilter.

Uwe


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Mon, Jun 15, 2009 at 3:00 PM, Uwe Schindler<uw...@thetaphi.de> wrote:
> There is a new Attribute called ShiftAttribute (or NumericShiftAttribute),
> when trie range is moved to core. This attribute contains the shifted-away
> bits from the prefix encoded value during trie indexing.

I was wondering about this....
To make use of ShiftAttribute, you need to understand the trie
encoding scheme itself.  If you understood that, you'd be able to look
at the actual token value if you were interested in what shift was
used.  So it's redundant, has a runtime cost, it's not currently used
anywhere, and it's not useful to fields other than Trie.  Perhaps it
shouldn't exist (yet)?

-Yonik
http://www.lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Michael Busch <bu...@gmail.com>.

On 6/15/09 10:10 AM, Grant Ingersoll wrote:
>
>  But, as Michael M reminded me, it is complex, so please accept my 
> apologies.
>

No worries, Grant! I was not really offended, but rather confused... 
Thanks for clarifying.

  Michael

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Grant Ingersoll <gs...@apache.org>.

On Jun 14, 2009, at 8:05 PM, Michael Busch wrote:
> I'm not sure why this (currently having to implement next() too) is  
> such an issue for you. You brought it up at the Lucene meetup too.  
> No user will ever have to implement both (the new API and the old)  
> in their streams/filters. The only reason why we did it this way is  
> to not sacrifice performance for existing streams/filters when  
> people switch to Lucene 2.9. I explained this point in the jira issue:
>
> http://issues.apache.org/jira/browse/LUCENE-1422?focusedCommentId=12644881&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12644881
>
> The only time when we'll ever have to implement both APIs is between  
> now and 2.9, only for new streams and filters that we add before 2.9  
> is released. I don't think it'd be reasonable to consider this  
> disadvantage as a show stopper.

It's an issue b/c I don't like writing dead code and who knows when  
2.9 will actually be out.

I don't think it is a show stopper either.

>> Add on top of it, that the whole point of customizing the chain is  
>> to use it in search and, frankly speaking, somehow I think that  
>> part of the patch was held back.
>>
> I'm not sure what you're implying. Could you elaborate?

Sorry, see my response to Michael M. on this.  I didn't mean to imply  
you were doing something malicious, just that it always felt half done  
to me.  Knowing you, you don't strike me as someone who does things  
half way, so that's why I felt it was held back.  But, as Michael M  
reminded me, it is complex, so please accept my apologies.

>
> The search side of the API is currently being developed in  
> Lucene-1458. 1458 will not make it into 2.9. Therefore I agree that  
> it is not very advantageous to switch to the new API right now for  
> Lucene users. On the other hand, I don't think it hurts either.

I am not sure I agree here.  Forcing people to upgrade their analyzers  
can be quite involved.  Analyzers are one of the main areas that  
people do custom work.  Solr, for instance, has 11 custom TokenFilters  
right now as well as custom Tokenizers, not too mention the ones used  
during testing that aren't shipped.  Upgrading these is a lot of  
work.  I know in previous jobs, I also maintained a fair number  
TokenStream related stuff.  This should not be underestimated.   
Furthermore, as I said back in the initial discussion, Lucene's  
Analyzer stuff is often used outside of Lucene.

In fact, I often think the Analysis piece should be a standalone jar  
(not requiring core) and that core should have a dependency on it.  In  
other words, move o.a.l.analysis (and contrib/analsis) to a standalone  
module that core depends on.  This would make it easier for others to  
consume the Analysis functionality.

>> I personally would vote for reverting until a complete patch that  
>> addresses both sides of the problem is submitted and a better  
>> solution to cloning is put forth.
>>
> If we revert now and put a new flexible API like this into 3.x,  
> which I think is necessary to utilize flexible indexing, then we'll  
> have to wait until 4.0 before we can remove the old API.  
> Disadvantages like the one you mentioned above, will then probably  
> be present much longer.
>
> I mentioned in the following thread that I have started working on a  
> better way of cloning, which will actually be faster compared to the  
> old API. I'll try to get the code out asap.
> http://markmail.org/message/q7pgh2qlm2w7cxfx
>
> I'd be happy to discuss other API proposals that anybody brings up  
> here, that have the same advantages and are more intuitive. We could  
> also beef up the documentation and give a better example about how  
> to convert a stream/filter from the old to the new API; a  
> constructive suggestion that Uwe made at the ApacheCon.

My point here was, at the time, that if others wanted to revert, I  
probably would vote for it.  I'm not proposing we do it, as I think we  
can make do with what we have.  Given the discussion here, I would  
probably change my mind and not support it now.

I think it might be helpful to have some help for people upgrading.   
Perhaps an abstract class that provides the "core" Token attributes  
out of the box as a base class that they can then extend?  That being  
said, forcing people to upgrade could at least help them think about  
the fact that they have no use for the Type attribute or the Offsets  
attributes.  And, testing the cloning stuff would help.  I think the  
current approach underestimates the number of people who need to  
buffer tokens in memory before handing them out.  Sure, it's not as  
many as the main use case, but it's not zero either.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

RE: New Token API was Re: Payloads and TrieRangeQuery

Posted by Uwe Schindler <uw...@thetaphi.de>.

> Also, what about the case where one might have attributes that are meant
> for downstream TokenFilters, but not necessarily for indexing?  Offsets 
> and type come to mind.  Is it the case now that those attributes are not 
> automatically added to the index?   If they are ignored now, what if I 
> want to add them?  I admit, I'm having a hard time finding the code that 
> specifically loops over the Attributes.  I recall seeing it, but can no 
> longer find it.

There is a new Attribute called ShiftAttribute (or NumericShiftAttribute),
when trie range is moved to core. This attribute contains the shifted-away
bits from the prefix encoded value during trie indexing. The idea is to e.g.
have TokenFilters that may additional payloads or others to trie values, but
only do this for specific precisions. In future, it may also be interesting
to automatically add this attribute to the index.

Maybe we should add a read/store method to attributes, that adds an
attribute to the Posting using a IndexOutput/IndexInput (like the
serialization methods).

Uwe


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Grant Ingersoll <gs...@apache.org>.

On Jun 15, 2009, at 2:11 PM, Grant Ingersoll wrote:

>
>
> More questions:
>
> 1. What about Highlighter and MoreLikeThis?  They have not been  
> converted.  Also, what are they going to do if the attributes they  
> need are not available?  Caveat emptor?
> 2. Same for TermVectors.  What if the user specifies with positions  
> and offsets, but the analyzer doesn't produce them?  Caveat emptor?  
> (BTW, this is also true for the new omit TF stuff)
> 3. Also, what about the case where one might have attributes that  
> are meant for downstream TokenFilters, but not necessarily for  
> indexing?  Offsets and type come to mind.  Is it the case now that  
> those attributes are not automatically added to the index?   If they  
> are ignored now, what if I want to add them?  I admit, I'm having a  
> hard time finding the code that specifically loops over the  
> Attributes.  I recall seeing it, but can no longer find it.
>
>
> Also, can we add something like an AttributeTermQuery?  Seems like  
> it could work similar to the BoostingTermQuery.

So, I think I see #1 covered, how about #2, #3 and the notion of an  
AttributeTermQuery?  Anyone have thoughts on those?  I might have some  
time next week to work up a Query, as it sounds like fun, but don't  
hold it to me just yet.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Mark Miller <ma...@gmail.com>.

Grant Ingersoll wrote:
>
>
> 1. What about Highlighter
>
I would guess Highlighter has not been updated because its kind of a 
royal ***** :)

-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Mark Miller <ma...@gmail.com>.

Mark Miller wrote:
> Grant Ingersoll wrote:
>>
>> On Jun 14, 2009, at 8:05 PM, Michael Busch wrote:
>>>
>>> I'd be happy to discuss other API proposals that anybody brings up 
>>> here, that have the same advantages and are more intuitive. We could 
>>> also beef up the documentation and give a better example about how 
>>> to convert a stream/filter from the old to the new API; a 
>>> constructive suggestion that Uwe made at the ApacheCon.
>>
>> More questions:
>>
>> 1. What about Highlighter and MoreLikeThis?  They have not been 
>> converted.  Also, what are they going to do if the attributes they 
>> need are not available?  Caveat emptor?
>> 2. Same for TermVectors.  What if the user specifies with positions 
>> and offsets, but the analyzer doesn't produce them?  Caveat emptor? 
>> (BTW, this is also true for the new omit TF stuff)
>> 3. Also, what about the case where one might have attributes that are 
>> meant for downstream TokenFilters, but not necessarily for indexing? 
>>  Offsets and type come to mind.  Is it the case now that those 
>> attributes are not automatically added to the index?   If they are 
>> ignored now, what if I want to add them?  I admit, I'm having a hard 
>> time finding the code that specifically loops over the Attributes.  I 
>> recall seeing it, but can no longer find it.
>>
>>
>> Also, can we add something like an AttributeTermQuery?  Seems like it 
>> could work similar to the BoostingTermQuery.
>>
>> I'm sure more will come to me.
>>
>> -Grant
> If you are using a CachingTokenFilter, and you do something like pass 
> it to something that hasn't upgraded to the new API (say 
> MemoryIndex#addField(String fieldName, TokenStream stream, float 
> boost)) and you are trying to use the new API,
> you will get an exception when trying to read the tokens from the 
> CachingTokenFilter a second time - obviously because
> the old API is cached rather than the new, and when you try and use 
> the new, kak :( .
>
> We can obviously fix anything internal, but not external.
>
Hmm - actually, even if we fix internal, if you are trying to use the 
old API, you will have the same issue in reverse ;)

-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Mark Miller <ma...@gmail.com>.

Grant Ingersoll wrote:
>
> On Jun 14, 2009, at 8:05 PM, Michael Busch wrote:
>>
>> I'd be happy to discuss other API proposals that anybody brings up 
>> here, that have the same advantages and are more intuitive. We could 
>> also beef up the documentation and give a better example about how to 
>> convert a stream/filter from the old to the new API; a constructive 
>> suggestion that Uwe made at the ApacheCon.
>
> More questions:
>
> 1. What about Highlighter and MoreLikeThis?  They have not been 
> converted.  Also, what are they going to do if the attributes they 
> need are not available?  Caveat emptor?
> 2. Same for TermVectors.  What if the user specifies with positions 
> and offsets, but the analyzer doesn't produce them?  Caveat emptor? 
> (BTW, this is also true for the new omit TF stuff)
> 3. Also, what about the case where one might have attributes that are 
> meant for downstream TokenFilters, but not necessarily for indexing? 
>  Offsets and type come to mind.  Is it the case now that those 
> attributes are not automatically added to the index?   If they are 
> ignored now, what if I want to add them?  I admit, I'm having a hard 
> time finding the code that specifically loops over the Attributes.  I 
> recall seeing it, but can no longer find it.
>
>
> Also, can we add something like an AttributeTermQuery?  Seems like it 
> could work similar to the BoostingTermQuery.
>
> I'm sure more will come to me.
>
> -Grant
If you are using a CachingTokenFilter, and you do something like pass it 
to something that hasn't upgraded to the new API (say 
MemoryIndex#addField(String fieldName, TokenStream stream, float boost)) 
and you are trying to use the new API,
you will get an exception when trying to read the tokens from the 
CachingTokenFilter a second time - obviously because
the old API is cached rather than the new, and when you try and use the 
new, kak :( .

We can obviously fix anything internal, but not external.

-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Grant Ingersoll <gs...@apache.org>.

On Jun 14, 2009, at 8:05 PM, Michael Busch wrote:
>
> I'd be happy to discuss other API proposals that anybody brings up  
> here, that have the same advantages and are more intuitive. We could  
> also beef up the documentation and give a better example about how  
> to convert a stream/filter from the old to the new API; a  
> constructive suggestion that Uwe made at the ApacheCon.

More questions:

1. What about Highlighter and MoreLikeThis?  They have not been  
converted.  Also, what are they going to do if the attributes they  
need are not available?  Caveat emptor?
2. Same for TermVectors.  What if the user specifies with positions  
and offsets, but the analyzer doesn't produce them?  Caveat emptor?  
(BTW, this is also true for the new omit TF stuff)
3. Also, what about the case where one might have attributes that are  
meant for downstream TokenFilters, but not necessarily for indexing?   
Offsets and type come to mind.  Is it the case now that those  
attributes are not automatically added to the index?   If they are  
ignored now, what if I want to add them?  I admit, I'm having a hard  
time finding the code that specifically loops over the Attributes.  I  
recall seeing it, but can no longer find it.

Also, can we add something like an AttributeTermQuery?  Seems like it  
could work similar to the BoostingTermQuery.

I'm sure more will come to me.

-Grant

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Mark Miller <ma...@gmail.com>.

Mark Miller wrote:
> I don't know how I feel about rolling the new token api back.
>
> I will say that I originally had no issue with it because I am very 
> excited about Lucene-1458.
>
> At the same time though, I'm thinking Lucene-1458 is a very advanced 
> issue that will likely be for really expert usage (though I can see 
> benefits falling to general users).
>
> I'm slightly iffy about making an intuitive api much less intuitive 
> for an expert future feature that hasn't fully materialized in Lucene 
> yet. It almost seems like that fight should weigh towards general 
> usage and standard users.
>
> I don't have a better proposal though, nor the time to consider it at 
> the moment. I was just more curious if anyone else had any thoughts. I 
> hadn't realized Grant had asked a similar question not long ago
> with no response. Not sure how to take that, but I'd think that would 
> indicate less problems with people than more. On the other hand, you 
> don't have to switch yet (with trunk) and we have yet to release it. I 
> wonder how many non dev, every day users have really had to tussle 
> with the new API yet. Not many people complaining too loudly at the 
> moment though.
>
> Asking for a roll back seems a bit extreme without a little more 
> support behind it than we have seen.
>
> - Mark

PS

I know you didnt ask for a rollback Grant - just kind of talking in a 
general manner. I see your point on getting the search side in, I'm just 
not sure I agree that it really matters if one hits before the other. 
Like Mike says, you don't
have to switch to the new API yet.

-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Mark Miller <ma...@gmail.com>.

I don't know how I feel about rolling the new token api back.

I will say that I originally had no issue with it because I am very 
excited about Lucene-1458.

At the same time though, I'm thinking Lucene-1458 is a very advanced 
issue that will likely be for really expert usage (though I can see 
benefits falling to general users).

I'm slightly iffy about making an intuitive api much less intuitive for 
an expert future feature that hasn't fully materialized in Lucene yet. 
It almost seems like that fight should weigh towards general usage and 
standard users.

I don't have a better proposal though, nor the time to consider it at 
the moment. I was just more curious if anyone else had any thoughts. I 
hadn't realized Grant had asked a similar question not long ago
with no response. Not sure how to take that, but I'd think that would 
indicate less problems with people than more. On the other hand, you 
don't have to switch yet (with trunk) and we have yet to release it. I 
wonder how many non dev, every day users have really had to tussle with 
the new API yet. Not many people complaining too loudly at the moment 
though.

Asking for a roll back seems a bit extreme without a little more support 
behind it than we have seen.

- Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: New Token API was Re: Payloads and TrieRangeQuery

Posted by Michael Busch <bu...@gmail.com>.

On 6/14/09 5:17 AM, Grant Ingersoll wrote:
> Agreed.  I've been bringing it up for a while now and made the same 
> comments when it was first introduced, but felt like the lone voice in 
> the wilderness on it and gave way [1], [2], [3].  Now that others are 
> writing/converting, I think it is worth revisiting.
>

I am and always was open to constructive suggestions about how to design 
this API.  I know these new APIs currently don't seem to have many 
advantages over the previous ones, but they're basically laying the API 
groundwork for future features like flexible indexing. Some concerns you 
mentioned were targeted against the first version of the patch in 
LUCENE-1422. But, you later said you liked how the next patch looked (in 
thread [2] that you mentioned).

> That being said, I did just write my first TokenFilter with it, and 
> didn't think it was that hard.  There are some gains in it and the API 
> can be simpler if you just need one or two attributes (see 
> DelimitedPayloadTokenFilter), although, just like the move to using 
> char [] in Token, as soon as you do something like store a Token, you 
> lose most of the benefit, I think (for the char [] case, as soon as 
> you need a String in one of your filters, you lose the perf. gain).  
> The annoying parts are that you still have to implement the deprecated 
> next() part, otherwise chances are the thing is unusable by everyone 
> at this point anyway.
>
I'm not sure why this (currently having to implement next() too) is such 
an issue for you. You brought it up at the Lucene meetup too. No user 
will ever have to implement both (the new API and the old) in their 
streams/filters. The only reason why we did it this way is to not 
sacrifice performance for existing streams/filters when people switch to 
Lucene 2.9. I explained this point in the jira issue:

http://issues.apache.org/jira/browse/LUCENE-1422?focusedCommentId=12644881&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12644881

The only time when we'll ever have to implement both APIs is between now 
and 2.9, only for new streams and filters that we add before 2.9 is 
released. I don't think it'd be reasonable to consider this disadvantage 
as a show stopper.
> Add on top of it, that the whole point of customizing the chain is to 
> use it in search and, frankly speaking, somehow I think that part of 
> the patch was held back.
>
I'm not sure what you're implying. Could you elaborate?

The search side of the API is currently being developed in Lucene-1458. 
1458 will not make it into 2.9. Therefore I agree that it is not very 
advantageous to switch to the new API right now for Lucene users. On the 
other hand, I don't think it hurts either.
> I personally would vote for reverting until a complete patch that 
> addresses both sides of the problem is submitted and a better solution 
> to cloning is put forth.
>
If we revert now and put a new flexible API like this into 3.x, which I 
think is necessary to utilize flexible indexing, then we'll have to wait 
until 4.0 before we can remove the old API. Disadvantages like the one 
you mentioned above, will then probably be present much longer.

I mentioned in the following thread that I have started working on a 
better way of cloning, which will actually be faster compared to the old 
API. I'll try to get the code out asap.
http://markmail.org/message/q7pgh2qlm2w7cxfx

I'd be happy to discuss other API proposals that anybody brings up here, 
that have the same advantages and are more intuitive. We could also beef 
up the documentation and give a better example about how to convert a 
stream/filter from the old to the new API; a constructive suggestion 
that Uwe made at the ApacheCon.

-Michael

> -Grant
>
> [1] http://issues.apache.org/jira/browse/LUCENE-1422,
> [2] 
> http://www.lucidimagination.com/search/document/5daf6d7b8027b4d3/tokenstream_and_token_apis#9e2d0d2b5dc118d4, 
> and the rest of the discussion on that thread.
> [3] 
> http://www.lucidimagination.com/search/document/4274335abcf31926/new_tokenstream_api_usage 
>
>
> On Jun 13, 2009, at 10:32 PM, Mark Miller wrote:
>
>> What was the big improvement with it again? Advanced, expert custom 
>> indexing chains require less casting or something right?


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org