Posted to dev@lucene.apache.org by Karl Wettin <ka...@gmail.com> on 2008/06/29 23:50:37 UTC

TokenStream#reset():boolean?

How about deprecating

 >  public void reset() throws IOException {}

and refactoring it so that a caller can make sure the stream was reset?

To avoid breaking backwards compatibility with extensions of the
TokenStream currently in trunk, one could introduce something like this:

 >  /**
 >   * @param  requireConfirmedReset
 >   * @throws ResetException An IOException thrown if parameter
 >   *         requireConfirmedReset was set true and the stream
 >   *         could not be reset.
 >   */
 >  public void reset(boolean requireConfirmedReset)
 >      throws IOException, ResetException {
 >    reset();
 >    if (requireConfirmedReset) throw new ResetException();
 >  }


and then deprecate this new method at some strategic time to introduce
an exception-less solution like:

 >  /** @return true if this stream was successfully reset. */
 >  public boolean reset() throws IOException { return false; }
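
The transitional step above can be sketched roughly as follows. This is
only an illustration under stated assumptions: SketchStream is a
stand-in and not the real TokenStream, ResetException is the
hypothetical exception from the proposal (assumed here to extend
IOException), and tryConfirmedReset is a made-up helper. Note that Java
does not allow two methods that differ only in return type, so the
final "boolean reset()" cannot coexist with the current "void reset()",
which is why an intermediate method is needed at all.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class ResetMigrationSketch {

  /** Hypothetical exception from the proposal; assumed to extend IOException. */
  static class ResetException extends IOException {
    ResetException(String message) { super(message); }
  }

  /** Stand-in for TokenStream, not the real Lucene class. */
  static class SketchStream {
    /** Existing optional operation: a no-op unless a subclass overrides it. */
    public void reset() throws IOException {}

    /** Transitional method: the base class cannot confirm a reset, so it throws on demand. */
    public void reset(boolean requireConfirmedReset) throws IOException {
      reset();
      if (requireConfirmedReset) {
        throw new ResetException("reset could not be confirmed");
      }
    }
  }

  /** A subclass that really can rewind overrides both methods. */
  static class ResettableStream extends SketchStream {
    int position = 3; // pretend three tokens were already consumed

    @Override public void reset() { position = 0; }

    @Override public void reset(boolean requireConfirmedReset) {
      reset(); // the reset is known to work, so there is nothing to throw
    }
  }

  /** Returns true if the stream confirmed the reset, false if it could not. */
  static boolean tryConfirmedReset(SketchStream stream) {
    try {
      stream.reset(true);
      return true;
    } catch (ResetException e) {
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e); // unrelated I/O failure
    }
  }

  public static void main(String[] args) {
    System.out.println(tryConfirmedReset(new SketchStream()));
    System.out.println(tryConfirmedReset(new ResettableStream()));
  }
}
```

Existing subclasses that never override reset(boolean) keep compiling
and simply report that the reset is unconfirmed when asked.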




      karl

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: TokenStream#reset():boolean?

Posted by Michael McCandless <lu...@mikemccandless.com>.
Karl Wettin wrote:

>
> On 7 Jul 2008, at 13:04, Michael McCandless wrote:
>
>>
>> If we make this change (migrate to "boolean TokenStream.reset()"),  
>> what would IndexWriter do if it calls reset and false is returned?
>
> I don't think the writer should ever call reset(). It is the
> consumer passing a stream to the writer who needs to make sure the
> stream is valid, and it is the consumer who has to do something if
> they have exhausted a stream that cannot be reset, i.e. wrap it in
> a cache of some kind.

Actually DocumentsWriter already calls reset() now before pulling  
tokens from the TokenStream, but, I agree it really shouldn't have to.

Mike



Re: TokenStream#reset():boolean?

Posted by Karl Wettin <ka...@gmail.com>.
On 7 Jul 2008, at 13:04, Michael McCandless wrote:

>
> If we make this change (migrate to "boolean TokenStream.reset()"),  
> what would IndexWriter do if it calls reset and false is returned?

I don't think the writer should ever call reset(). It is the consumer
passing a stream to the writer who needs to make sure the stream is
valid, and it is the consumer who has to do something if they have
exhausted a stream that cannot be reset, i.e. wrap it in a cache of
some kind.
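
A minimal sketch of that consumer-side responsibility, assuming the
proposed "boolean reset()" form. Everything here is hypothetical and
independent of the real Lucene classes: SketchStream, OneShotStream,
CachingStream and secondPass are illustration-only names, and tokens
are plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CachingConsumerSketch {

  /** Hypothetical minimal stream: next() returns null when exhausted. */
  interface SketchStream {
    String next();
    boolean reset(); // proposed form: true only if the stream really rewound
  }

  /** A one-shot stream that cannot rewind. */
  static class OneShotStream implements SketchStream {
    private final Iterator<String> tokens;
    OneShotStream(String... tokens) { this.tokens = Arrays.asList(tokens).iterator(); }
    public String next() { return tokens.hasNext() ? tokens.next() : null; }
    public boolean reset() { return false; }
  }

  /** Records tokens on the first pass and replays them after reset(). */
  static class CachingStream implements SketchStream {
    private final SketchStream input;
    private final List<String> cache = new ArrayList<>();
    private int replayPosition = -1; // -1 while still reading from input

    CachingStream(SketchStream input) { this.input = input; }

    public String next() {
      if (replayPosition >= 0) {
        return replayPosition < cache.size() ? cache.get(replayPosition++) : null;
      }
      String token = input.next();
      if (token != null) cache.add(token);
      return token;
    }

    public boolean reset() {
      // Drain whatever is left of the input so the cache is complete.
      if (replayPosition < 0) { while (next() != null) { } }
      replayPosition = 0;
      return true;
    }
  }

  /** Reads every remaining token from the stream. */
  static List<String> drain(SketchStream stream) {
    List<String> out = new ArrayList<>();
    for (String t = stream.next(); t != null; t = stream.next()) out.add(t);
    return out;
  }

  /** Pre-processing pass, then the pass a writer would see afterwards. */
  static List<String> secondPass(SketchStream stream) {
    drain(stream); // the consumer's own pass exhausts the stream
    if (!stream.reset()) {
      throw new IllegalStateException("stream is exhausted and cannot be reset");
    }
    return drain(stream);
  }

  public static void main(String[] args) {
    SketchStream cached = new CachingStream(new OneShotStream("a", "b"));
    System.out.println(secondPass(cached));
  }
}
```

With the bare OneShotStream, secondPass fails loudly instead of handing
an empty stream onward, which is the point of returning a boolean.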

>
> Are you saying that you want to get to the point where we force all  
> TokenStream subclasses to implement reset (and return true)?  Should  
> we just eventually make reset() abstract (somehow in an incremental  
> backwards compatible way, if there is one)?

Let's say it would be a nice goal for every stream that can be reset
to implement it. But that is not a vital part of what I'm really
looking for.


         karl




Re: TokenStream#reset():boolean?

Posted by Michael McCandless <lu...@mikemccandless.com>.
If we make this change (migrate to "boolean TokenStream.reset()"),  
what would IndexWriter do if it calls reset and false is returned?

Are you saying that you want to get to the point where we force all  
TokenStream subclasses to implement reset (and return true)?  Should  
we just eventually make reset() abstract (somehow in an incremental  
backwards compatible way, if there is one)?

Mike



Re: TokenStream#reset():boolean?

Posted by Karl Wettin <ka...@gmail.com>.
On 4 Jul 2008, at 21:02, Michael McCandless wrote:

>
> But what would cause reset() to not actually work?

Extensions of TokenStream do not have to handle reset():

   /** Resets this stream to the beginning. This is an
    *  optional operation, so subclasses may or may not
    *  implement this method. Reset() is not needed for
    *  the standard indexing process. However, if the Tokens
    *  of a TokenStream are intended to be consumed more than
    *  once, it is necessary to implement reset().
    */
   public void reset() throws IOException {}

> And what is a composite stream?

I just came up with that name, as in the composite pattern: a stream
formed by one or more streams and perhaps some code. A TokenFilter
would be the simplest one.

> I'm just a little confused on the use case here I think...

Here are a few things I think of:

Perhaps my composite stream, which I iterated for pre-processing
reasons, is really simple to reset, but its input streams do not
support reset.

If you add a stream to a field and that stream is already exhausted at
addDocument because it did not support reset, you'll end up with an
empty field without a warning.

It is a waste of resources to reset all the parts of a composite
stream if one of the parts failed to reset.
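
The last point can be sketched like this, again assuming the proposed
"boolean reset()" form. Part and Composite are hypothetical
illustration classes, not Lucene types; the resetCalls counter exists
only to make the short-circuit visible.

```java
import java.util.Arrays;
import java.util.List;

public class CompositeResetSketch {

  /** Hypothetical part of a composite; counts how often reset() is attempted. */
  static class Part {
    final boolean resettable;
    int resetCalls = 0;

    Part(boolean resettable) { this.resettable = resettable; }

    boolean reset() {
      resetCalls++;
      return resettable;
    }
  }

  /** Composite whose reset() stops at the first part that reports failure. */
  static class Composite {
    final List<Part> parts;

    Composite(Part... parts) { this.parts = Arrays.asList(parts); }

    boolean reset() {
      for (Part part : parts) {
        if (!part.reset()) {
          return false; // remaining parts are left untouched
        }
      }
      return true;
    }
  }

  public static void main(String[] args) {
    Part first = new Part(true);
    Part second = new Part(false);
    Part third = new Part(true);
    Composite composite = new Composite(first, second, third);
    System.out.println(composite.reset());
    System.out.println(third.resetCalls);
  }
}
```

With a void reset() the composite has no way to notice that the second
part failed, so it would reset the third part for nothing and still
hand over an unusable stream.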


          karl




Re: TokenStream#reset():boolean?

Posted by Michael McCandless <lu...@mikemccandless.com>.
But what would cause reset() to not actually work?

And what is a composite stream?

I'm just a little confused on the use case here I think...

Mike



Re: TokenStream#reset():boolean?

Posted by Karl Wettin <ka...@gmail.com>.
I just want to know whether my token stream managed to reset or not,
especially the parts of composite streams.

           karl



Re: TokenStream#reset():boolean?

Posted by Michael McCandless <lu...@mikemccandless.com>.
Karl,

I'm sort of confused by this proposal.  What is the driver here?  It  
seems like the overall goal is to have reset() return a boolean  
stating whether it was actually implemented by the subclass of  
TokenStream?

Mike
