You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@uima.apache.org by "Burn Lewis (JIRA)" <ui...@incubator.apache.org> on 2008/03/17 21:52:24 UTC

[jira] Created: (UIMA-911) CasMultiplier's hasNext() called twice without an intervening next()

CasMultiplier's hasNext() called twice without an intervening next()
--------------------------------------------------------------------

                 Key: UIMA-911
                 URL: https://issues.apache.org/jira/browse/UIMA-911
             Project: UIMA
          Issue Type: Bug
          Components: Core Java Framework
            Reporter: Burn Lewis
            Priority: Minor


In org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl the hasNext method calls the AnalysisComponent's hasNext but then its next method calls the AnalysisComponent's hasNext again before calling next.

Makes it difficult to split the processing between hasNext & next.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Commented: (UIMA-911) CasMultiplier's hasNext() called twice without an intervening next()

Posted by Marshall Schor <ms...@schor.com>.
Adam Lally wrote:
> On Wed, Mar 19, 2008 at 3:47 PM, Burn Lewis <bu...@gmail.com> wrote:
>   
>>  Finally, I may be naive but I can see no technical objection to making the
>>  implementation of the CasIterator support a more forgiving and efficient
>>  interface to user code, especially since the code changes may be simpler
>>  than the documentation changes.
>>     
>
> ... and either would be simpler than having this discussion. ;)
>   
My 2 cents:  The world is complicated; documentation is often to big to 
read.  Where we can, we should insure that our design, when it has 
features that match much more wide-spread designs our users would know 
about (i.e., the Java language), follow the semantics of the wide-spread 
designs that our users would know about.

So I would agree with Adam and Thilo that "hasNext()" should follow the 
Java "contract".

That's not to say, though, that we couldn't apply the patch to not call 
it extra times from the framework. 

I would vote to update the documentation to say something like hasNext() 
works like it does in normal java iterators (I'm assuming that's true 
:-) ), and then include a warning it might be called multiple times, 
although the framework will attempt to avoid doing that (for 
"efficiency" reasons), where feasible.

> I just think it doesn't make sense for a method called hasNext to
> advance the iterator.  
+1.  This is consistent with Java's design here, I think.
> Perhaps it's the very fact that I've learned
> one must be vigilant about this when implementing an Iterator that
> makes this so ingrained to me, so much so that I am objecting on
> principle here.
>
> I have to agree that there isn't any good reason why the framework
> should actually call hasNext two times.  
+1.  It probably should avoid doing this for "efficiency" reasons, where 
feasible.

-Marshall

Re: [jira] Commented: (UIMA-911) CasMultiplier's hasNext() called twice without an intervening next()

Posted by Adam Lally <al...@alum.rpi.edu>.
On Wed, Mar 19, 2008 at 3:47 PM, Burn Lewis <bu...@gmail.com> wrote:
>  Finally, I may be naive but I can see no technical objection to making the
>  implementation of the CasIterator support a more forgiving and efficient
>  interface to user code, especially since the code changes may be simpler
>  than the documentation changes.
>

... and either would be simpler than having this discussion. ;)

I just think it doesn't make sense for a method called hasNext to
advance the iterator.  Perhaps it's the very fact that I've learned
one must be vigilant about this when implementing an Iterator that
makes this so ingrained to me, so much so that I am objecting on
principle here.

I have to agree that there isn't any good reason why the framework
should actually call hasNext two times.  UIMA probably doesn't need to
be providing a Public Service teaching people the hard way how to
write iterators. :)  So I guess bottom line I'm +0 to applying this
patch - not something I particularly care about doing, but if there
are others who think it is, go ahead.

  -Adam

Re: [jira] Commented: (UIMA-911) CasMultiplier's hasNext() called twice without an intervening next()

Posted by Burn Lewis <bu...@gmail.com>.
I think these snippets of code show that it's not a trivial task to
implement an iterator that satisfies the Java Iterator contract.  I don't
see why we should require users to do so since we don't mention it in the
docs and we illustrate what looks like an implicit contract in section 7.1.1,
i.e.

1. The framework calls the CAS Multiplier's process method, passing it an
input CAS. The process method returns, but may hold on to a reference to the
input CAS.
2. The framework then calls the CAS Multiplier's hasNext method. The CAS
Multiplier should return true from this method if it intends to output one
or more new CASes (for instance, segments of this CAS), and false if not.
3. If hasNext returned true, the framework will call the CAS Multiplier's
next method. The CAS Multiplier creates a new CAS (we will see how in a
moment), populates it, and returns it from the hasNext method.
4. Steps 2 and 3 continue until hasNext returns false.

Also I don't think Java iterators are allowed to throw exceptions in hasNext
and ours do.  I don't see how restricting our use of the user's CAS
Multiplier iterator to a single hasNext could cause trouble in existing
implementations, or violates any contract.

We have an implicit (compatibility) contract with Collection Readers since
the CPE calls their hasNext only once, so if a user puts their CR in an
aggregate and UIMA treats it as a CAS Multiplier, it may not work as
expected with the current consecutive hasNext calls.

Finally, I may be naive but I can see no technical objection to making the
implementation of the CasIterator support a more forgiving and efficient
interface to user code, especially since the code changes may be simpler
than the documentation changes.

- Burn

Re: [jira] Commented: (UIMA-911) CasMultiplier's hasNext() called twice without an intervening next()

Posted by Thilo Goetz <tw...@gmx.de>.
Adam Lally wrote:
> On Wed, Mar 19, 2008 at 1:19 PM, Thilo Goetz <tw...@gmx.de> wrote:
>> <snip/>
>>  public Object next() {
>>    if (hasNext()) {
>>      return next;
>>    }
>>    throw new NoSuchElementException();
>>  }
>>
>>  where the actual work happens in hasNext().  The user may or may not
>>  call hasNext(), doesn't matter.  Of course in the implementation of
>>  hasNext(), I have to make sure to remember that I have advanced the
>>  iterator.  Sometimes it works better that way, for internal reasons.
>>
> 
> Well, I think you need something slightly more complicated in next(),
> to avoid having hasNext() advance the iterator every time it is
> called, which wouldn't be right.  Just something like this would work
> though:
> 
>  public Object next() {
>    if (hasNext()) {
>      Object result = this.next;
>      this.next = null;
>      return result;
>    }
>    throw new NoSuchElementException();
>  }
> 
> And then in hasNext() you check if this.next != null and return true
> without advancing.

You're right, I just shouldn't type code into email.  It's buggy
about 50% of the time, which makes it useless ;-)

> 
>>  I have read the documentation again, and I think it's fine.  The
>>  bit about distributing the processing makes perfect sense to me.
>>  So does the part about the calling sequence, but then it wouldn't
>>  occur to me to write an iterator whose hasNext() method may only be
>>  called once before next() is called.  If you think that's necessary,
>>  we could add a note to indicate that hasNext() may be called more
>>  than once.
>>
> 
> I do agree with Eddie and Burn that we could and should do a better
> job in the documentation, to educate people who might not be so
> familiar with how the Iterator interface defines hasNext/next.

If you say so.  I would just like to point out that our documentation
is already so verbose in places, trying hand-hold everybody, that it
is becoming very hard to find the important information.  One
place where this is particularly striking is the installation guide,
but there are others.  Just put in a link or something so it doesn't
detract from the important stuff.

> 
> -Adam


Re: [jira] Commented: (UIMA-911) CasMultiplier's hasNext() called twice without an intervening next()

Posted by Adam Lally <al...@alum.rpi.edu>.
On Wed, Mar 19, 2008 at 1:19 PM, Thilo Goetz <tw...@gmx.de> wrote:
> <snip/>
>  public Object next() {
>    if (hasNext()) {
>      return next;
>    }
>    throw new NoSuchElementException();
>  }
>
>  where the actual work happens in hasNext().  The user may or may not
>  call hasNext(), doesn't matter.  Of course in the implementation of
>  hasNext(), I have to make sure to remember that I have advanced the
>  iterator.  Sometimes it works better that way, for internal reasons.
>

Well, I think you need something slightly more complicated in next(),
to avoid having hasNext() advance the iterator every time it is
called, which wouldn't be right.  Just something like this would work
though:

 public Object next() {
   if (hasNext()) {
     Object result = this.next;
     this.next = null;
     return result;
   }
   throw new NoSuchElementException();
 }

And then in hasNext() you check if this.next != null and return true
without advancing.

>  I have read the documentation again, and I think it's fine.  The
>  bit about distributing the processing makes perfect sense to me.
>  So does the part about the calling sequence, but then it wouldn't
>  occur to me to write an iterator whose hasNext() method may only be
>  called once before next() is called.  If you think that's necessary,
>  we could add a note to indicate that hasNext() may be called more
>  than once.
>

I do agree with Eddie and Burn that we could and should do a better
job in the documentation, to educate people who might not be so
familiar with how the Iterator interface defines hasNext/next.

-Adam

Re: [jira] Commented: (UIMA-911) CasMultiplier's hasNext() called twice without an intervening next()

Posted by Thilo Goetz <tw...@gmx.de>.
Eddie Epstein wrote:
> On Wed, Mar 19, 2008 at 9:28 AM, Adam Lally (JIRA)
> <ui...@incubator.apache.org> wrote:
>>  Users sometimes need to implement the Java iterator interface, in which case they need to make sure that their hasNext method doesn't advance the iterator - so I don't see that this is that different.  I guess a difference in UIMA's case is that there's another level of indirection (the CasIterator) between the calling code and the CasMultiplier, so we could afford to have different contracts at the two layers. CasIterator's hasNext could be called any number of times, but the CasMultiplier's would be guaranteed to only get called once.  I think that's what your fix is doing.  So since end users never interact with the CasMultiplier interface directly, it's not a huge loss if it's contract doesn't follow the standard iterator paradigm.
>>
>>  I guess I still have a preference for keeping the contract with the CasMultiplier analogous to the normal Iterator contract, since I think most people think that an interface with hasNext/next operates in this way.  And who knows, maybe someone else has written some code that directly calls a CasMultiplier component outside of the UIMA framework, and they'd be tripped up by this.  Unlikely, but possible.
>>
>>  Does anyone else want to voice an opinion?

+1, agree completely.

> 
> The documentation claiming that the work in CAS multipliers can be
> distributed between hasNext and next is currently insufficient. Either
> it should be changed to warn users to protect their hasNext code from
> being called multiple times, or the UIMA code changed to avoid hasNext
> being called unnecessarily. I'd vote for changing the code instead of
> the documentation.

As Adam pointed out, the code that implements the Iterator interface
should follow the Iterator contract.  You're allowed to call hasNext()
any number of times before actually advancing the iterator.  Whether
the actual work of advancing the iterator happens in the next() or
hasNext() method is immaterial.  I have written Iterator implementations
like this:

public Object next() {
   if (hasNext()) {
     return next;
   }
   throw new NoSuchElementException();
}

where the actual work happens in hasNext().  The user may or may not
call hasNext(), doesn't matter.  Of course in the implementation of
hasNext(), I have to make sure to remember that I have advanced the
iterator.  Sometimes it works better that way, for internal reasons.

I have read the documentation again, and I think it's fine.  The
bit about distributing the processing makes perfect sense to me.
So does the part about the calling sequence, but then it wouldn't
occur to me to write an iterator whose hasNext() method may only be
called once before next() is called.  If you think that's necessary,
we could add a note to indicate that hasNext() may be called more
than once.

--Thilo

> 
> User code that directly calls a CasMultiplier component outside of the
> UIMA framework should get exactly what it asks for.
> 
> Eddie


Re: [jira] Commented: (UIMA-911) CasMultiplier's hasNext() called twice without an intervening next()

Posted by Eddie Epstein <ea...@gmail.com>.
On Wed, Mar 19, 2008 at 9:28 AM, Adam Lally (JIRA)
<ui...@incubator.apache.org> wrote:
>  Users sometimes need to implement the Java iterator interface, in which case they need to make sure that their hasNext method doesn't advance the iterator - so I don't see that this is that different.  I guess a difference in UIMA's case is that there's another level of indirection (the CasIterator) between the calling code and the CasMultiplier, so we could afford to have different contracts at the two layers. CasIterator's hasNext could be called any number of times, but the CasMultiplier's would be guaranteed to only get called once.  I think that's what your fix is doing.  So since end users never interact with the CasMultiplier interface directly, it's not a huge loss if it's contract doesn't follow the standard iterator paradigm.
>
>  I guess I still have a preference for keeping the contract with the CasMultiplier analogous to the normal Iterator contract, since I think most people think that an interface with hasNext/next operates in this way.  And who knows, maybe someone else has written some code that directly calls a CasMultiplier component outside of the UIMA framework, and they'd be tripped up by this.  Unlikely, but possible.
>
>  Does anyone else want to voice an opinion?

The documentation claiming that the work in CAS multipliers can be
distributed between hasNext and next is currently insufficient. Either
it should be changed to warn users to protect their hasNext code from
being called multiple times, or the UIMA code changed to avoid hasNext
being called unnecessarily. I'd vote for changing the code instead of
the documentation.

User code that directly calls a CasMultiplier component outside of the
UIMA framework should get exactly what it asks for.

Eddie

[jira] Closed: (UIMA-911) CasMultiplier's hasNext() called twice without an intervening next()

Posted by "Eddie Epstein (JIRA)" <ui...@incubator.apache.org>.
     [ https://issues.apache.org/jira/browse/UIMA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eddie Epstein closed UIMA-911.
------------------------------

       Resolution: Fixed
    Fix Version/s: 2.2.2S

This fix complies with our existing documentation describing how the framework calls user's CAS multiplier code: first hasNext then next. This allow us to avoid complicating our documentation (for those novice Java users not intimate with Java iterator design) with warnings about how the UIMA framework may call hasNext multiple times before calling next.

Perhaps more importantly, this assures that existing collection readers will work as CAS multipliers, since the CPE only called hasNext once before calling next.



> CasMultiplier's hasNext() called twice without an intervening next()
> --------------------------------------------------------------------
>
>                 Key: UIMA-911
>                 URL: https://issues.apache.org/jira/browse/UIMA-911
>             Project: UIMA
>          Issue Type: Bug
>          Components: Core Java Framework
>            Reporter: Burn Lewis
>            Priority: Minor
>             Fix For: 2.2.2S
>
>         Attachments: UIMA-911-fix.patch, UIMA-911.patch
>
>
> In org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl the hasNext method calls the AnalysisComponent's hasNext but then its next method calls the AnalysisComponent's hasNext again before calling next.
> Makes it difficult to split the processing between hasNext & next.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (UIMA-911) CasMultiplier's hasNext() called twice without an intervening next()

Posted by "Burn Lewis (JIRA)" <ui...@incubator.apache.org>.
     [ https://issues.apache.org/jira/browse/UIMA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Burn Lewis updated UIMA-911:
----------------------------

    Attachment: UIMA-911.patch

Here's a patch that makes the NewlineSegmenter check for this and so makes the AnalysisEngine_implTest testProcessAndOutputNewCASes fail

> CasMultiplier's hasNext() called twice without an intervening next()
> --------------------------------------------------------------------
>
>                 Key: UIMA-911
>                 URL: https://issues.apache.org/jira/browse/UIMA-911
>             Project: UIMA
>          Issue Type: Bug
>          Components: Core Java Framework
>            Reporter: Burn Lewis
>            Priority: Minor
>         Attachments: UIMA-911.patch
>
>
> In org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl the hasNext method calls the AnalysisComponent's hasNext but then its next method calls the AnalysisComponent's hasNext again before calling next.
> Makes it difficult to split the processing between hasNext & next.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (UIMA-911) CasMultiplier's hasNext() called twice without an intervening next()

Posted by "Burn Lewis (JIRA)" <ui...@incubator.apache.org>.
     [ https://issues.apache.org/jira/browse/UIMA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Burn Lewis updated UIMA-911:
----------------------------

    Attachment: UIMA-911-fix.patch

But this is different ... the user's code is implementing an iterator that the the UIMA framework is using.  The attached fix changes just a  few lines in the AnalysisComponentCasIterator class.  

Note that the CPE gets it right when calling collection readers so existing CRs may break when wrapped as CasMultipliers.

We should at least spell it out in the CAS Multiplier contract:

  The framework guarantees that it will not call next unless hasNext has returned true since the last call to
  process or next , but it may call hasNext even if next has not been called since the last call to hasNext. 

And similar warnings where we suggest doing some of the processing in hasNext.

> CasMultiplier's hasNext() called twice without an intervening next()
> --------------------------------------------------------------------
>
>                 Key: UIMA-911
>                 URL: https://issues.apache.org/jira/browse/UIMA-911
>             Project: UIMA
>          Issue Type: Bug
>          Components: Core Java Framework
>            Reporter: Burn Lewis
>            Priority: Minor
>         Attachments: UIMA-911-fix.patch, UIMA-911.patch
>
>
> In org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl the hasNext method calls the AnalysisComponent's hasNext but then its next method calls the AnalysisComponent's hasNext again before calling next.
> Makes it difficult to split the processing between hasNext & next.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (UIMA-911) CasMultiplier's hasNext() called twice without an intervening next()

Posted by "Michael Baessler (JIRA)" <ui...@incubator.apache.org>.
     [ https://issues.apache.org/jira/browse/UIMA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Baessler updated UIMA-911:
----------------------------------

    Fix Version/s:     (was: 2.2.2S)
                   2.2.2

> CasMultiplier's hasNext() called twice without an intervening next()
> --------------------------------------------------------------------
>
>                 Key: UIMA-911
>                 URL: https://issues.apache.org/jira/browse/UIMA-911
>             Project: UIMA
>          Issue Type: Bug
>          Components: Core Java Framework
>            Reporter: Burn Lewis
>            Priority: Minor
>             Fix For: 2.2.2
>
>         Attachments: UIMA-911-fix.patch, UIMA-911.patch
>
>
> In org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl the hasNext method calls the AnalysisComponent's hasNext but then its next method calls the AnalysisComponent's hasNext again before calling next.
> Makes it difficult to split the processing between hasNext & next.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Commented: (UIMA-911) CasMultiplier's hasNext() called twice without an intervening next()

Posted by Burn Lewis <bu...@gmail.com>.
At various times in our docs we indicate that the framework always calls
next after hasNext, and indicate that it doesn't matter in which the
processing is done, e.g. in the tutorial:

    Note that if it is more convenient you can request an empty CAS during
the process or
    hasNext methods, not just during the next method.
    .......
    Instead, you should spread your processing out across the calls to the
hasNext or next methods.

>
> I see no contract that says hasNext may be called more than once before
next, only implicit examples of next always being called after hasNext
returns true.  So we're not violating anything with the current
implementation but with one small change we can make things much easier for
CAS Multiplier writers and reduce their errors.  I have had to work around
this in the past but managed to forget it last week and wasted time
rediscovering the problem.

Burn.

Re: [jira] Commented: (UIMA-911) CasMultiplier's hasNext() called twice without an intervening next()

Posted by Marshall Schor <ms...@schor.com>.
Adam Lally (JIRA) wrote:
>     [ https://issues.apache.org/jira/browse/UIMA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579625#action_12579625 ] 
>
> Adam Lally commented on UIMA-911:
> ---------------------------------
>
> I don't think this is a bug.  This is meant to work the same as Java Iterators.  If you call an iterator's hasNext() method twice, it gives you the same value and does not advance the iterator.
>   
+1.  I don't think we want to change the contract for this to one which 
guarantees hasNext can only be called once.

-Marshall


[jira] Commented: (UIMA-911) CasMultiplier's hasNext() called twice without an intervening next()

Posted by "Adam Lally (JIRA)" <ui...@incubator.apache.org>.
    [ https://issues.apache.org/jira/browse/UIMA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579625#action_12579625 ] 

Adam Lally commented on UIMA-911:
---------------------------------

I don't think this is a bug.  This is meant to work the same as Java Iterators.  If you call an iterator's hasNext() method twice, it gives you the same value and does not advance the iterator.

> CasMultiplier's hasNext() called twice without an intervening next()
> --------------------------------------------------------------------
>
>                 Key: UIMA-911
>                 URL: https://issues.apache.org/jira/browse/UIMA-911
>             Project: UIMA
>          Issue Type: Bug
>          Components: Core Java Framework
>            Reporter: Burn Lewis
>            Priority: Minor
>         Attachments: UIMA-911.patch
>
>
> In org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl the hasNext method calls the AnalysisComponent's hasNext but then its next method calls the AnalysisComponent's hasNext again before calling next.
> Makes it difficult to split the processing between hasNext & next.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (UIMA-911) CasMultiplier's hasNext() called twice without an intervening next()

Posted by "Adam Lally (JIRA)" <ui...@incubator.apache.org>.
    [ https://issues.apache.org/jira/browse/UIMA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580361#action_12580361 ] 

Adam Lally commented on UIMA-911:
---------------------------------

Users sometimes need to implement the Java iterator interface, in which case they need to make sure that their hasNext method doesn't advance the iterator - so I don't see that this is that different.  I guess a difference in UIMA's case is that there's another level of indirection (the CasIterator) between the calling code and the CasMultiplier, so we could afford to have different contracts at the two layers. CasIterator's hasNext could be called any number of times, but the CasMultiplier's would be guaranteed to only get called once.  I think that's what your fix is doing.  So since end users never interact with the CasMultiplier interface directly, it's not a huge loss if it's contract doesn't follow the standard iterator paradigm.

I guess I still have a preference for keeping the contract with the CasMultiplier analogous to the normal Iterator contract, since I think most people think that an interface with hasNext/next operates in this way.  And who knows, maybe someone else has written some code that directly calls a CasMultiplier component outside of the UIMA framework, and they'd be tripped up by this.  Unlikely, but possible.  

Does anyone else want to voice an opinion?

> CasMultiplier's hasNext() called twice without an intervening next()
> --------------------------------------------------------------------
>
>                 Key: UIMA-911
>                 URL: https://issues.apache.org/jira/browse/UIMA-911
>             Project: UIMA
>          Issue Type: Bug
>          Components: Core Java Framework
>            Reporter: Burn Lewis
>            Priority: Minor
>         Attachments: UIMA-911-fix.patch, UIMA-911.patch
>
>
> In org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl the hasNext method calls the AnalysisComponent's hasNext but then its next method calls the AnalysisComponent's hasNext again before calling next.
> Makes it difficult to split the processing between hasNext & next.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.