Posted to dev@uima.apache.org by Marshall Schor <ms...@schor.com> on 2013/08/08 17:55:44 UTC

Re: UIMA Iterators and Iterables vs Collections

In looking at uimaFIT's select, and some of the discussion on the issue
tracking, I'm wondering if it would be better to have the things that support the
use cases:

for (Token t : select(jcas, Token.class)) {

return an Iterable, instead of a Collection.

That would drop the methods in Collection except for iterator().  These methods,
for the most part, deal with things that CAS iterators don't have an easy
ability to deal with, or can't implement due to logical differences between the
CAS/Indexes/Iterators and Collections.  These methods are:
  - size (requires iterating and counting all the elements)
  - add, addAll (not supported; instead, add to the CAS and choose whether or not to index)
  - remove, removeAll, clear (not supported, but remove-from-all-indexes is)
  - contains, containsAll, retainAll (not supported because they depend on a
particular index's definition of equality)

The only method of Collection that seems like it could be useful (besides, of
course, iterator) for UIMA iterators is toArray, and the general object methods
toString, equals, and hashCode.
 
-Marshall


On 8/8/2013 10:38 AM, Richard Eckart de Castilho wrote:
> How about setting up versions for UIMA 2.5.0 and 3.0.0 and start collecting some
> stuff for these releases?
>
> The methods are not without questions that imho should be discussed before
> merging into core. E.g.
>
> https://issues.apache.org/jira/browse/UIMA-2830
> https://code.google.com/p/uimafit/issues/detail?id=61
> https://code.google.com/p/uimafit/issues/detail?id=65
> https://code.google.com/p/uimafit/issues/detail?id=113
>
> Cheers,
>
> -- Richard
>
> Am 08.08.2013 um 16:31 schrieb Marshall Schor <ms...@schor.com>:
>
>> On 8/8/2013 9:57 AM, Richard Eckart de Castilho wrote:
>>> uimaFIT provides these in the CASUtil and JCasUtil classes:
>>>
>>> select(…)
>>> selectCovered(…)
>>> selectCovering(…)
>>>
>>> E.g.
>>>
>>> for (Token t : select(jcas, Token.class)) {
>>> …
>>> }
>> Nice! (Thinking now about "pulling" these and maybe some other things into main
>> UIMA ...)
>>
>> -M
>>
>>
>>> Cheers,
>>>
>>> -- Richard
>>>
>>> Am 08.08.2013 um 15:52 schrieb Marshall Schor <ms...@schor.com>:
>>>
>>>> UIMA implements a bunch of CAS iterators of various types, that extend the
>>>> normal Java Iterator class.  Are there corresponding iterables that allow their
>>>> use in things like for (Token token : xxxxx) { } kinds of statements?  (where
>>>> Token is the JCas cover class for UIMA Type "Token").
>>>>
>>>> If not, is there a reason for this, or just a bit of missing convenience?  Does
>>>> UIMAFit supply these?
>>>>
>>>> -Marshall
>
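Marshall's question is whether CAS iterators, which extend java.util.Iterator, have matching Iterables for use in an enhanced for loop. The gap is small: an Iterable only has to hand out a fresh iterator on each call. The sketch below is plain Java (Java 8 or later); IteratorIterable and IterableDemo are hypothetical names, not UIMA or uimaFIT API, and a fixed list of strings stands in for the annotations in a CAS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical adapter (not a UIMA/uimaFIT class): wraps a factory of
// iterators so the result can be used in an enhanced for loop. Each call to
// iterator() asks the factory for a fresh iterator, so the Iterable can be
// traversed more than once.
final class IteratorIterable<T> implements Iterable<T> {
    private final Supplier<Iterator<T>> iteratorFactory;

    IteratorIterable(Supplier<Iterator<T>> iteratorFactory) {
        this.iteratorFactory = iteratorFactory;
    }

    @Override
    public Iterator<T> iterator() {
        return iteratorFactory.get();
    }
}

final class IterableDemo {
    // Stand-in for "all Token annotations in the CAS"; here the data is
    // just a fixed list of strings (assumption for illustration).
    static List<String> collectWithForEach() {
        List<String> tokens = Arrays.asList("UIMA", "is", "fun");
        Iterable<String> selection = new IteratorIterable<String>(tokens::iterator);
        List<String> seen = new ArrayList<>();
        for (String t : selection) { // compiles because selection is an Iterable
            seen.add(t);
        }
        return seen;
    }
}
```

Because iterator() calls the supplier each time, the same Iterable can be looped over repeatedly, which a bare Iterator cannot.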


Re: UIMA Iterators and Iterables vs Collections

Posted by Richard Eckart de Castilho <ri...@gmail.com>.
Am 08.08.2013 um 20:33 schrieb Marshall Schor <ms...@schor.com>:

> On 8/8/2013 12:06 PM, Richard Eckart de Castilho wrote:
>> The methods returned Iterable in earlier versions of uimaFIT, but the ability to
>> get the number of annotations of the selected type or to check if it was (non-)empty
>> was sufficiently common that it has been changed to Collection. 
> It's possible to have the method return a class which implements Iterable, and
> has the additional functions, of course; it would not need to implement Collection.
> 
> The implementation of "size" would potentially be slow, since the size is not stored;
> it would require iterating through the entire thing and accumulating in a
> counter.  So that might be a "surprise" for users of this interface.

Would it be feasible to make it fast(er) or at least make it fast in simple 
scenarios, e.g. when no conditions are used?

At least isEmpty() should be doable reasonably fast.
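The asymmetry between a fast isEmpty() and a slow size() follows directly from how the data is reached. The following sketch (a hypothetical CountingSelection class in plain Java 8+, not a UIMA API) implements Iterable plus isEmpty() and size() without implementing Collection: isEmpty() inspects only the first element, while size() has to walk every element because no count is stored.

```java
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical "selection" type: an Iterable that also offers isEmpty()
// and size(), without implementing java.util.Collection.
final class CountingSelection<T> implements Iterable<T> {
    private final Supplier<Iterator<T>> source;

    CountingSelection(Supplier<Iterator<T>> source) {
        this.source = source;
    }

    @Override
    public Iterator<T> iterator() {
        return source.get();
    }

    // Cheap: only asks whether a first element exists.
    boolean isEmpty() {
        return !iterator().hasNext();
    }

    // Potentially slow: no count is stored, so every element is visited.
    int size() {
        int count = 0;
        for (Iterator<T> it = iterator(); it.hasNext(); it.next()) {
            count++;
        }
        return count;
    }
}
```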

>> Also, having it as a Collection allows
>> easy copying of the data into another collection, e.g.
>> 
>> List<Token> tokens = new ArrayList<Token>(select(jcas, Token.class));
> True.  This could also be done with a custom class.  e.g.
> List<Token> tokens = select_v2(jcas, Token.class).toList();

Sounds reasonable. 

This is the kind of discussion that I hoped uimaFIT would spur.

So would you use such functionality in uimaFIT and help work it
out, gaining experience before adding such functions to the core later, or
are you keen on adding the stuff to the core asap, so that you can
use it?

UIMA core imho should take a slightly different angle than uimaFIT
on such functionality. E.g. such select() methods should be directly
on the CAS/JCas/FSArray/FSIndexRepository etc. interfaces instead of
having static methods.

Cheers,

-- Richard
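Two ideas from this exchange, select() as an instance method on the container rather than a static utility, and Marshall's select_v2(jcas, Token.class).toList(), can be combined in a small sketch. ToyStore and TypedView are hypothetical stand-ins written in plain Java 8+; they are not the CAS/JCas API, and Class.isInstance is used here only as a rough analogue of "this type or a subtype".

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for an indexed store such as a CAS; NOT the UIMA API.
// select() is an instance method rather than a static utility method that
// takes the store as its first argument.
final class ToyStore {
    private final List<Object> indexed = new ArrayList<>();

    void addToIndexes(Object fs) {
        indexed.add(fs);
    }

    <T> TypedView<T> select(Class<T> type) {
        return new TypedView<>(indexed, type);
    }
}

// Hypothetical read-only view over the elements of one type (and its subtypes).
final class TypedView<T> implements Iterable<T> {
    private final List<Object> backing;
    private final Class<T> type;

    TypedView(List<Object> backing, Class<T> type) {
        this.backing = backing;
        this.type = type;
    }

    @Override
    public Iterator<T> iterator() {
        // Simple, non-lazy approach: filter into a temporary list, then iterate it.
        List<T> matches = new ArrayList<>();
        for (Object o : backing) {
            if (type.isInstance(o)) {
                matches.add(type.cast(o));
            }
        }
        return matches.iterator();
    }

    // Copies the current contents into an independent, modifiable list,
    // in the spirit of select_v2(jcas, Token.class).toList().
    List<T> toList() {
        List<T> copy = new ArrayList<>();
        for (T t : this) {
            copy.add(t);
        }
        return copy;
    }
}
```

toList() returns an independent ArrayList, so callers can modify the copy without touching the store.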

Re: UIMA Iterators and Iterables vs Collections

Posted by Richard Eckart de Castilho <ri...@gmail.com>.
I'm still not convinced. At least, I'd like to change the signature of
the existing select methods and see how that affects existing Java and
Groovy code. But I agree that there are good arguments ;)

Anyway, there already is a rather sensible interface in UIMA: FSIndex
It implements Iterable and even has a size() method. But it doesn't go
well with non-Annotation types. UIMA-2830 describes some steps I tried
in pushing the existing select* API towards FSIndex, but I got stuck
when non-Annotation FSes were not properly accessible because
getAllIndexedFS didn't return an FSIndex, just an FSIterator.

I wonder if there isn't a way, possibly one requiring surgery, to
push uimaFIT/UIMA Core towards being able to use FSIndex when accessing
non-Annotation FSes. And that might again be something for a UIMA 3.0.0…

-- Richard

Am 08.08.2013 um 21:19 schrieb Marshall Schor <ms...@schor.com>:

> 
> On 8/8/2013 3:06 PM, Richard Eckart de Castilho wrote:
>> Am 08.08.2013 um 20:33 schrieb Marshall Schor <ms...@schor.com>:
>> 
>>> On 8/8/2013 12:06 PM, Richard Eckart de Castilho wrote:
>>>> The methods returned Iterable in earlier versions of uimaFIT, but the ability to
>>>> get the number of annotations of the selected type or to check if it was (non-)empty
>>>> was sufficiently common that it has been changed to Collection. 
>>> It's possible to have the method return a class which implements Iterable, and
>>> has the additional functions, of course; it would not need to implement Collection.
>> I'd like to point out that the "destructive" methods of collections (basically those that
>> UIMA iterators currently don't work well with) are optional. Having them throw an
>> UnsupportedOperationException is part of the Java Collection API. So departing from a
>> standard API may not be worth the removal of these optional signatures.
>> 
>> 
>> [1]: The "destructive" algorithms contained in this class, that is, the algorithms that modify the collection on which they operate, are specified to throw UnsupportedOperationException if the collection does not support the appropriate mutation primitive(s), such as the set method. 
>> 
>> 
>> I'm not convinced yet…
> :-)
> 
> I agree the "is empty" is do-able.  The size - I don't know how that could be
> made fast, without some major surgery.  The iterators work by having each
> iterator instance be a set of iterators - one for the "type" plus one for each
> subtype of "type", and arranging all of them to work together.
> 
> Having a custom class might also help in allowing other kinds of operations
> (move forward, move backwards) that the FSIterators allow, which are not in Java
> iterators.
> 
> And, the FSIterators also have a "find" method that sometimes is fairly efficient.
> 
> So - 2 arguments for custom class / interface vs. Collections:  a) can't
> usefully implement most of the collections methods (without causing some
> surprises), and b) may be useful to implement other methods.
> 
> -Marshall
>> 
>> -- Richard
>> 
>> [1] http://docs.oracle.com/javase/6/docs/api/java/util/Collections.html 
> 
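Marshall describes each CAS iterator as a set of iterators, one for the requested type plus one per subtype, working together. One plausible way to make sorted per-type iterators work together is a k-way merge over a priority queue; the following plain-Java 8+ sketch shows that technique. MergingIterator and countAll are hypothetical names, and this is not a claim about how UIMA's iterators are actually implemented. It also shows the asymmetry discussed above: hasNext() only checks whether any head remains, while counting has to drain everything.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Hypothetical illustration (not UIMA's implementation): one sorted iterator
// per type (the requested type plus each subtype) is merged into a single
// sorted stream using a priority queue keyed on each iterator's current head.
final class MergingIterator<T> implements Iterator<T> {
    // One sub-iterator together with the element it currently points at.
    private static final class Head<E> {
        final E value;
        final Iterator<E> rest;

        Head(E value, Iterator<E> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    private final PriorityQueue<Head<T>> heads;

    MergingIterator(List<Iterator<T>> perTypeIterators, Comparator<? super T> order) {
        // PriorityQueue requires an initial capacity of at least 1.
        heads = new PriorityQueue<Head<T>>(Math.max(1, perTypeIterators.size()),
                (a, b) -> order.compare(a.value, b.value));
        for (Iterator<T> it : perTypeIterators) {
            if (it.hasNext()) {
                heads.add(new Head<>(it.next(), it));
            }
        }
    }

    @Override
    public boolean hasNext() {
        // Cheap: non-empty as soon as any sub-iterator still has an element.
        return !heads.isEmpty();
    }

    @Override
    public T next() {
        Head<T> smallest = heads.poll();
        if (smallest == null) {
            throw new NoSuchElementException();
        }
        // Refill the queue from the sub-iterator that just yielded an element.
        if (smallest.rest.hasNext()) {
            heads.add(new Head<>(smallest.rest.next(), smallest.rest));
        }
        return smallest.value;
    }

    // No total is stored, so counting means draining the merged stream.
    // Note: this consumes the given iterators.
    static <E> int countAll(List<Iterator<E>> perTypeIterators, Comparator<? super E> order) {
        int n = 0;
        for (Iterator<E> it = new MergingIterator<>(perTypeIterators, order); it.hasNext(); it.next()) {
            n++;
        }
        return n;
    }
}
```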


Re: UIMA Iterators and Iterables vs Collections

Posted by Marshall Schor <ms...@schor.com>.
On 8/8/2013 3:06 PM, Richard Eckart de Castilho wrote:
> Am 08.08.2013 um 20:33 schrieb Marshall Schor <ms...@schor.com>:
>
>> On 8/8/2013 12:06 PM, Richard Eckart de Castilho wrote:
>>> The methods returned Iterable in earlier versions of uimaFIT, but the ability to
>>> get the number of annotations of the selected type or to check if it was (non-)empty
>>> was sufficiently common that it has been changed to Collection. 
>> It's possible to have the method return a class which implements Iterable, and
>> has the additional functions, of course; it would not need to implement Collection.
> I'd like to point out that the "destructive" methods of collections (basically those that
> UIMA iterators currently don't work well with) are optional. Having them throw an
> UnsupportedOperationException is part of the Java Collection API. So departing from a
> standard API may not be worth the removal of these optional signatures.
>
>
> [1]: The "destructive" algorithms contained in this class, that is, the algorithms that modify the collection on which they operate, are specified to throw UnsupportedOperationException if the collection does not support the appropriate mutation primitive(s), such as the set method. 
>
>
> I'm not convinced yet…
:-)

I agree the "is empty" is do-able.  The size - I don't know how that could be
made fast, without some major surgery.  The iterators work by having each
iterator instance be a set of iterators - one for the "type" plus one for each
subtype of "type", and arranging all of them to work together.

Having a custom class might also help in allowing other kinds of operations
(move forward, move backwards) that the FSIterators allow, which are not in Java
iterators.

And, the FSIterators also have a "find" method that sometimes is fairly efficient.

So - 2 arguments for custom class / interface vs. Collections:  a) can't
usefully implement most of the collections methods (without causing some
surprises), and b) may be useful to implement other methods.

-Marshall
>
> -- Richard
>
> [1] http://docs.oracle.com/javase/6/docs/api/java/util/Collections.html 
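Marshall's second argument is that a custom class could expose operations that java.util.Iterator lacks, such as moving backwards or seeking. The sketch below is a hypothetical SortedCursor over a sorted snapshot in plain Java 8+. Its method names loosely mirror FSIterator (isValid, get, moveToNext, moveToPrevious, moveToFirst, moveToLast, moveTo), but the class and its seek semantics (binary search for the first element not less than the key) are assumptions for illustration, not UIMA behavior.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical cursor over a sorted snapshot; method names loosely follow
// UIMA's FSIterator, but this is a plain-Java sketch, not the UIMA API.
final class SortedCursor<T> {
    private final List<T> sorted;
    private final Comparator<? super T> order;
    private int pos; // the cursor is valid when 0 <= pos < sorted.size()

    SortedCursor(List<T> elements, Comparator<? super T> order) {
        this.sorted = new ArrayList<>(elements);
        this.sorted.sort(order);
        this.order = order;
        this.pos = 0;
    }

    boolean isValid() {
        return pos >= 0 && pos < sorted.size();
    }

    T get() {
        if (!isValid()) {
            throw new NoSuchElementException();
        }
        return sorted.get(pos);
    }

    // Moving from an invalid position is a no-op in this sketch.
    void moveToNext() { if (isValid()) pos++; }
    void moveToPrevious() { if (isValid()) pos--; }
    void moveToFirst() { pos = 0; }
    void moveToLast() { pos = sorted.size() - 1; }

    // Binary search for the first element that is not less than key;
    // leaves the cursor invalid if every element is smaller.
    void moveTo(T key) {
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (order.compare(sorted.get(mid), key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        pos = lo;
    }
}
```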


Re: UIMA Iterators and Iterables vs Collections

Posted by Richard Eckart de Castilho <ri...@gmail.com>.
Am 08.08.2013 um 20:33 schrieb Marshall Schor <ms...@schor.com>:

> On 8/8/2013 12:06 PM, Richard Eckart de Castilho wrote:
>> The methods returned Iterable in earlier versions of uimaFIT, but the ability to
>> get the number of annotations of the selected type or to check if it was (non-)empty
>> was sufficiently common that it has been changed to Collection. 
> It's possible to have the method return a class which implements Iterable, and
> has the additional functions, of course; it would not need to implement Collection.

I'd like to point out that the "destructive" methods of collections (basically those that
UIMA iterators currently don't work well with) are optional. Having them throw an
UnsupportedOperationException is part of the Java Collection API. So departing from a
standard API may not be worth the removal of these optional signatures.


[1]: The "destructive" algorithms contained in this class, that is, the algorithms that modify the collection on which they operate, are specified to throw UnsupportedOperationException if the collection does not support the appropriate mutation primitive(s), such as the set method. 


I'm not convinced yet…

-- Richard

[1] http://docs.oracle.com/javase/6/docs/api/java/util/Collections.html 
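Richard's counterpoint is that Collection's mutating methods are optional. java.util.AbstractCollection makes that concrete: a subclass that implements only iterator() and size() gets a read-only Collection whose add() throws UnsupportedOperationException. The sketch below is plain Java 8+; ReadOnlyView is a hypothetical name and a List stands in for CAS-backed data.

```java
import java.util.AbstractCollection;
import java.util.Iterator;
import java.util.List;

// Sketch: a read-only Collection view built on AbstractCollection. Only
// iterator() and size() are implemented; AbstractCollection derives
// contains(), toArray(), isEmpty(), etc. from them, and its add() throws
// UnsupportedOperationException by default.
final class ReadOnlyView<T> extends AbstractCollection<T> {
    private final List<T> backing; // stand-in for CAS-backed data (assumption)

    ReadOnlyView(List<T> backing) {
        this.backing = backing;
    }

    @Override
    public Iterator<T> iterator() {
        Iterator<T> it = backing.iterator();
        // Wrap the iterator so that removal through it is refused as well.
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public T next() {
                return it.next();
            }
            // remove() keeps Iterator's default, which throws
            // UnsupportedOperationException (Java 8+).
        };
    }

    @Override
    public int size() {
        return backing.size();
    }
}
```

Two caveats connect back to Marshall's objections: AbstractCollection.isEmpty() is defined as size() == 0, so with a slow size() it would be worth overriding it to check iterator().hasNext(), and the inherited contains() uses equals(), not an index's own notion of equality.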

Re: UIMA Iterators and Iterables vs Collections

Posted by Marshall Schor <ms...@schor.com>.
On 8/8/2013 12:06 PM, Richard Eckart de Castilho wrote:
> The methods returned Iterable in earlier versions of uimaFIT, but the ability to
> get the number of annotations of the selected type or to check if it was (non-)empty
> was sufficiently common that it has been changed to Collection. 
It's possible to have the method return a class which implements Iterable, and
has the additional functions, of course; it would not need to implement Collection.

The implementation of "size" would potentially be slow, since the size is not stored;
it would require iterating through the entire thing and accumulating in a
counter.  So that might be a "surprise" for users of this interface.
>
> Also, having it as a Collection allows
> easy copying of the data into another collection, e.g.
>
> List<Token> tokens = new ArrayList<Token>(select(jcas, Token.class));
True.  This could also be done with a custom class.  e.g.
List<Token> tokens = select_v2(jcas, Token.class).toList();

>
> Since the collections are backed by the CAS (if possible), this comes in handy when
> removing annotations, e.g.
>
> for (Token t : new ArrayList<Token>(select(jcas, Token.class))) {
>   if (hasSomeCondition(t)) { // hypothetical predicate
>     t.removeFromIndexes();
>   }
> }
>
> Unfortunately (for what reason I don't know), Java doesn't provide an easy
> way to dump an Iterable into a collection.
>
> Cheers,
>
> -- Richard
>
> Am 08.08.2013 um 17:55 schrieb Marshall Schor <ms...@schor.com>:
>
>> In looking at uimaFIT's select, and some of the discussion on the issue
>> tracking, I'm wondering if it would be better to have the things that support the
>> use cases:
>>
>> for (Token t : select(jcas, Token.class)) {
>>
>> return an Iterable, instead of a Collection.
>>
>> That would drop the methods in Collection except for iterator().  These methods,
>> for the most part, deal with things that CAS iterators don't have an easy
>> ability to deal with, or can't implement due to logical differences between the
>> CAS/Indexes/Iterators and Collections.  These methods are:
>>  - size (requires iterating and counting all the elements)
>>  - add, addAll (not supported; instead, add to the CAS and choose whether or not to index)
>>  - remove, removeAll, clear (not supported, but remove-from-all-indexes is)
>>  - contains, containsAll, retainAll (not supported because they depend on a
>> particular index's definition of equality)
>>
>> The only method of Collection that seems like it could be useful (besides, of
>> course, iterator) for UIMA iterators is toArray, and the general object methods
>> toString, equals, and hashCode.
>>
>> -Marshall
>


Re: UIMA Iterators and Iterables vs Collections

Posted by Richard Eckart de Castilho <ri...@gmail.com>.
The methods returned Iterable in earlier versions of uimaFIT, but the ability to
get the number of annotations of the selected type or to check if it was (non-)empty
was sufficiently common that it has been changed to Collection. 

Also, having it as a Collection allows
easy copying of the data into another collection, e.g.

List<Token> tokens = new ArrayList<Token>(select(jcas, Token.class));

Since the collections are backed by the CAS (if possible), this comes in handy when
removing annotations, e.g.

for (Token t : new ArrayList<Token>(select(jcas, Token.class))) {
  if (hasSomeCondition(t)) { // hypothetical predicate
    t.removeFromIndexes();
  }
}

Unfortunately (for what reason I don't know), Java doesn't provide an easy
way to dump an Iterable into a collection.
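Richard's removal idiom copies the selection first so that removing from the indexes does not disturb the iteration in progress. The following plain-Java sketch shows the same copy-then-remove pattern on an ordinary list, where removing during direct iteration would typically raise ConcurrentModificationException. CopyThenRemove, toList, and removeMatching are hypothetical names, and live.remove(t) stands in for t.removeFromIndexes(). Since Java 8, Iterable.forEach makes copying an arbitrary Iterable into a collection a one-liner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class CopyThenRemove {
    // Copies any Iterable into a new ArrayList. Since Java 8 this can use
    // Iterable.forEach; earlier versions need an explicit loop.
    static <T> List<T> toList(Iterable<? extends T> source) {
        List<T> copy = new ArrayList<>();
        source.forEach(copy::add);
        return copy;
    }

    // Removes matching elements from a live backing list. Iterating over a
    // snapshot means the removals cannot invalidate the loop's own iterator.
    static <T> void removeMatching(List<T> live, Predicate<? super T> condition) {
        for (T t : toList(live)) {
            if (condition.test(t)) {
                live.remove(t); // stand-in for t.removeFromIndexes() in UIMA
            }
        }
    }
}
```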

Cheers,

-- Richard

Am 08.08.2013 um 17:55 schrieb Marshall Schor <ms...@schor.com>:

> In looking at uimaFIT's select, and some of the discussion on the issue
> tracking, I'm wondering if it would be better to have the things that support the
> use cases:
> 
> for (Token t : select(jcas, Token.class)) {
> 
> return an Iterable, instead of a Collection.
> 
> That would drop the methods in Collection except for iterator().  These methods,
> for the most part, deal with things that CAS iterators don't have an easy
> ability to deal with, or can't implement due to logical differences between the
> CAS/Indexes/Iterators and Collections.  These methods are:
>  - size (requires iterating and counting all the elements)
>  - add, addAll (not supported; instead, add to the CAS and choose whether or not to index)
>  - remove, removeAll, clear (not supported, but remove-from-all-indexes is)
>  - contains, containsAll, retainAll (not supported because they depend on a
> particular index's definition of equality)
> 
> The only method of Collection that seems like it could be useful (besides, of
> course, iterator) for UIMA iterators is toArray, and the general object methods
> toString, equals, and hashCode.
> 
> -Marshall