Posted to dev@uima.apache.org by Marshall Schor <ms...@schor.com> on 2017/09/13 12:22:32 UTC

opinions, please, for a UV3 proposed change

I posted a Jira for a proposed change in how 0-length UIMA arrays and lists are
managed.  These are immutable objects, and (theoretically) one instance (per
CAS) could be shared.

In the current implementation, this is managed explicitly by the user - they can
use a bunch of new APIs to get shared instances.

I'm thinking a better way is to make this automatic, and remove the new bunch of
APIs (a smaller API set is always a good thing, for essentially the same
functionality, IMHO).  The implementation would change so that the calls that
create "new" 0-length arrays/lists would create one only if none already
existed, and would otherwise return the existing one.

This follows Java's general direction for immutable objects, like Strings and
Integer values, which can be shared.
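
The create-or-reuse behavior proposed above can be sketched in a few lines of
Java.  This is only an illustration under stated assumptions: SketchCas,
SketchFloatArray and createFloatArray are hypothetical stand-ins invented for
the example, not UIMA classes or methods.  The only point is that a request for
length 0 hands back one instance per CAS, while any other length is fresh.

```java
// Hypothetical stand-in for a UIMA array type; not the real FloatArray class.
final class SketchFloatArray {
    final float[] values;

    SketchFloatArray(int length) {
        values = new float[length];
    }
}

// Hypothetical stand-in for a CAS that owns the shared zero-length instance.
final class SketchCas {
    // Created lazily, at most once per SketchCas.
    private SketchFloatArray emptyFloatArray;

    SketchFloatArray createFloatArray(int length) {
        if (length != 0) {
            // Non-empty arrays are mutable, so each call gets its own object.
            return new SketchFloatArray(length);
        }
        if (emptyFloatArray == null) {
            emptyFloatArray = new SketchFloatArray(0);
        }
        // Every later request for length 0 reuses the same object.
        return emptyFloatArray;
    }
}
```

With this shape, code that compares two 0-length arrays with == would start to
see them as the same object, which is the compatibility question for the
proposal.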

For cases where people wanted/needed a CAS value "marker" that was tiny, but
unique (like you get with Java's new Object()), we would keep "new TOP(aCas)" as
something that generated unique instances.  What do others think?

I've seen large-scale implementations of UIMA pipelines with lots of defaulted
0-length arrays in them; this has the potential to improve space/time
performance a reasonable amount for these.

-Marshall


Re: opinions, please, for a UV3 proposed change

Posted by Peter Klügl <pe...@averbis.com>.
sounds good +1


On 15.09.2017 at 09:47, Richard Eckart de Castilho wrote:
> The CAS could have a convenience method to fetch a zero-length instance, like
> e.g. Collections.emptyList() which could return a shared instance. Users caring
> to optimize could use that without having to implement their own code for managing
> a shared instance. Users relying on object identity could just manually create an
> instance.
>
> WDYT?
>
> -- Richard


Re: opinions, please, for a UV3 proposed change

Posted by Marshall Schor <ms...@schor.com>.
Thanks... 
Unless there's some backwards compatibility issue with V2, I agree this is a
good thing to do.

I'll do it in the next RC...

-Marshall

On 9/15/2017 3:10 PM, Richard Eckart de Castilho wrote:
> How about dropping the "get"? Java Collections also does without.
>
> -- Richard


Re: opinions, please, for a UV3 proposed change

Posted by Richard Eckart de Castilho <re...@apache.org>.
How about dropping the "get"? Java Collections also does without.

-- Richard

> On 15.09.2017, at 16:16, Marshall Schor <ms...@schor.com> wrote:
> 
> +1.  New methods on CAS and JCas:
> 
> getEmptyList(FloatList.class)
> 
> getEmptyFloatList()
> 
> etc. for the other 3 lists (integer, string, fs)
> 
> getEmptyArray(FloatArray.class)
> 
> getEmptyFloatArray()  etc. for the other 8 types (boolean, byte, short, integer,
> long, double, string, fs)


Re: opinions, please, for a UV3 proposed change

Posted by Marshall Schor <ms...@schor.com>.
+1.  New methods on CAS and JCas:

getEmptyList(FloatList.class)

getEmptyFloatList()

etc. for the other 3 lists (integer, string, fs)

getEmptyArray(FloatArray.class)

getEmptyFloatArray()  etc. for the other 8 types (boolean, byte, short, integer,
long, double, string, fs)
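
A minimal sketch of how the generic and the typed accessors could relate,
assuming a per-CAS cache keyed by class.  DemoCas, DemoFloatArray and
DemoIntegerArray are hypothetical stand-ins, not the UIMA types, and the
Supplier argument is an addition made only to keep the example self-contained;
the method list above does not include it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-ins for two of the UIMA array types.
final class DemoFloatArray {
    final float[] values = new float[0];
}

final class DemoIntegerArray {
    final int[] values = new int[0];
}

// Hypothetical stand-in for a CAS holding one empty instance per type.
final class DemoCas {
    private final Map<Class<?>, Object> emptyInstances = new HashMap<>();

    // Generic form, analogous to getEmptyArray(FloatArray.class).
    // The factory is invoked only the first time a given type is requested.
    <T> T getEmptyArray(Class<T> type, Supplier<T> factory) {
        return type.cast(emptyInstances.computeIfAbsent(type, k -> factory.get()));
    }

    // Typed convenience form, analogous to getEmptyFloatArray().
    DemoFloatArray getEmptyFloatArray() {
        return getEmptyArray(DemoFloatArray.class, DemoFloatArray::new);
    }
}
```

Constructors stay untouched in this design, so "new DemoFloatArray()" still
yields a unique object for anyone who depends on identity.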

-Marshall


On 9/15/2017 3:47 AM, Richard Eckart de Castilho wrote:
> The CAS could have a convenience method to fetch a zero-length instance, like
> e.g. Collections.emptyList() which could return a shared instance. Users caring
> to optimize could use that without having to implement their own code for managing
> a shared instance. Users relying on object identity could just manually create an
> instance.
>
> WDYT?
>
> -- Richard


Re: opinions, please, for a UV3 proposed change

Posted by Richard Eckart de Castilho <re...@apache.org>.
The CAS could have a convenience method to fetch a zero-length instance, like
e.g. Collections.emptyList() which could return a shared instance. Users caring
to optimize could use that without having to implement their own code for managing
a shared instance. Users relying on object identity could just manually create an
instance.
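
For reference, this is the JDK pattern being pointed to, shown as a small
sketch.  The class and method names are made up for the example.  The Javadoc
for Collections.emptyList() describes the result as immutable and says
implementations need not create a separate List object for each call, so the
sketch checks immutability and leaves the identity of the shared instance
unasserted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class EmptyListSketch {
    // The list from Collections.emptyList() cannot be modified,
    // which is what makes handing out a shared instance safe.
    static boolean emptyListRejectsWrites() {
        List<String> shared = Collections.emptyList();
        try {
            shared.add("x");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    // Callers that rely on identity construct their own instances:
    // the two lists are equal in content but are distinct objects.
    static boolean freshListsAreDistinct() {
        List<String> a = new ArrayList<>();
        List<String> b = new ArrayList<>();
        return a != b && a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(emptyListRejectsWrites());
        System.out.println(freshListsAreDistinct());
    }
}
```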

WDYT?

-- Richard

> On 14.09.2017, at 16:36, Marshall Schor <ms...@schor.com> wrote:
> 
> I was mistaken about Java in one detail:  for things like Integer(17), there are
> two ways to create it:  new Integer(17), or Integer.valueOf(17).  The first call
> does create a fresh, not == to any other Integer object, while the 2nd call will
> reuse an existing Integer object for 17 (if it exists).  Users are encouraged to
> switch to Integer.valueOf(xxx) for efficiency in the Javadocs.
> 
> I'm now slightly leaning against doing this change for UIMA, because of the edge
> cases where the user could have depended on object un-equality for 0-length
> arrays and lists.
> 
> Users could "manually" achieve the same result using the shared instance values,
> and (for xmi serialization) marking any features that contain these values as
> "multi-references-allowed" so the deserialization would share them.  This could
> become a suggested "best practice" for those who use 0-length arrays and empty
> lists. 
> 
> Not doing this would make two Jiras a "won't fix":
> https://issues.apache.org/jira/browse/UIMA-5564
> https://issues.apache.org/jira/browse/UIMA-5566
> 
> What do others think?
> 
> -Marshall


Re: opinions, please, for a UV3 proposed change

Posted by Peter Klügl <pe...@averbis.com>.
+0

I do not have a strong opinion here. As far as I know, we do not use
(many) empty StringArrays (or others) but rather null values for the
features.

Personally, I would prefer sharing as I see no real need for 0-length
arrays used as markers.


Peter

On 14.09.2017 at 16:36, Marshall Schor wrote:
> I was mistaken about Java in one detail:  for things like Integer(17), there are
> two ways to create it:  new Integer(17), or Integer.valueOf(17).  The first call
> does create a fresh, not == to any other Integer object, while the 2nd call will
> reuse an existing Integer object for 17 (if it exists).  Users are encouraged to
> switch to Integer.valueOf(xxx) for efficiency in the Javadocs.
>
> I'm now slightly leaning against doing this change for UIMA, because of the edge
> cases where the user could have depended on object un-equality for 0-length
> arrays and lists.
>
> Users could "manually" achieve the same result using the shared instance values,
> and (for xmi serialization) marking any features that contain these values as
> "multi-references-allowed" so the deserialization would share them.  This could
> become a suggested "best practice" for those who use 0-length arrays and empty
> lists. 
>
> Not doing this would make two Jiras a "won't fix":
> https://issues.apache.org/jira/browse/UIMA-5564
> https://issues.apache.org/jira/browse/UIMA-5566
>
> What do others think?
>
> -Marshall


Re: opinions, please, for a UV3 proposed change

Posted by Marshall Schor <ms...@schor.com>.
I was mistaken about Java in one detail:  there are two ways to create something
like an Integer for 17:  new Integer(17), or Integer.valueOf(17).  The first
always creates a fresh object that is not == to any other Integer object, while
the second reuses an existing Integer object for 17 (if one exists).  The
Javadocs encourage users to switch to Integer.valueOf(xxx) for efficiency.
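
The difference shows up in a few lines of Java.  This is only an illustration;
the class and method names are made up for the example.  It relies on the
documented behavior that Integer.valueOf(int) caches values from -128 to 127.
new Integer(int) is deprecated in current JDKs, hence the suppressed warnings.

```java
final class IntegerIdentitySketch {
    // The constructor always allocates, so the two references differ
    // even though a.equals(b) is true.
    @SuppressWarnings({"deprecation", "removal"})
    static boolean constructorsAreDistinct() {
        Integer a = new Integer(17);
        Integer b = new Integer(17);
        return a != b && a.equals(b);
    }

    // valueOf returns the cached object for small values such as 17.
    static boolean valueOfIsShared() {
        Integer a = Integer.valueOf(17);
        Integer b = Integer.valueOf(17);
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(constructorsAreDistinct()); // prints "true"
        System.out.println(valueOfIsShared());         // prints "true"
    }
}
```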

I'm now slightly leaning against doing this change for UIMA, because of the edge
cases where the user could have depended on 0-length arrays and lists being
distinct objects (not ==).

Users could "manually" achieve the same result using the shared instance values,
and (for xmi serialization) marking any features that contain these values as
"multi-references-allowed" so the deserialization would share them.  This could
become a suggested "best practice" for those who use 0-length arrays and empty
lists. 

Not doing this would make two Jiras a "won't fix":
https://issues.apache.org/jira/browse/UIMA-5564
https://issues.apache.org/jira/browse/UIMA-5566

What do others think?

-Marshall

On 9/13/2017 8:22 AM, Marshall Schor wrote:
> I posted a Jira for a proposed change in how 0-length UIMA arrays and lists are
> managed.  These are immutable objects, and (theoretically) one instance (per
> CAS) could be shared.
>
> In the current implementation, this is managed explicitly by the user - they can
> use a bunch of new APIs to get shared instances.
>
> I'm thinking a better way is to make this automatically the case, and remove the
> new bunch of APIs (a smaller API set is always a good thing, for essentially the
> same functionality, IMHO).  The implementation would change so that the calls
> that create "new" 0-length arrays/lists would instead of creating a new one,
> only do that if none already existed, and if one already did, it would return
> that one.
>
> This follows Java's general direction for immutable objects, like Strings and
> Integer values, which can be shared.
>
> For cases where people wanted/needed a CAS value "marker" that was tiny, but
> unique (like you get with Java's new Object()), we would keep "new TOP(aCas)" as
> something that generated unique instances.  What do others think?
>
> I've seen large-scale implementations of UIMA pipelines with lots of defaulted
> 0-length arrays in them; this has the potential to improve space/time
> performance a reasonable amount for these.
>
> -Marshall
>
>