Posted to user@uima.apache.org by Hugo Mougard <mo...@crydee.eu> on 2014/03/31 06:50:36 UTC

Re: Preferably using UIMAfit, how can I dynamically generate types for a CollectionReader

Hello,

I won't address the type system description part, but for the 
collection reader you could use reflection to ease the maintenance 
overhead (for example with the Guava library). The idea is to detect 
automatically which types are present in a given package and use them 
accordingly. The following snippet builds a map of the classes you can 
use, selected by package and by the fact that they extend Annotation: 
https://gist.github.com/m09/9885425

You could then use it like so, in the getItemAnnotationForType method:

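String annName = annType.replace("-", "").toLowerCase(Locale.ENGLISH);
if (annotations.containsKey(annName)) {
     // Instantiate the matching annotation class through its JCas constructor.
     return annotations.get(annName).getDeclaredConstructor(JCas.class).newInstance(jcas);
} else {
     // No class is registered for this type: fall back to the generic annotation.
     return new UnknownItemAnnotation(jcas);
}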
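
The lookup-with-fallback pattern above can be tried without UIMA on the classpath. The following is only a minimal sketch: FakeJCas and FakeAnnotation are hypothetical stand-ins for JCas and Annotation, FirstName is an invented example type, and the map is filled by hand rather than by scanning a package as the gist does.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for org.apache.uima.jcas.JCas, so the sketch has no dependencies.
class FakeJCas {}

// Hypothetical stand-in for org.apache.uima.jcas.tcas.Annotation.
abstract class FakeAnnotation {
    final FakeJCas jcas;
    FakeAnnotation(FakeJCas jcas) { this.jcas = jcas; }
}

// An invented "known" type and the generic fallback type.
class FirstName extends FakeAnnotation {
    FirstName(FakeJCas jcas) { super(jcas); }
}

class UnknownItemAnnotation extends FakeAnnotation {
    UnknownItemAnnotation(FakeJCas jcas) { super(jcas); }
}

class AnnotationLookup {
    // Lower-cased simple class name -> annotation class.
    private final Map<String, Class<? extends FakeAnnotation>> annotations = new HashMap<>();

    // Registered by hand here; the gist discovers the classes by scanning a package.
    void register(Class<? extends FakeAnnotation> cls) {
        annotations.put(cls.getSimpleName().toLowerCase(Locale.ENGLISH), cls);
    }

    FakeAnnotation getItemAnnotationForType(FakeJCas jcas, String annType) {
        // Same normalisation as in the snippet above: drop hyphens, lower-case.
        String annName = annType.replace("-", "").toLowerCase(Locale.ENGLISH);
        Class<? extends FakeAnnotation> cls = annotations.get(annName);
        if (cls == null) {
            return new UnknownItemAnnotation(jcas);
        }
        try {
            return cls.getDeclaredConstructor(FakeJCas.class).newInstance(jcas);
        } catch (ReflectiveOperationException e) {
            // Covers a missing constructor, access problems and constructor failures.
            throw new IllegalStateException("Cannot instantiate " + cls.getName(), e);
        }
    }
}
```

With FirstName registered, a lookup for "first-name" yields a FirstName instance, and any unregistered type string yields an UnknownItemAnnotation. Note that the reflective calls throw checked exceptions, which the surrounding method has to catch or declare.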

Best,
Hugo

On 03/31/2014 11:56 AM, Andrew MacKinlay wrote:
> Hi,
>
> I have a working CollectionReader implementation which converts from 
> a particular web service to UIMA annotations, based primarily on 
> uimaFIT. It works OK, but the problem is that the web service has its 
> own implicit dynamic type system, particularly for document 
> annotations, and that is currently not being well-handled (I can put a 
> 'type' string as a textual feature, but UIMA is not set up to query 
> over these kinds of annotations, as far as I can tell, so it seems 
> suboptimal).
>
> I have now written code which can generate a TypeSystemDescription at 
> runtime for the dynamic types based on the web service output. However, 
> I'm not sure how to most sensibly integrate that with my uimaFIT 
> architecture. Does anyone have any ideas? I guess I could stop using 
> uimaFIT altogether - maybe it's not the right solution here (although 
> I'm also not entirely sure of the best way to handle this in classic 
> UIMA).
>
> I'd like to keep using uimaFIT if possible though - many other types, 
> particularly those relating to overall document metadata, are already 
> working very nicely and succinctly with uimaFIT.
>
>
> BTW, the current CollectionReader implementation, which hard-codes 
> handling of some types, and uses the textual string fallback in other 
> cases, can be found at 
> https://bitbucket.org/andymackinlay/uimavlab/src/c178fa9ebf5d5ffcad0249dd165ca44cde8dcefd/src/main/java/com/nicta/uimavlab/ItemListCollectionReader.java?at=default
>
>
> Thanks,
> Andy
>



Re: Preferably using UIMAfit, how can I dynamically generate types for a CollectionReader

Posted by Andrew MacKinlay <am...@akmy.net>.
Fantastic, thanks very much for the tips. I had just stumbled across createReaderDescription (after my initial post, of course), and it's reassuring to know that it should just work (I don't think I'll need to manually merge, but it's useful to know how if needed). It's also useful to know about the practical differences between CAS and JCas, which I'd never really worked out before.

Andy



Re: Preferably using UIMAfit, how can I dynamically generate types for a CollectionReader

Posted by Richard Eckart de Castilho <re...@apache.org>.
You can pass your TypeSystemDescription (TSD) in to the reader:

createReaderDescription(YourReader.class, tsd, PARAM_1, value_1, PARAM_2, value_2, ...)

It is sufficient to add your types to the reader. They will automatically apply to other
components if you run them in the same pipeline as the reader. In fact, the CAS will be
initialized from the merged TSDs in all components within a pipeline.

If you have other non-dynamic types, you can merge them with your dynamically created TSD
using something like this:

tsd = CasCreationUtils.mergeTypeSystems(
  asList(tsd, TypeSystemDescriptionFactory.createTypeSystemDescription()));

If you work with dynamically created types, you can largely forget about using JCas and just
go with the CAS interface. If one starts thinking about using reflection on UIMA types, the
time has come to switch from JCas to CAS. Of course you can mix both approaches and still
use JCas for the non-dynamic types in your annotator/reader.

Cheers,

-- Richard



Re: Preferably using UIMAfit, how can I dynamically generate types for a CollectionReader

Posted by Andrew MacKinlay <am...@akmy.net>.
Ah, thanks - that's probably nicer than my current implementation, where every type has to be handled in two places, but I don't think it will quite work for me, for a couple of reasons that I didn't articulate in my initial post. Firstly, to complicate things a little, the annotation type string, which the current implementation expects to be a single word, is actually now a URI. My type system description creation code converts this to a fully-qualified dotted Java/UIMA type name.

In principle, I guess I could do something similar for a fully-qualified type name, but in practice guaranteeing uniqueness for a type name converted from a URI is pretty much impossible if you want human-readability ("http://foo-bar.example.org/qw#first-name" and "http://foo-bar.example.org/qw/first-name" currently map to the same thing, so I add an arbitrary suffix if there are collisions). That means the conversion is lossy, even if in practice such collisions would almost certainly not occur.
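
To make the collision problem concrete, here is a rough sketch of one possible URI-to-type-name conversion. It is not the conversion code referred to above: the UriTypeNamer class, its naming scheme (reversed host labels, then path segments, then fragment) and the numeric suffix are all assumptions chosen for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: maps type URIs to dotted, human-readable type names.
class UriTypeNamer {
    private final Map<String, String> nameByUri = new HashMap<>();
    private final Set<String> usedNames = new HashSet<>();

    // Returns a stable name for the URI; a name already taken by another URI gets a numeric suffix.
    String typeNameFor(String uriString) {
        String known = nameByUri.get(uriString);
        if (known != null) {
            return known;
        }
        String base = baseName(URI.create(uriString));
        String candidate = base;
        // Set.add returns false when the name is already in use.
        for (int i = 2; !usedNames.add(candidate); i++) {
            candidate = base + i;
        }
        nameByUri.put(uriString, candidate);
        return candidate;
    }

    // Assumed scheme: reversed host labels, then path segments, then the fragment.
    static String baseName(URI uri) {
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("URI has no host: " + uri);
        }
        List<String> parts = new ArrayList<>();
        String[] host = uri.getHost().split("\\.");
        for (int i = host.length - 1; i >= 0; i--) {
            parts.add(identifier(host[i]));
        }
        String path = uri.getPath() == null ? "" : uri.getPath();
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                parts.add(identifier(segment));
            }
        }
        if (uri.getFragment() != null) {
            parts.add(identifier(uri.getFragment()));
        }
        return String.join(".", parts);
    }

    // Drops characters that cannot appear in an identifier; this is where information is lost.
    static String identifier(String raw) {
        String cleaned = raw.replaceAll("[^A-Za-z0-9_]", "");
        if (cleaned.isEmpty() || !Character.isJavaIdentifierStart(cleaned.charAt(0))) {
            cleaned = "_" + cleaned;
        }
        return cleaned;
    }
}
```

Under this scheme both example URIs reduce to org.example.foobar.qw.firstname, so whichever is seen second becomes org.example.foobar.qw.firstname2. The generated name therefore depends on the order in which the URIs are encountered and cannot be mapped back to the original URI without keeping the lookup table. The sketch also does not guard against segments that are Java reserved words.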

Secondly, my current hard-coded solution for managing the types implies that the set of types is stable enough that it would be feasible to implement most of them manually, with the unknown item fallback. However, this was in fact a quick-and-dirty solution for a demo, and I'm no longer convinced that manual static implementations of *any* leaf annotation types are the Right Thing To Do. These types are stored dynamically within the web service and are really properties of the particular data set being exposed, rather than part of the defined API of the web service.


Thanks again,
Andy

