
Posted to user@uima.apache.org by Thilo Goetz <tw...@gmx.de> on 2009/05/12 18:02:12 UTC

Discussion of next UIMA release

Hi UIMA users,

just a quick note to let you know that I've kicked off
discussions about the next release on uima-dev.  If
there's anything missing in UIMA that you'd *really* like
to see in the next release, now would be a good time
to let everybody know.  Maybe you have a patch up your
sleeve?

Thanks,
Thilo

Re: Discussion of next UIMA release

Posted by Peter Klügl <pk...@ki.informatik.uni-wuerzburg.de>.

Hi Thilo,

this may be the right time to mention a subiterator() method that is
independent of the type priorities.
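As a sketch of what is being asked for (not UIMA API): which annotations AnnotationIndex.subiterator() returns can depend on the type priorities, for example when another annotation starts at the same offset as the covering one. The plain-Java version below assumes a hypothetical Span record in place of real CAS annotations. It selects purely by offsets and orders by begin ascending, then end descending, so type names never influence the result.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a CAS annotation: a type name plus character offsets.
record Span(String type, int begin, int end) {}

class CoveredSpans {
    // Returns every span (other than the cover object itself) lying within
    // [cover.begin, cover.end], sorted by begin ascending, then end descending.
    // The type name plays no role in selection or ordering.
    static List<Span> covered(List<Span> all, Span cover) {
        List<Span> result = new ArrayList<>();
        for (Span s : all) {
            if (s != cover && s.begin() >= cover.begin() && s.end() <= cover.end()) {
                result.add(s);
            }
        }
        result.sort((x, y) -> x.begin() != y.begin()
                ? Integer.compare(x.begin(), y.begin())
                : Integer.compare(y.end(), x.end()));
        return result;
    }
}
```

The linear scan is the simplest correct version; a real implementation would use the sorted annotation index to jump straight to the cover's begin offset.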

Thanks

Peter

Thilo Goetz wrote:
> Hi UIMA users,
>
> just a quick note to let you know that I've kicked off
> discussions about the next release on uima-dev.  If
> there's anything missing in UIMA that you'd *really* like
> to see in the next release, now would be a good time
> to let everybody know.  Maybe you have a patch up your
> sleeve?
>
> Thanks,
> Thilo
>

Re: Discussion of next UIMA release

Posted by Tommaso Teofili <to...@gmail.com>.

2009/5/19 Manuel Fiorelli <ma...@gmail.com>

> I would like to see a well-established way to analyze semi-structured
> documents, such as (X)HTML pages. UIMA shouldn't provide its own
> parser, but at least a type system (like uima.cas) to represent a DOM
> Document within a CAS instance (the simplest solution is to represent
> element nodes as feature structures and text nodes as annotations on
> the plain text, but I suspect there are more convenient solutions).
>

I do agree with this.
Tommaso

Re: document structure (was: Discussion of next UIMA release)

Posted by Greg Holmberg <ho...@comcast.net>.

On Tue, 19 May 2009 11:04:49 -0700, Manuel Fiorelli  
<ma...@gmail.com> wrote:
> I'm happy to see I am not the only one who finds this feature
> useful. I saw that in your model every node is an annotation, which
> makes it easy to implement the property "textContent", which returns
> the text contained in an Element.
>
> Support for PDF (and other document formats) would also be an
> important addition...
>
> Manuel Fiorelli

For PDF filtering, check out this open-source project:  
http://aperture.sourceforge.net

This handles PDF, HTML, XML, RTF, Office, OpenOffice, Corel, email, ical.  
It also provides crawlers.  It's built on other open-source libraries,  
such as POI and PDFBox, but adds the ability to produce XML with RDF  
elements.  The RDF could be represented in the document model I proposed.

Greg

Re: document structure (was: Discussion of next UIMA release)

Posted by Manuel Fiorelli <ma...@gmail.com>.

2009/5/19 Greg Holmberg <ho...@comcast.net>:
> I sketched a possible solution to this on the wiki
> (http://cwiki.apache.org/UIMA/uima-sandbox-components.html, see "Document
> model") back in 2007, but it didn't generate much interest.  There's also a
> proposal for document properties, beyond the simple
> SourceDocumentInformation class.

I'm happy to see I am not the only one who finds this feature
useful. I saw that in your model every node is an annotation, which
makes it easy to implement the property "textContent", which returns
the text contained in an Element.

Support for PDF (and other document formats) would also be an
important addition...

Manuel Fiorelli

Re: document structure (was: Discussion of next UIMA release)

Posted by Kameron Cole <ka...@us.ibm.com>.

How would DITA play into this?  It seems to me that whether the community
adopts it or not, DITA is a de facto standard for document structure.
Further, I clearly see millions of applications for text analytics and
DITA.


** ** ** **
Kameron Arthur Cole
Senior IT Specialist, Managing Consultant
IBM Information Management Lab Services.
kameroncole@us.ibm.com




                                                                           
From: Eddie Epstein <eaepstein@gmail.com>
Date: 05/19/2009 06:04 PM
To: uima-user@incubator.apache.org
cc:
Subject: Re: document structure (was: Discussion of next UIMA release)
Please respond to: uima-user@incubator.apache.org

Hi Greg,

Since your original proposal back in 2007 there has been a growing
effort to add annotators to the project. Do you have any components
that use the proposed document model type system, say a collection
reader, that you would be willing to submit?

Regards,
Eddie

On Tue, May 19, 2009 at 1:02 PM, Greg Holmberg <ho...@comcast.net>
wrote:
> I feel that the lack of any standard in UIMA regarding the structure of the
> document being analyzed (that is, beyond simply plain text) makes it pretty
> much impossible to combine annotators from different sources--one of the
> primary justifications of UIMA, in my opinion.
>
> I sketched a possible solution to this on the wiki
> (http://cwiki.apache.org/UIMA/uima-sandbox-components.html, see "Document
> model") back in 2007, but it didn't generate much interest.  There's also a
> proposal for document properties, beyond the simple
> SourceDocumentInformation class.
>

Re: TikaAnnotator (was: document structure)

Posted by Tong Fin <to...@gmail.com>.

Since some users are already using this project, it may be a good candidate for
graduation from the sandbox.

Opinions?

-- Tong

On Fri, May 22, 2009 at 3:58 AM, Jörn Kottmann <ko...@gmail.com> wrote:

> Julien Nioche wrote:
>
>> Hi,
>>
>> I contributed an annotator to the sandbox some time ago which uses Tika to
>> convert original markup into UIMA annotations. It does not seem to be listed
>> on the website but it should be in the SVN repository of the sandbox.
>>
>> Tika supports numerous formats such as PDF, XML, HTML
>>
> I checked in the code 4 months ago. Please have a look at it to make
> sure everything is as intended.
>
> Here is the svn link:
> http://svn.apache.org/viewvc/incubator/uima/sandbox/trunk/TikaAnnotator/
>
> Jörn
>

Re: TikaAnnotator (was: document structure)

Posted by Jörn Kottmann <ko...@gmail.com>.

Julien Nioche wrote:
> Hi,
>
> I contributed an annotator to the sandbox some time ago which uses Tika to
> convert original markup into UIMA annotations. It does not seem to be listed
> on the website but it should be in the SVN repository of the sandbox.
>
> Tika supports numerous formats such as PDF, XML, HTML
I checked in the code 4 months ago. Please have a look at it to make
sure everything is as intended.

Here is the svn link:
http://svn.apache.org/viewvc/incubator/uima/sandbox/trunk/TikaAnnotator/

Jörn

Re: document structure

Posted by Marshall Schor <ms...@schor.com>.

I updated the UIMA website's sandbox page with this information.

-Marshall

Julien Nioche wrote:
> Hi Marshall,
>
> There is a description in the README.txt file from the TikaAnnotator
> repository, which I have slightly rewritten into the text below.
>
>
> Apache Tika is a toolkit for detecting and extracting metadata and
> structured text content from various documents using existing parser
> libraries. The TikaAnnotator uses Tika to generate annotations representing
> the original markup of a document and to extract its text and metadata. It
> consists of three resources:
>
> - FileSystemCollectionReader: similar to the one in the UIMA examples, but uses
> Tika to extract the text from binary documents and generates annotations to
> represent the markup
>
> - MarkupAnnotator: takes the original content from a view and generates a
> new view containing the extracted text with markup annotations
>
> - TikaWrapper: utility class that populates a CAS from a binary
> document; used by the FileSystemCollectionReader
>
>
> Best,
>
> J.
>
>

Re: document structure

Posted by Julien Nioche <li...@gmail.com>.

Hi Marshall,

There is a description in the README.txt file from the TikaAnnotator
repository, which I have slightly rewritten into the text below.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries. The TikaAnnotator uses Tika to generate annotations representing
the original markup of a document and to extract its text and metadata. It
consists of three resources:

- FileSystemCollectionReader: similar to the one in the UIMA examples, but uses
Tika to extract the text from binary documents and generates annotations to
represent the markup

- MarkupAnnotator: takes the original content from a view and generates a
new view containing the extracted text with markup annotations

- TikaWrapper: utility class that populates a CAS from a binary
document; used by the FileSystemCollectionReader

Best,

J.

-- 
DigitalPebble Ltd
http://www.digitalpebble.com

2009/5/22 Marshall Schor <ms...@schor.com>

> Hi Julien,
>
> Can you write up a little something and submit a patch to the website?
>
> -Marshall
>
> Julien Nioche wrote:
> > Hi,
> >
> > I contributed an annotator to the sandbox some time ago which uses Tika to
> > convert original markup into UIMA annotations. It does not seem to be listed
> > on the website but it should be in the SVN repository of the sandbox.
> >
> > Tika supports numerous formats such as PDF, XML, HTML etc...
> >
> > Julien
> >
> >
>

Re: document structure

Posted by Marshall Schor <ms...@schor.com>.

Hi Julien,

Can you write up a little something and submit a patch to the website?

-Marshall

Julien Nioche wrote:
> Hi,
>
> I contributed an annotator to the sandbox some time ago which uses Tika to
> convert original markup into UIMA annotations. It does not seem to be listed
> on the website but it should be in the SVN repository of the sandbox.
>
> Tika supports numerous formats such as PDF, XML, HTML etc...
>
> Julien
>
>

Re: document structure (was: Discussion of next UIMA release)

Posted by Julien Nioche <li...@gmail.com>.

Hi,

I contributed an annotator to the sandbox some time ago which uses Tika to
convert original markup into UIMA annotations. It does not seem to be listed
on the website but it should be in the SVN repository of the sandbox.

Tika supports numerous formats such as PDF, XML, HTML etc...

Julien

-- 
DigitalPebble Ltd
http://www.digitalpebble.com

2009/5/21 Greg Holmberg <ho...@comcast.net>

> On Tue, 19 May 2009 15:04:28 -0700, Eddie Epstein <ea...@gmail.com>
> wrote:
>
>> Since your original proposal back in 2007 there has been a growing
>> effort to add annotators to the project. Do you have any components
>> that use the proposed document model type system, say a collection
>> reader, that you would be willing to submit?
>>
>
> I did use this schema in a prototype.  I used the StAX parser to convert
> XML to this annotation structure over plain text.  Since the proposed schema
> loses no XML information, the XML can be reproduced from the CAS, if
> desired. Not byte-for-byte, since carriage returns may come out differently,
> but certainly functionally equivalent XML.
>
> HTML was first cleaned up with HTMLCleaner, converted to XML (XHTML), and
> then sent through the StAX parser and into the CAS.
>
> For other formats, I used a commercial filtering product to convert PDF,
> Office, etc. to HTML, and then through the above process.
>
> An open-source solution to filtering binary formats could use Aperture to
> produce XML+RDF, and then through the above process.
>
> The annotators I used didn't understand the CAS, only HTML, so I had to
> keep that in addition to the CAS to feed to those annotators.  The offsets
> these annotators returned were then relative to the HTML, so I kept a map of
> offset ranges between the HTML and the plain-text in the CAS.  This let me
> translate the offsets returned from the annotators against the HTML into
> offsets against the CAS, so when I created annotations they pointed to the
> right place.
>
> So I can't contribute the commercial filter code (we don't have source code
> anyway).  I may be able to contribute the XML and HTML converters, since
> that code was never shipped as a product. However, it will require approval
> from some EVP three levels above me.  I will look into it, but don't hold
> your breath.
>
>
>
> Greg Holmberg
>

Re: document structure (was: Discussion of next UIMA release)

Posted by Greg Holmberg <ho...@comcast.net>.

On Tue, 19 May 2009 15:04:28 -0700, Eddie Epstein <ea...@gmail.com>  
wrote:
> Since your original proposal back in 2007 there has been a growing
> effort to add annotators to the project. Do you have any components
> that use the proposed document model type system, say a collection
> reader, that you would be willing to submit?

I did use this schema in a prototype.  I used the StAX parser to convert
XML to this annotation structure over plain text.  Since the proposed
schema loses no XML information, the XML can be reproduced from the CAS,
if desired. Not byte-for-byte, since carriage returns may come out
differently, but certainly functionally equivalent XML.

HTML was first cleaned up with HTMLCleaner, converted to XML (XHTML), and
then sent through the StAX parser and into the CAS.
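A minimal sketch of the XML-to-annotations conversion described above, using the JDK's StAX API (javax.xml.stream). It appends all character data to a plain-text buffer and records one span per element, covering the text inside that element. The ElementSpan record and XmlToSpans class are hypothetical stand-ins for a document-model type system. Unlike the prototype, this sketch drops attributes, comments and processing instructions, so it does lose XML information.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical annotation-like record: an element name plus offsets into the plain text.
record ElementSpan(String name, int begin, int end) {}

class XmlToSpans {
    // An element that has been opened but not yet closed.
    private record Open(String name, int begin) {}

    final StringBuilder text = new StringBuilder();
    final List<ElementSpan> spans = new ArrayList<>();

    // Appends all character data to `text` and records one span per element,
    // covering exactly the text produced between its start and end tags.
    void parse(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            Deque<Open> open = new ArrayDeque<>();
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT ->
                            open.push(new Open(r.getLocalName(), text.length()));
                    case XMLStreamConstants.CHARACTERS, XMLStreamConstants.CDATA ->
                            text.append(r.getText());
                    case XMLStreamConstants.END_ELEMENT -> {
                        Open start = open.pop();
                        spans.add(new ElementSpan(start.name(), start.begin(), text.length()));
                    }
                    default -> { }  // comments, processing instructions, etc. are ignored
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }
}
```

For example, parsing <doc><p>First paragraph</p><p>Second</p></doc> yields the text "First paragraphSecond" plus the spans p [0,15), p [15,21) and doc [0,21). The plain text alone runs the paragraphs together; the p spans are what preserve the boundary.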

For other formats, I used a commercial filtering product to convert PDF,  
Office, etc. to HTML, and then through the above process.

An open-source solution to filtering binary formats could use Aperture to  
produce XML+RDF, and then through the above process.

The annotators I used didn't understand the CAS, only HTML, so I had to  
keep that in addition to the CAS to feed to those annotators.  The offsets  
these annotators returned were then relative to the HTML, so I kept a map  
of offset ranges between the HTML and the plain-text in the CAS.  This let  
me translate the offsets returned from the annotators against the HTML  
into offsets against the CAS, so when I created annotations they pointed  
to the right place.
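The offset translation described above can be sketched as a list of aligned runs. Each stretch of text copied from the HTML into the plain text records where it starts in both strings, and an HTML offset is mapped to a plain-text offset by binary search over those runs. OffsetMap and its methods are hypothetical names. Offsets that fall inside markup have no plain-text counterpart, and the sketch returns -1 for them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical offset map between markup (HTML) offsets and plain-text offsets.
class OffsetMap {
    // One aligned run: `length` characters starting at htmlStart in the HTML
    // were copied verbatim to textStart in the plain text.
    private record Segment(int htmlStart, int textStart, int length) {}

    private final List<Segment> segments = new ArrayList<>();
    private int textLength = 0;

    // Record that html[htmlStart, htmlStart + length) was appended to the plain text.
    // Runs must be added in increasing htmlStart order.
    void addRun(int htmlStart, int length) {
        segments.add(new Segment(htmlStart, textLength, length));
        textLength += length;
    }

    // Translate an HTML character offset to a plain-text offset, or -1 if it is inside markup.
    int htmlToText(int htmlOffset) {
        int lo = 0, hi = segments.size() - 1;
        while (lo <= hi) {  // binary search over runs sorted by htmlStart
            int mid = (lo + hi) >>> 1;
            Segment s = segments.get(mid);
            if (htmlOffset < s.htmlStart()) {
                hi = mid - 1;
            } else if (htmlOffset >= s.htmlStart() + s.length()) {
                lo = mid + 1;
            } else {
                return s.textStart() + (htmlOffset - s.htmlStart());
            }
        }
        return -1;
    }

    // Translate an exclusive end offset (as used by annotations) via its last character.
    int htmlEndToText(int htmlEnd) {
        int last = htmlToText(htmlEnd - 1);
        return last < 0 ? -1 : last + 1;
    }
}
```

For the HTML <p>Hello</p><p>World</p>, the runs would be addRun(3, 5) and addRun(15, 5). An annotation over "World" at HTML offsets [15, 20) then maps to [5, 10) in the plain text "HelloWorld".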

So I can't contribute the commercial filter code (we don't have source  
code anyway).  I may be able to contribute the XML and HTML converters,  
since that code was never shipped as a product. However, it will require  
approval from some EVP three levels above me.  I will look into it, but  
don't hold your breath.

Greg Holmberg

Re: document structure (was: Discussion of next UIMA release)

Posted by Eddie Epstein <ea...@gmail.com>.

Hi Greg,

Since your original proposal back in 2007 there has been a growing
effort to add annotators to the project. Do you have any components
that use the proposed document model type system, say a collection
reader, that you would be willing to submit?

Regards,
Eddie

On Tue, May 19, 2009 at 1:02 PM, Greg Holmberg <ho...@comcast.net> wrote:
> I feel that the lack of any standard in UIMA regarding the structure of the
> document being analyzed (that is, beyond simply plain text) makes it pretty
> much impossible to combine annotators from different sources--one of the
> primary justifications of UIMA, in my opinion.
>
> I sketched a possible solution to this on the wiki
> (http://cwiki.apache.org/UIMA/uima-sandbox-components.html, see "Document
> model") back in 2007, but it didn't generate much interest.  There's also a
> proposal for document properties, beyond the simple
> SourceDocumentInformation class.
>
>

Re: document structure (was: Discussion of next UIMA release)

Posted by Greg Holmberg <ho...@comcast.net>.

Indeed, the structure is important to linguistic analysis.  For example,  
imagine you have a table with three cells, containing the text "1996",  
"Honda", and "Camry".  If the cells are properly treated as sentence or  
paragraph boundaries, then entity extraction would produce a year, a  
company, and a vehicle.  If the structure is stripped and just the plain
text is analyzed, then you get one entity, a vehicle, "1996 Honda Camry",
which is not exactly the same thing.

I feel that the lack of any standard in UIMA regarding the structure of  
the document being analyzed (that is, beyond simply plain text) makes it  
pretty much impossible to combine annotators from different sources--one  
of the primary justifications of UIMA, in my opinion.

I sketched a possible solution to this on the wiki  
(http://cwiki.apache.org/UIMA/uima-sandbox-components.html, see "Document  
model") back in 2007, but it didn't generate much interest.  There's also  
a proposal for document properties, beyond the simple  
SourceDocumentInformation class.

Greg Holmberg

On Tue, 19 May 2009 09:34:14 -0700, Manuel Fiorelli  
<ma...@gmail.com> wrote:
> I would like to see a well-established way to analyze semi-structured
> documents, such as (X)HTML pages. UIMA shouldn't provide its own
> parser, but at least a type system (like uima.cas) to represent a DOM
> Document within a CAS instance (the simplest solution is to represent
> element nodes as feature structures and text nodes as annotations on
> the plain text, but I suspect there are more convenient solutions).
>
> When the analysis function doesn't rely upon the document structure,
> there should be a way to skip most of the markup and iterate on the
> blocks. I think that we cannot work directly on the plain text, since
> the loss of information could lead to misinterpretations. For example,
> in the following fragment
>
> <p>First paragraph</p><p>Second paragraph</p>
>
> the plain text would be
>
> First paragraphSecond paragraph
>
> where "paragraphSecond" is an error in the interpretation of the document.
>
> Manuel Fiorelli

Re: Discussion of next UIMA release

Posted by Manuel Fiorelli <ma...@gmail.com>.

2009/5/12 Thilo Goetz <tw...@gmx.de>:
>  If there's anything missing in UIMA that you'd *really* like
> to see in the next release, now would be a good time
> to let everybody know.  Maybe you have a patch up your
> sleeve?

I would like to see a well-established way to analyze semi-structured
documents, such as (X)HTML pages. UIMA shouldn't provide its own
parser, but at least a type system (like uima.cas) to represent a DOM
Document within a CAS instance (the simplest solution is to represent
element nodes as feature structures and text nodes as annotations on
the plain text, but I suspect there are more convenient solutions).

When the analysis function doesn't rely upon the document structure,
there should be a way to skip most of the markup and iterate on the
blocks. I think that we cannot work directly on the plain text, since
the loss of information could lead to misinterpretations. For example,
in the following fragment

<p>First paragraph</p><p>Second paragraph</p>

the plain text would be

First paragraphSecond paragraph

where "paragraphSecond" is an error in the interpretation of the document.
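One way to keep adjacent paragraphs from running together is to emit a separator whenever a block-level element closes. A minimal sketch with the JDK's StAX API, assuming well-formed XHTML (for example after a cleanup step such as the HTMLCleaner pass mentioned elsewhere in this thread) and a hypothetical, deliberately short list of block elements:

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class BlockAwareText {
    // Hypothetical and deliberately incomplete set of (X)HTML block-level elements.
    static final Set<String> BLOCK_ELEMENTS =
            Set.of("p", "div", "li", "td", "th", "tr", "h1", "h2", "h3", "br");

    // Extracts character data from well-formed XHTML, appending a newline whenever a
    // block element closes, so text from adjacent blocks is never glued together.
    static String extract(String xhtml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xhtml));
            StringBuilder out = new StringBuilder();
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA) {
                    out.append(r.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && BLOCK_ELEMENTS.contains(r.getLocalName().toLowerCase(Locale.ROOT))) {
                    out.append('\n');
                }
            }
            r.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }
}
```

With this, <body><p>First paragraph</p><p>Second paragraph</p></body> becomes "First paragraph", a newline, "Second paragraph" and a final newline. Listing td as a block element has the same effect for table cells.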

Manuel Fiorelli

Re: Discussion of next UIMA release

Posted by Eric Riebling <er...@cs.cmu.edu>.

Speaking of documentation, this just reminded me of the time I came
across the term 'delegate' in the documentation and counted its use
dozens of times, yet nowhere was it ever defined what a
delegate actually is.  Confusing to newbies and pros alike! :)

Tommaso Teofili wrote:
> In the next version I would like the Overview Documentation's
> *Entities* (unique objects in the CAS with, possibly, many referencing
> annotations pointing at them) to be available.
> I asked about that in a previous discussion, so I went ahead and implemented
> the Entities code myself. If you think it can be useful, I can help with that.
> Regards,
> Tommaso
> 
> 2009/5/14 Ahmed Ragheb <RA...@eg.ibm.com>
> 
>> Hi Thilo,
>>
>> It would be good if we could define a list of possible
>> values for String-valued configuration parameters. This would make it easier
>> for users to determine the possible values. Then, the component descriptor
>> editor can provide the list of values in a combo box and a user can select
>> the value of choice.
>>
>> Regards,
>> Ahmed
>>
>> Thilo Goetz <tw...@gmx.de> wrote on 12/05/2009 07:02:12 PM:
>>
>>> Thilo Goetz <tw...@gmx.de>
>>> 12/05/2009 07:02 PM
>>> Please respond to: uima-user@incubator.apache.org
>>> To: UIMA User <ui...@incubator.apache.org>
>>> cc:
>>> Subject: Discussion of next UIMA release
>>>
>>> Hi UIMA users,
>>>
>>> just a quick note to let you know that I've kicked off
>>> discussions about the next release on uima-dev.  If
>>> there's anything missing in UIMA that you'd *really* like
>>> to see in the next release, now would be a good time
>>> to let everybody know.  Maybe you have a patch up your
>>> sleeve?
>>>
>>> Thanks,
>>> Thilo
>>
> 

-- 
Eric Riebling  NSH 4623,  LTI,   SCS,  CMU
412.268.9872   http://www.cs.cmu.edu/~er1k

Re: Discussion of next UIMA release

Posted by Tommaso Teofili <to...@gmail.com>.

In the next version I would like the Overview Documentation's
*Entities* (unique objects in the CAS with, possibly, many referencing
annotations pointing at them) to be available.
I asked about that in a previous discussion, so I went ahead and implemented
the Entities code myself. If you think it can be useful, I can help with that.
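A plain-Java sketch of the Entities idea, assuming hypothetical Entity and Mention classes rather than any UIMA API. Each entity exists once and keeps a list of the mentions that refer to it, and each mention points back to its entity. In a CAS, Entity would presumably be a non-annotation feature structure type with an array feature of mentions, and Mention an annotation type with a feature referencing the entity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model: one unique entity referenced by many mention annotations.
class Entity {
    private final String canonicalName;
    private final List<Mention> mentions = new ArrayList<>();

    Entity(String canonicalName) {
        this.canonicalName = canonicalName;
    }

    String canonicalName() {
        return canonicalName;
    }

    // Read-only view of all mentions referring to this entity.
    List<Mention> mentions() {
        return Collections.unmodifiableList(mentions);
    }

    // Creates a mention over text offsets [begin, end) and links it in both directions.
    Mention addMention(int begin, int end) {
        Mention m = new Mention(begin, end, this);
        mentions.add(m);
        return m;
    }
}

// An annotation-like span that points at the entity it refers to.
record Mention(int begin, int end, Entity entity) {}
```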
Regards,
Tommaso

2009/5/14 Ahmed Ragheb <RA...@eg.ibm.com>

> Hi Thilo,
>
> I would think it is good if we can be able to define a list of possible
> values for String valued configuration parameters. This will make it easier
> for users to determine the possible values. Then, the component descriptor
> editor can provide the list of values in a combo box and a user can select
> the value of choice.
>
> Regards,
> Ahmed
>
> Thilo Goetz <tw...@gmx.de> wrote on 12/05/2009 07:02:12 PM:
>
> > Thilo Goetz <tw...@gmx.de>
> > 12/05/2009 07:02 PM
> > Please respond to: uima-user@incubator.apache.org
> > To: UIMA User <ui...@incubator.apache.org>
> > cc:
> > Subject: Discussion of next UIMA release
> >
> > Hi UIMA users,
> >
> > just a quick note to let you know that I've kicked off
> > discussions about the next release on uima-dev.  If
> > there's anything missing in UIMA that you'd *really* like
> > to see in the next release, now would be a good time
> > to let everybody know.  Maybe you have a patch up your
> > sleeve?
> >
> > Thanks,
> > Thilo
>
>

Re: Discussion of next UIMA release

Posted by Ahmed Ragheb <RA...@eg.ibm.com>.

Hi Thilo,

It would be good if we could define a list of possible
values for String-valued configuration parameters. This would make it easier
for users to determine the possible values. Then, the component descriptor
editor can provide the list of values in a combo box and a user can select
the value of choice.
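Until descriptors can declare such a list, an annotator can at least enforce one itself when it reads its parameters. A minimal sketch, assuming a hypothetical AllowedValues helper that would be called from the annotator's initialize() method. The allowed set has to be hard-coded in the annotator, which is exactly the duplication a descriptor-level list would remove.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: validates a String configuration parameter against a fixed set
// of allowed values, failing fast with a message that lists the valid choices.
class AllowedValues {
    static String check(String paramName, String value, Set<String> allowed) {
        // Check for null first: immutable sets from Set.of() reject contains(null).
        if (value == null || !allowed.contains(value)) {
            throw new IllegalArgumentException("Parameter '" + paramName + "' has value '"
                    + value + "'; allowed values are " + new TreeSet<>(allowed));
        }
        return value;
    }
}
```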

Regards,
Ahmed

Thilo Goetz <tw...@gmx.de> wrote on 12/05/2009 07:02:12 PM:

> Thilo Goetz <tw...@gmx.de>
> 12/05/2009 07:02 PM
> Please respond to: uima-user@incubator.apache.org
> To: UIMA User <ui...@incubator.apache.org>
> cc:
> Subject: Discussion of next UIMA release
>
> Hi UIMA users,
>
> just a quick note to let you know that I've kicked off
> discussions about the next release on uima-dev.  If
> there's anything missing in UIMA that you'd *really* like
> to see in the next release, now would be a good time
> to let everybody know.  Maybe you have patch up your
> sleeve?
>
> Thanks,
> Thilo

Re: Discussion of next UIMA release

Posted by Yoshinobu Kano <ka...@is.s.u-tokyo.ac.jp>.

Hi,

>> * "Map" type of configuration parameter
>>
>
> Also needed that in the past.
>
> A type parameter could be nice for those who
> specify parameter mappings in the descriptor. Maybe that's
> the wrong way to do it, since these mappings often apply to multiple
> AEs and not only to one. On the other hand, these are usually grouped in
> one AAE, so maybe that's the right place for it.

What I meant was a multi-valued parameter of key/value pairs,
which would be translated to, e.g., a LinkedHashMap, not a parameter spanning several AEs.
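A common workaround in the meantime, offered here as an assumption rather than an official UIMA feature, is a multi-valued String parameter whose entries are key=value pairs, converted by the annotator into a LinkedHashMap so the declared order is kept. PairParameter.toMap is a hypothetical helper:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PairParameter {
    // Converts entries like "en=English" into an insertion-ordered map.
    // Splits on the first '=' only, so values may themselves contain '='.
    static Map<String, String> toMap(String[] entries) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String entry : entries) {
            int eq = entry.indexOf('=');
            if (eq <= 0) {  // no '=' at all, or an empty key
                throw new IllegalArgumentException("Expected key=value, got: " + entry);
            }
            String key = entry.substring(0, eq).trim();
            if (map.put(key, entry.substring(eq + 1).trim()) != null) {
                throw new IllegalArgumentException("Duplicate key: " + key);
            }
        }
        return map;
    }
}
```

In an annotator, the input array would typically come from something like (String[]) getContext().getConfigParameterValue("Mappings"), where "Mappings" is a hypothetical parameter name.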

-Yoshinobu
-- 
Yoshinobu Kano (Given/Family)
kano@is.s.u-tokyo.ac.jp
Project Research Associate, the University of Tokyo / U-Compare Project Lead
http://www-tsujii.is.s.u-tokyo.ac.jp/ http://u-compare.org/kano/

Re: Discussion of next UIMA release

Posted by Jörn Kottmann <ko...@gmail.com>.

Yoshinobu Kano wrote:
> Hi Thilo,
>
> A couple of wishes...
>
> * generics in the UIMA API
>   
I am working on generics right now; they will be in the next release.
For further details have a look at UIMA-1341.
> * "Map" type of configuration parameter
>   
Also needed that in the past.

A type parameter could be nice for those who
specify parameter mappings in the descriptor. Maybe that's
the wrong way to do it, since these mappings often apply to multiple
AEs and not only to one. On the other hand, these are usually grouped in
one AAE, so maybe that's the right place for it.

Jörn

Re: Discussion of next UIMA release

Posted by Yoshinobu Kano <ka...@is.s.u-tokyo.ac.jp>.

Hi Thilo,

>> * generics in the UIMA API
>
> Joern is working on that.  If you have particular suggestions, please
> join the discussion on the developer's list.
Good news, thanks! I have been receiving only uima-user, so I was not
aware of these developments.

>> * collection framework (or similar sort of thing)
>
> Yes, I've heard that many times.  The issue here is that this
> is a bit difficult to do with the CAS.  The CAS was designed on
> purpose to have only very simple data structures.  This was so
> it would be maximally portable.  So I don't see how we could,
> for example, add sets to the CAS without breaking the design
> philosophy.  However, one might consider adding a whole host
> of utility functions to Java UIMA that allow users to treat
> an array as a set, a list, or whatever else you have in mind.
> Maps and trees could also be implemented like that.  I assume
> many people have done this themselves, and it would make a lot
> of sense to have this kind of functionality in the core.
>
> Let me know if this is what you had in mind.
Yes, I would love to see this sort of functionality.

As for data structures in general,
IMHO official UIMA should provide at least very common data
structures like maps, trees, DAGs, etc.
I am very happy to see the UIMA user community spreading,
but I am quite afraid that everyone has their own type system, resulting
in practically no compatibility.

>
>> * "Map" type of configuration parameter
>>
>> Probably not possible due to the architecture design...
>> * "memory leak" when holding CAS data outside the CAS. Isn't GC enough?
>
> I don't know what you mean by that.
I meant this behaviour:
http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.aae.common_pitfalls

-Yoshinobu

>
> --Thilo
>
>>
>> I would be very happy to see some of these in the next release.
>> Thank you very much for your efforts on the next release!
>>
>> -Yoshinobu
>>
>> On Wed, May 13, 2009 at 1:02 AM, Thilo Goetz <tw...@gmx.de> wrote:
>>> Hi UIMA users,
>>>
>>> just a quick note to let you know that I've kicked off
>>> discussions about the next release on uima-dev.  If
>>> there's anything missing in UIMA that you'd *really* like
>>> to see in the next release, now would be a good time
>>> to let everybody know.  Maybe you have a patch up your
>>> sleeve?
>>>
>>> Thanks,
>>> Thilo
>>>
>>
>>
>>
>
>



-- 
Yoshinobu Kano (Given/Family)
kano@is.s.u-tokyo.ac.jp
Project Research Associate, the University of Tokyo / U-Compare Project Lead
http://www-tsujii.is.s.u-tokyo.ac.jp/ http://u-compare.org/kano/

Re: Discussion of next UIMA release

Posted by Thilo Goetz <tw...@gmx.de>.

Hi Yoshinobu,

Yoshinobu Kano wrote:
> Hi Thilo,
> 
> A couple of wishes...
> 
> * multiple inheritance in the type system, please!
> (I know this would be quite a large issue but...
> This is also required to be compliant with the UIMA spec, should be
> compatible with ECore)

yes, that would be a bit of a change.  I don't think
we should attempt that in a point release.  I would
also like to hear from more people who have this
requirement...

> 
> * generics in the UIMA API

Joern is working on that.  If you have particular suggestions, please
join the discussion on the developer's list.

> * collection framework (or similar sort of thing)

Yes, I've heard that many times.  The issue here is that this
is a bit difficult to do with the CAS.  The CAS was designed on
purpose to have only very simple data structures.  This was so
it would be maximally portable.  So I don't see how we could,
for example, add sets to the CAS without breaking the design
philosophy.  However, one might consider adding a whole host
of utility functions to Java UIMA that allow users to treat
an array as a set, a list, or whatever else you have in mind.
Maps and trees could also be implemented like that.  I assume
many people have done this themselves, and it would make a lot
of sense to have this kind of functionality in the core.
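To illustrate the kind of utility meant here, a sketch of a read-only Map view over a flat array of alternating keys and values, a layout that fits in a CAS string array. FlatArrayMap is hypothetical and wraps a plain String[]; a real utility would wrap the CAS StringArray type instead. Lookups inherited from AbstractMap are linear scans, which is acceptable for small maps.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical read-only Map view over [k0, v0, k1, v1, ...]; no data is copied.
class FlatArrayMap extends AbstractMap<String, String> {
    private final String[] data;

    FlatArrayMap(String[] data) {
        if (data.length % 2 != 0) {
            throw new IllegalArgumentException("Array must hold key/value pairs");
        }
        this.data = data;
    }

    // AbstractMap derives get(), containsKey(), size(), etc. from this entry set;
    // put() is not overridden, so the view stays read-only.
    @Override
    public Set<Map.Entry<String, String>> entrySet() {
        return new AbstractSet<Map.Entry<String, String>>() {
            @Override
            public int size() {
                return data.length / 2;
            }

            @Override
            public Iterator<Map.Entry<String, String>> iterator() {
                return new Iterator<Map.Entry<String, String>>() {
                    private int i = 0;  // index of the next key in the flat array

                    @Override
                    public boolean hasNext() {
                        return i < data.length;
                    }

                    @Override
                    public Map.Entry<String, String> next() {
                        if (!hasNext()) throw new NoSuchElementException();
                        Map.Entry<String, String> e =
                                new AbstractMap.SimpleImmutableEntry<>(data[i], data[i + 1]);
                        i += 2;
                        return e;
                    }
                };
            }
        };
    }
}
```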

Let me know if this is what you had in mind.

> * "Map" type of configuration parameter
> 
> Probably not possible due to the architecture design...
> * "memory leak" when holding CAS data outside the CAS. Isn't GC enough?

I don't know what you mean by that.

--Thilo

> 
> I would be very happy to see some of these in the next release.
> Thank you very much for your efforts on the next release!
> 
> -Yoshinobu
> 
> On Wed, May 13, 2009 at 1:02 AM, Thilo Goetz <tw...@gmx.de> wrote:
>> Hi UIMA users,
>>
>> just a quick note to let you know that I've kicked off
>> discussions about the next release on uima-dev.  If
>> there's anything missing in UIMA that you'd *really* like
>> to see in the next release, now would be a good time
>> to let everybody know.  Maybe you have a patch up your
>> sleeve?
>>
>> Thanks,
>> Thilo
>>
> 
> 
>

Re: Discussion of next UIMA release

Posted by Yoshinobu Kano <ka...@is.s.u-tokyo.ac.jp>.

Hi Thilo,

A couple of wishes...

* multiple inheritance in the type system, please!
(I know this would be quite a large issue but...
This is also required to be compliant with the UIMA spec, should be
compatible with ECore)

* generics in the UIMA API
* collection framework (or similar sort of thing)
* "Map" type of configuration parameter

Probably not possible due to the architecture design...
* "memory leak" when holding CAS data outside the CAS. Isn't GC enough?

I would be very happy to see some of these in the next release.
Thank you very much for your efforts on the next release!

-Yoshinobu

On Wed, May 13, 2009 at 1:02 AM, Thilo Goetz <tw...@gmx.de> wrote:
> Hi UIMA users,
>
> just a quick note to let you know that I've kicked off
> discussions about the next release on uima-dev.  If
> there's anything missing in UIMA that you'd *really* like
> to see in the next release, now would be a good time
> to let everybody know.  Maybe you have a patch up your
> sleeve?
>
> Thanks,
> Thilo
>

-- 
Yoshinobu Kano (Given/Family)
kano@is.s.u-tokyo.ac.jp
Project Research Associate, the University of Tokyo / U-Compare Project Lead
http://www-tsujii.is.s.u-tokyo.ac.jp/ http://u-compare.org/kano/

Re: Discussion of next UIMA release

Posted by Detmar Meurers <dm...@sfs.uni-tuebingen.de>.

Hi Thilo,

    just a quick note to let you know that I've kicked off
    discussions about the next release on uima-dev.  If
    there's anything missing in UIMA that you'd *really* like
    to see in the next release, now would be a good time
    to let everybody know.  Maybe you have a patch up your
    sleeve?
    
I'd like to see unification of ca-structures as a built-in - combining
information is so fundamental a process that it would be great to see
it supported.
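Assuming "ca-structures" refers to feature structures in the unification-grammar sense, here is a deliberately simplified sketch of what such a built-in would compute. Atomic values unify only if they are equal, nested structures are unified feature by feature, and any clash makes the whole unification fail. The Unifier class is hypothetical and represents structures as nested Maps. It ignores reentrancy (structure sharing) and type hierarchies, both of which a CAS-level implementation would have to handle.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

class Unifier {
    // Unifies two values: atoms (non-Map objects) unify only if equal; maps are
    // unified feature by feature. Returns Optional.empty() on a clash.
    // Simplification: no reentrancy/structure sharing and no type hierarchy.
    @SuppressWarnings("unchecked")
    static Optional<Object> unify(Object a, Object b) {
        if (a instanceof Map && b instanceof Map) {
            Map<String, Object> left = (Map<String, Object>) a;
            Map<String, Object> right = (Map<String, Object>) b;
            Map<String, Object> result = new LinkedHashMap<>(left);
            for (Map.Entry<String, Object> e : right.entrySet()) {
                Object existing = result.get(e.getKey());
                if (existing == null) {
                    result.put(e.getKey(), e.getValue());  // feature present only on the right
                } else {
                    Optional<Object> merged = unify(existing, e.getValue());
                    if (merged.isEmpty()) {
                        return Optional.empty();  // clash somewhere below this feature
                    }
                    result.put(e.getKey(), merged.get());
                }
            }
            return Optional.of(result);
        }
        return a.equals(b) ? Optional.of(a) : Optional.empty();
    }
}
```

For example, unifying [cat: noun, agr: [num: sg]] with [agr: [per: 3]] gives [cat: noun, agr: [num: sg, per: 3]], while [num: sg] and [num: pl] fail to unify.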

Best,
Detmar

Re: Discussion of next UIMA release

Posted by Matthias Wendt <ma...@neofonie.de>.

Hello,

uimacpp currently has the restriction of allowing only one annotator per
shared object file. Is it possible to relax this restriction in the next
release? In the descriptor, <annotatorImplementationName/> is the name of
the implementing class for Java, while for C++ it is the name of the
shared object (or DLL) file. I wonder if there might be another way of
telling the framework which .so to load?

Regards
Matthias

Thilo Goetz wrote:
> Hi UIMA users,
>
> just a quick note to let you know that I've kicked off
> discussions about the next release on uima-dev.  If
> there's anything missing in UIMA that you'd *really* like
> to see in the next release, now would be a good time
> to let everybody know.  Maybe you have a patch up your
> sleeve?
>
> Thanks,
> Thilo
>

Re: Discussion of next UIMA release

Posted by Steven Bethard <st...@gmail.com>.

On Tue, May 12, 2009 at 9:02 AM, Thilo Goetz <tw...@gmx.de> wrote:
> just a quick note to let you know that I've kicked off
> discussions about the next release on uima-dev.  If
> there's anything missing in UIMA that you'd *really* like
> to see in the next release, now would be a good time
> to let everybody know.  Maybe you have a patch up your
> sleeve?

I know this is probably a long shot, but I'd like to see the
programmatic API behind the descriptors documented. The APIs are
accessible now through all the .impl packages, but there currently
isn't any documentation for it that I know of, presumably because the
assumption is that everyone will want to use XML instead of Java code.

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy