Posted to user@uima.apache.org by Thilo Goetz <tw...@gmx.de> on 2009/05/12 18:02:12 UTC
Discussion of next UIMA release
Hi UIMA users,
just a quick note to let you know that I've kicked off
discussions about the next release on uima-dev. If
there's anything missing in UIMA that you'd *really* like
to see in the next release, now would be a good time
to let everybody know. Maybe you have a patch up your
sleeve?
Thanks,
Thilo
Re: Discussion of next UIMA release
Posted by Peter Klügl <pk...@ki.informatik.uni-wuerzburg.de>.
Hi Thilo,
this may be the right time to mention a subIterator() method that works
independently of the type priorities.
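A minimal sketch of what such a lookup could look like, assuming the request is for a selection of covered annotations that is decided by offsets alone. The Annotation class and the covered_by function are hypothetical stand-ins, not part of the UIMA API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    # Hypothetical stand-in for a UIMA annotation: a type name plus a span.
    type_name: str
    begin: int
    end: int


def covered_by(annotations, container):
    """Return the annotations whose span lies within the container's span.

    Only offsets are compared, so annotations with exactly the same span as
    the container are always included, whatever their type.
    """
    return sorted(
        (a for a in annotations
         if a is not container
         and container.begin <= a.begin and a.end <= container.end),
        key=lambda a: (a.begin, -a.end),
    )


sentence = Annotation("Sentence", 0, 10)
token = Annotation("Token", 0, 10)   # same span as the sentence
other = Annotation("Token", 12, 15)  # outside the sentence
result = covered_by([sentence, token, other], sentence)
```

With this definition the token that shares the sentence's span is returned no matter which of the two types would come first in a priority list.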
Thanks
Peter
Thilo Goetz schrieb:
> Hi UIMA users,
>
> just a quick note to let you know that I've kicked off
> discussions about the next release on uima-dev. If
> there's anything missing in UIMA that you'd *really* like
> to see in the next release, now would be a good time
> to let everybody know. Maybe you have patch up your
> sleeve?
>
> Thanks,
> Thilo
>
Re: Discussion of next UIMA release
Posted by Tommaso Teofili <to...@gmail.com>.
2009/5/19 Manuel Fiorelli <ma...@gmail.com>
> I would like to see a well-established way to analyze semi-structured
> documents, such as (X)HTML pages. UIMA shouldn't provide its own
> parser, but at least a type system (like uima.cas) to represent a DOM
> Document within a CAS instance (the simplest solution is to represent
> element nodes as feature structures and text nodes as annotations on
> the plain text, but I suspect there are more convenient solutions).
>
I do agree with this.
Tommaso
Re: document structure (was: Discussion of next UIMA release)
Posted by Greg Holmberg <ho...@comcast.net>.
On Tue, 19 May 2009 11:04:49 -0700, Manuel Fiorelli
<ma...@gmail.com> wrote:
> I'm happy to see I am not the only who feels this feature to be
> useful. I saw that in your model, every node is an annotation, which
> is fine to easily implement the property "textContent", which returns
> the text contained in an Element.
>
> Also the support for pdf (and other document formats) would be an
> important addition...
>
> Manuel Fiorelli
For PDF filtering, check out this open-source project:
http://aperture.sourceforge.net
This handles PDF, HTML, XML, RTF, Office, OpenOffice, Corel, email, ical.
It also provides crawlers. It's built on other open-source libraries,
such as POI and PDFBox, but adds the ability to produce XML with RDF
elements. The RDF could be represented in the document model I proposed.
Greg
Re: document structure (was: Discussion of next UIMA release)
Posted by Manuel Fiorelli <ma...@gmail.com>.
2009/5/19 Greg Holmberg <ho...@comcast.net>:
> I sketched a possible solution to this on the wiki
> (http://cwiki.apache.org/UIMA/uima-sandbox-components.html, see "Document
> model") back in 2007, but it didn't generate much interest. There's also a
> proposal for document properties, beyond the simple
> SourceDocumentInformation class.
I'm happy to see I am not the only one who finds this feature
useful. I saw that in your model every node is an annotation, which
makes it easy to implement the property "textContent", which returns
the text contained in an Element.
Support for PDF (and other document formats) would also be an
important addition...
Manuel Fiorelli
Re: document structure (was: Discussion of next UIMA release)
Posted by Kameron Cole <ka...@us.ibm.com>.
How would DITA play into this? It seems to me that whether the community
adopts it or not, DITA is a de facto standard for document structure.
Further, I clearly see millions of applications for text analytics and
DITA.
Kameron Arthur Cole
Senior IT Specialist, Managing Consultant
IBM Information Management Lab Services.
kameroncole@us.ibm.com
home office: 305-831-4058 / mobile office: 305.905.4112 / fax: 845.491.4052
Eddie Epstein <eaepstein@gmail.com> wrote on 05/19/2009 06:04 PM:
To: uima-user@incubator.apache.org
Subject: Re: document structure (was: Discussion of next UIMA release)
Please respond to: uima-user@incubator.apache.org
Hi Greg,
Since your original proposal back in 2007 there has been a growing
effort to add annotators to the project. Do you have any components
that use the proposed document model type system, say a collection
reader, that you would be willing to submit?
Regards,
Eddie
On Tue, May 19, 2009 at 1:02 PM, Greg Holmberg <ho...@comcast.net>
wrote:
> I feel that the lack of any standard in UIMA regarding the structure of the
> document being analyzed (that is, beyond simply plain text) makes it pretty
> much impossible to combine annotators from different sources--one of the
> primary justifications of UIMA, in my opinion.
>
> I sketched a possible solution to this on the wiki
> (http://cwiki.apache.org/UIMA/uima-sandbox-components.html, see "Document
> model") back in 2007, but it didn't generate much interest. There's also a
> proposal for document properties, beyond the simple
> SourceDocumentInformation class.
>
>
Re: TikaAnnotator (was: document structure)
Posted by Tong Fin <to...@gmail.com>.
Since we have some users using this project, it may be a good candidate for
graduation from the sandbox.
Opinions ?
-- Tong
On Fri, May 22, 2009 at 3:58 AM, Jörn Kottmann <ko...@gmail.com> wrote:
> Julien Nioche wrote:
>
>> Hi,
>>
>> I contributed an annotator to the sandbox some time ago which uses Tika to
>> convert original markup into UIMA annotations. It does not seem to be listed
>> on the website but it should be in the SVN repository of the sandbox.
>>
>> Tika supports numerous formats such as PDF, XML, HTML
>>
> I checked in the code 4 months ago. Please have a look at it to make
> sure everything is as intended.
>
> Here is the svn link:
> http://svn.apache.org/viewvc/incubator/uima/sandbox/trunk/TikaAnnotator/
>
> Jörn
>
Re: TikaAnnotator (was: document structure)
Posted by Jörn Kottmann <ko...@gmail.com>.
Julien Nioche wrote:
> Hi,
>
> I contributed an annotator to the sandbox some time ago which uses Tika to
> convert original markup into UIMA annotations. It does not seem to be listed
> on the website but it should be in the SVN repository of the sandbox.
>
> Tika supports numerous formats such as PDF, XML, HTML
I checked in the code 4 months ago. Please have a look at it to make
sure everything is as intended.
Here is the svn link:
http://svn.apache.org/viewvc/incubator/uima/sandbox/trunk/TikaAnnotator/
Jörn
Re: document structure
Posted by Marshall Schor <ms...@schor.com>.
I updated the UIMA website's sandbox page with this information.
-Marshall
Julien Nioche wrote:
> Hi Marshall,
>
> There is a description in the README.txt file from the TikaAnnotator
> repository, which I have slightly rewritten into the text below.
>
>
> *Apache Tika is a toolkit for detecting and extracting metadata and
> structured text content from various documents using existing parser
> libraries. The TikaAnnotator uses Tika to generate annotations representing
> the original markup of a document, extract its text and metadata. It
> consists of three resources :
>
> - FileSystemCollectionReader : similar to the one in UIMA examples but uses
> TIKA to extract the text from binary documents and generates annotations to
> represent the markup
>
> - MarkupAnnotator : takes the original content from a view and generates a
> new view containing the extracted text with markup annotations
>
> - TikaWrapper : utility class which allows to populate a CAS from a binary
> document; used by the FileSystemCollectionReader *
>
>
> Best,
>
> J.
>
>
Re: document structure
Posted by Julien Nioche <li...@gmail.com>.
Hi Marshall,
There is a description in the README.txt file from the TikaAnnotator
repository, which I have slightly rewritten into the text below.
*Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries. The TikaAnnotator uses Tika to generate annotations representing
the original markup of a document, extract its text and metadata. It
consists of three resources :
- FileSystemCollectionReader : similar to the one in UIMA examples but uses
TIKA to extract the text from binary documents and generates annotations to
represent the markup
- MarkupAnnotator : takes the original content from a view and generates a
new view containing the extracted text with markup annotations
- TikaWrapper : utility class which allows to populate a CAS from a binary
document; used by the FileSystemCollectionReader *
Best,
J.
--
DigitalPebble Ltd
http://www.digitalpebble.com
2009/5/22 Marshall Schor <ms...@schor.com>
> Hi Julien,
>
> Can you write up a little something and submit a patch to the website?
>
> -Marshall
>
> Julien Nioche wrote:
> > Hi,
> >
> > I contributed an annotator to the sandbox some time ago which uses Tika to
> > convert original markup into UIMA annotations. It does not seem to be listed
> > on the website but it should be in the SVN repository of the sandbox.
> >
> > Tika supports numerous formats such as PDF, XML, HTML etc...
> >
> > Julien
> >
> >
>
Re: document structure
Posted by Marshall Schor <ms...@schor.com>.
Hi Julien,
Can you write up a little something and submit a patch to the website?
-Marshall
Julien Nioche wrote:
> Hi,
>
> I contributed an annotator to the sandbox some time ago which uses Tika to
> convert original markup into UIMA annotations. It does not seem to be listed
> on the website but it should be in the SVN repository of the sandbox.
>
> Tika supports numerous formats such as PDF, XML, HTML etc...
>
> Julien
>
>
Re: document structure (was: Discussion of next UIMA release)
Posted by Julien Nioche <li...@gmail.com>.
Hi,
I contributed an annotator to the sandbox some time ago which uses Tika to
convert original markup into UIMA annotations. It does not seem to be listed
on the website but it should be in the SVN repository of the sandbox.
Tika supports numerous formats such as PDF, XML, HTML etc...
Julien
--
DigitalPebble Ltd
http://www.digitalpebble.com
2009/5/21 Greg Holmberg <ho...@comcast.net>
> On Tue, 19 May 2009 15:04:28 -0700, Eddie Epstein <ea...@gmail.com>
> wrote:
>
>> Since your original proposal back in 2007 there has been a growing
>> effort to add annotators to the project. Do you have any components
>> that use the proposed document model type system, say a collection
>> reader, that you would be willing to submit?
>>
>
> I did use this schema in a prototype. I used the Stax parser to convert
> XML to this annotation structure over plain text. Since the proposed schema
> loses no XML information, the XML can be reproduced from the CAS, if
> desired. Not byte-for-byte, since carriage returns may come out differently,
> but certainly functionally equivalent XML.
>
> HTML was first cleaned up with HTMLCleaner, converted to XML (XHTML), and
> then sent through the Stax parser and into the CAS.
>
> For other formats, I used a commercial filtering product to convert PDF,
> Office, etc. to HTML, and then through the above process.
>
> An open-source solution to filtering binary formats could use Aperture to
> produce XML+RDF, and then through the above process.
>
> The annotators I used didn't understand the CAS, only HTML, so I had to
> keep that in addition to the CAS to feed to those annotators. The offsets
> these annotators returned were then relative to the HTML, so I kept a map of
> offset ranges between the HTML and the plain-text in the CAS. This let me
> translate the offsets returned from the annotators against the HTML into
> offsets against the CAS, so when I created annotations they pointed to the
> right place.
>
> So I can't contribute the commercial filter code (we don't have source code
> anyway). I may be able to contribute the XML and HTML converters, since
> that code was never shipped as a product. However, it will require approval
> from some EVP three levels above me. I will look into it, but don't hold
> your breath.
>
>
>
> Greg Holmberg
>
Re: document structure (was: Discussion of next UIMA release)
Posted by Greg Holmberg <ho...@comcast.net>.
On Tue, 19 May 2009 15:04:28 -0700, Eddie Epstein <ea...@gmail.com>
wrote:
> Since your original proposal back in 2007 there has been a growing
> effort to add annotators to the project. Do you have any components
> that use the proposed document model type system, say a collection
> reader, that you would be willing to submit?
I did use this schema in a prototype. I used the Stax parser to convert
XML to this annotation structure over plain text. Since the proposed
schema loses no XML information, the XML can be reproduced from the CAS,
if desired. Not byte-for-byte, since carriage returns may come out
differently, but certainly functionally equivalent XML.
HTML was first cleaned up with HTMLCleaner, converted to XML (XHTML), and
then sent through the Stax parser and into the CAS.
For other formats, I used a commercial filtering product to convert PDF,
Office, etc. to HTML, and then through the above process.
An open-source solution to filtering binary formats could use Aperture to
produce XML+RDF, and then through the above process.
The annotators I used didn't understand the CAS, only HTML, so I had to
keep that in addition to the CAS to feed to those annotators. The offsets
these annotators returned were then relative to the HTML, so I kept a map
of offset ranges between the HTML and the plain-text in the CAS. This let
me translate the offsets returned from the annotators against the HTML
into offsets against the CAS, so when I created annotations they pointed
to the right place.
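The offset bookkeeping described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: tags are removed with a naive regular expression, character entities are not decoded, and the names strip_tags and html_to_text_offset are made up for the example rather than taken from the prototype.

```python
import re
from bisect import bisect_right

TAG = re.compile(r"<[^>]*>")  # naive tag matcher, good enough for a sketch


def strip_tags(html):
    """Return (text, segments).

    Each segment is (text_start, html_start, length) and records where one
    run of text between two tags sits in the plain text and in the HTML.
    """
    text_parts, segments = [], []
    text_pos = html_pos = 0
    for match in TAG.finditer(html):
        if match.start() > html_pos:
            chunk = html[html_pos:match.start()]
            segments.append((text_pos, html_pos, len(chunk)))
            text_parts.append(chunk)
            text_pos += len(chunk)
        html_pos = match.end()
    if html_pos < len(html):
        chunk = html[html_pos:]
        segments.append((text_pos, html_pos, len(chunk)))
        text_parts.append(chunk)
    return "".join(text_parts), segments


def html_to_text_offset(segments, html_offset):
    """Translate an offset into the HTML to an offset into the plain text."""
    starts = [seg[1] for seg in segments]
    i = bisect_right(starts, html_offset) - 1
    if i < 0:
        return 0
    text_start, html_start, length = segments[i]
    # Offsets that fall inside a tag are clamped to the end of the text run.
    return text_start + min(html_offset - html_start, length)


html = "<p>Hello <b>world</b></p>"
text, segments = strip_tags(html)
# Suppose an HTML-only annotator reported the span 12..17 ("world").
begin = html_to_text_offset(segments, 12)
end = html_to_text_offset(segments, 17)
```

Here text[begin:end] is "world", so an annotation created with the translated offsets points at the same characters in the plain text.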
So I can't contribute the commercial filter code (we don't have source
code anyway). I may be able to contribute the XML and HTML converters,
since that code was never shipped as a product. However, it will require
approval from some EVP three levels above me. I will look into it, but
don't hold your breath.
Greg Holmberg
Re: document structure (was: Discussion of next UIMA release)
Posted by Eddie Epstein <ea...@gmail.com>.
Hi Greg,
Since your original proposal back in 2007 there has been a growing
effort to add annotators to the project. Do you have any components
that use the proposed document model type system, say a collection
reader, that you would be willing to submit?
Regards,
Eddie
On Tue, May 19, 2009 at 1:02 PM, Greg Holmberg <ho...@comcast.net> wrote:
> I feel that the lack of any standard in UIMA regarding the structure of the
> document being analyzed (that is, beyond simply plain text) makes it pretty
> much impossible to combine annotators from different sources--one of the
> primary justifications of UIMA, in my opinion.
>
> I sketched a possible solution to this on the wiki
> (http://cwiki.apache.org/UIMA/uima-sandbox-components.html, see "Document
> model") back in 2007, but it didn't generate much interest. There's also a
> proposal for document properties, beyond the simple
> SourceDocumentInformation class.
>
>
Re: document structure (was: Discussion of next UIMA release)
Posted by Greg Holmberg <ho...@comcast.net>.
Indeed, the structure is important to linguistic analysis. For example,
imagine you have a table with three cells, containing the text "1996",
"Honda", and "Camry". If the cells are properly treated as sentence or
paragraph boundaries, then entity extraction would produce a year, a
company, and a vehicle. If the structure is stripped and just the plain
text is analyzed, then you get one entity, a vehicle, "1996 Honda Camry".
Which is not exactly the same thing.
I feel that the lack of any standard in UIMA regarding the structure of
the document being analyzed (that is, beyond simply plain text) makes it
pretty much impossible to combine annotators from different sources--one
of the primary justifications of UIMA, in my opinion.
I sketched a possible solution to this on the wiki
(http://cwiki.apache.org/UIMA/uima-sandbox-components.html, see "Document
model") back in 2007, but it didn't generate much interest. There's also
a proposal for document properties, beyond the simple
SourceDocumentInformation class.
Greg Holmberg
On Tue, 19 May 2009 09:34:14 -0700, Manuel Fiorelli
<ma...@gmail.com> wrote:
> I would like to see a well-established way to analyze semi-structured
> documents, such as (X)HTML pages. UIMA shouldn't provide its own
> parser, but at least a type system (like uima.cas) to represent a DOM
> Document within a CAS instance (the simplest solution is to represent
> element nodes as feature structures and text nodes as annotations on
> the plain text, but I suspect there are more convenient solutions).
>
> When the analysis function doesn't rely upon the document structure,
> there should be a way to skip most of the markup and iterate on the
> blocks. I think that we cannot work directly on the plain text, since
> the loss of information could lead to misinterpretations. For example,
> in the following fragment
>
> <p>First paragrapher</p><p>Second paragrapher</p>
>
> the plain text would be
>
> First paragrapherSecond paragrapher
>
> where "paragrapherSecond" is an error in the interpretation of the
> document.
>
> Manuel Fiorelli
Re: Discussion of next UIMA release
Posted by Manuel Fiorelli <ma...@gmail.com>.
2009/5/12 Thilo Goetz <tw...@gmx.de>:
> If there's anything missing in UIMA that you'd *really* like
> to see in the next release, now would be a good time
> to let everybody know. Maybe you have patch up your
> sleeve?
I would like to see a well-established way to analyze semi-structured
documents, such as (X)HTML pages. UIMA shouldn't provide its own
parser, but at least a type system (like uima.cas) to represent a DOM
Document within a CAS instance (the simplest solution is to represent
element nodes as feature structures and text nodes as annotations on
the plain text, but I suspect there are more convenient solutions).
When the analysis function doesn't rely upon the document structure,
there should be a way to skip most of the markup and iterate on the
blocks. I think that we cannot work directly on the plain text, since
the loss of information could lead to misinterpretations. For example,
in the following fragment
<p>First paragrapher</p><p>Second paragrapher</p>
the plain text would be
First paragrapherSecond paragrapher
where "paragrapherSecond" is an error in the interpretation of the document.
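One way to avoid this, sketched here with Python's standard html.parser module purely as an illustration: extract the text, record every element as a (tag, begin, end) span over it, and emit a separator after each block element. The BLOCK_TAGS set and the class name are assumptions made for the example, not an existing UIMA type system.

```python
from html.parser import HTMLParser

# Hypothetical choice of elements that are treated as block boundaries.
BLOCK_TAGS = {"p", "div", "td", "th", "li", "h1", "h2", "h3"}


class StructuredTextExtractor(HTMLParser):
    """Collects plain text plus (tag, begin, end) spans for every element."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.text_len = 0
        self.open_elements = []  # stack of (tag, begin offset)
        self.annotations = []    # finished (tag, begin, end) spans

    def handle_starttag(self, tag, attrs):
        self.open_elements.append((tag, self.text_len))

    def handle_endtag(self, tag):
        # Close the nearest matching open element; unmatched tags are ignored.
        for i in range(len(self.open_elements) - 1, -1, -1):
            if self.open_elements[i][0] == tag:
                begin = self.open_elements[i][1]
                del self.open_elements[i:]
                self.annotations.append((tag, begin, self.text_len))
                if tag in BLOCK_TAGS:
                    self._emit("\n")  # keep adjacent blocks apart
                break

    def handle_data(self, data):
        self._emit(data)

    def _emit(self, chunk):
        self.text_parts.append(chunk)
        self.text_len += len(chunk)


def extract(html):
    parser = StructuredTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.text_parts), parser.annotations


text, annotations = extract("<p>First paragrapher</p><p>Second paragrapher</p>")
```

The two paragraphs end up on separate lines of the extracted text, and each keeps its own span, so an annotator can either iterate over the blocks or work on text in which "paragrapherSecond" never appears.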
Manuel Fiorelli
Re: Discussion of next UIMA release
Posted by Eric Riebling <er...@cs.cmu.edu>.
Speaking of documentation, this just reminded me of the time I came
across the term 'delegate' in the documentation and counted dozens of
uses, yet nowhere was it defined what a delegate actually is.
Confusing to newbies and pros alike! :)
Tommaso Teofili wrote:
> In the next version I would like the Overview Documentation's
> *Entities*(unique Objects in the CAS with, possibly, many referencing
> Annotations
> pointing at them) to be available.
> I asked about that in a previous discussion so I consequently implemented
> Entities code by myself. If you think it can be useful I can help with that.
> Regards,
> Tommaso
>
> 2009/5/14 Ahmed Ragheb <RA...@eg.ibm.com>
>
>> Hi Thilo,
>>
>> I would think it is good if we can be able to define a list of possible
>> values for String valued configuration parameters. This will make it easier
>> for users to determine the possible values. Then, the component descriptor
>> editor can provide the list of values in a combo box and a user can select
>> the value of choice.
>>
>> Regards,
>> Ahmed
>>
>> Thilo Goetz <tw...@gmx.de> wrote on 12/05/2009 07:02:12 PM:
>>
>>> Thilo Goetz <tw...@gmx.de>
>>> 12/05/2009 07:02 PM
>>>
>>> Please respond to
>>> uima-user@incubator.apache.org
>>>
>>> To
>>>
>>> UIMA User <ui...@incubator.apache.org>
>>>
>>> cc
>>>
>>> Subject
>>>
>>> Discussion of next UIMA release
>>>
>>> Hi UIMA users,
>>>
>>> just a quick note to let you know that I've kicked off
>>> discussions about the next release on uima-dev. If
>>> there's anything missing in UIMA that you'd *really* like
>>> to see in the next release, now would be a good time
>>> to let everybody know. Maybe you have patch up your
>>> sleeve?
>>>
>>> Thanks,
>>> Thilo
>>
>
--
Eric Riebling NSH 4623, LTI, SCS, CMU
412.268.9872 http://www.cs.cmu.edu/~er1k
Re: Discussion of next UIMA release
Posted by Tommaso Teofili <to...@gmail.com>.
In the next version I would like the *Entities* described in the Overview
Documentation (unique objects in the CAS with, possibly, many Annotations
referencing them) to be available.
I asked about that in a previous discussion and subsequently implemented
the Entities code myself. If you think it can be useful, I can help with that.
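To make the idea concrete, here is a small sketch of the relationship, assuming an entity is a single object that several text spans refer to. The Entity and Mention classes and the helper functions are hypothetical and are not the code mentioned above:

```python
from dataclasses import dataclass, field


@dataclass
class Mention:
    # A span of text that refers to some entity.
    begin: int
    end: int


@dataclass
class Entity:
    # One unique object; many mentions may point at it.
    entity_id: str
    mentions: list = field(default_factory=list)


def add_mention(entity, text, phrase):
    """Attach the first occurrence of phrase in text to the entity."""
    begin = text.index(phrase)
    mention = Mention(begin, begin + len(phrase))
    entity.mentions.append(mention)
    return mention


def covered_texts(entity, text):
    """Return the text covered by each mention of the entity."""
    return [text[m.begin:m.end] for m in entity.mentions]


text = "Acme hired Ann. The company grew quickly."
acme = Entity("org:acme")
add_mention(acme, text, "Acme")
add_mention(acme, text, "The company")
```

Both spans resolve to the same Entity object, which is the property that separate, unlinked annotations cannot express.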
Regards,
Tommaso
2009/5/14 Ahmed Ragheb <RA...@eg.ibm.com>
> Hi Thilo,
>
> I would think it is good if we can be able to define a list of possible
> values for String valued configuration parameters. This will make it easier
> for users to determine the possible values. Then, the component descriptor
> editor can provide the list of values in a combo box and a user can select
> the value of choice.
>
> Regards,
> Ahmed
>
> Thilo Goetz <tw...@gmx.de> wrote on 12/05/2009 07:02:12 PM:
>
> > Thilo Goetz <tw...@gmx.de>
> > 12/05/2009 07:02 PM
> >
> > Please respond to
> > uima-user@incubator.apache.org
> >
> > To
> >
> > UIMA User <ui...@incubator.apache.org>
> >
> > cc
> >
> > Subject
> >
> > Discussion of next UIMA release
> >
> > Hi UIMA users,
> >
> > just a quick note to let you know that I've kicked off
> > discussions about the next release on uima-dev. If
> > there's anything missing in UIMA that you'd *really* like
> > to see in the next release, now would be a good time
> > to let everybody know. Maybe you have patch up your
> > sleeve?
> >
> > Thanks,
> > Thilo
>
>
Re: Discussion of next UIMA release
Posted by Ahmed Ragheb <RA...@eg.ibm.com>.
Hi Thilo,
I think it would be good to be able to define a list of possible
values for String-valued configuration parameters. This would make it easier
for users to determine the possible values. The component descriptor
editor could then offer the list of values in a combo box from which the user
selects the value of choice.
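As a rough sketch of the behaviour I have in mind, assuming a parameter declaration simply carries its list of allowed values (the StringParameter class below is hypothetical, not an existing descriptor element):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringParameter:
    # Hypothetical parameter declaration with an enumerated value list.
    name: str
    allowed_values: tuple

    def validate(self, value):
        """Return value if it is allowed, otherwise raise ValueError."""
        if value not in self.allowed_values:
            raise ValueError(
                f"{self.name}: {value!r} is not one of {self.allowed_values}"
            )
        return value


language = StringParameter("Language", ("en", "de", "fr"))
chosen = language.validate("de")
```

An editor could fill its combo box from allowed_values, and the framework could run the same check when the descriptor is loaded.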
Regards,
Ahmed
Thilo Goetz <tw...@gmx.de> wrote on 12/05/2009 07:02:12 PM:
> Thilo Goetz <tw...@gmx.de>
> 12/05/2009 07:02 PM
>
> Please respond to
> uima-user@incubator.apache.org
>
> To
>
> UIMA User <ui...@incubator.apache.org>
>
> cc
>
> Subject
>
> Discussion of next UIMA release
>
> Hi UIMA users,
>
> just a quick note to let you know that I've kicked off
> discussions about the next release on uima-dev. If
> there's anything missing in UIMA that you'd *really* like
> to see in the next release, now would be a good time
> to let everybody know. Maybe you have patch up your
> sleeve?
>
> Thanks,
> Thilo
Re: Discussion of next UIMA release
Posted by Yoshinobu Kano <ka...@is.s.u-tokyo.ac.jp>.
Hi,
>> * "Map" type of configuration parameter
>>
>
> Also needed that in the past.
>
> A type parameter could be nice for those who
> specify parameter mappings in the descriptor. Maybe thats
> the wrong way to do it since these mappings often apply to multiple
> AEs and not only to one. On the other side these are usually grouped in
> one AAE so maybe thats the right place for it.
What I meant was a multi-valued parameter made up of key/value pairs,
which would be translated to, e.g., a LinkedHashMap; not a parameter shared across AEs.
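A small sketch of that translation, assuming the pairs are written as a multi-valued String parameter with one "key=value" entry per value. The function name and the separator are made up for the example; a Python dict keeps insertion order, much like a LinkedHashMap:

```python
def parse_map_parameter(values, separator="="):
    """Turn ["key=value", ...] into an ordered mapping of key to value."""
    result = {}
    for item in values:
        key, found, value = item.partition(separator)
        if not found:
            raise ValueError(f"missing {separator!r} in {item!r}")
        result[key.strip()] = value.strip()
    return result


mapping = parse_map_parameter(["NN = noun", "VB = verb", "JJ = adjective"])
```

Iterating over mapping yields the pairs in the order in which they were written in the parameter.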
-Yoshinobu
--
Yoshinobu Kano (Given/Family)
kano@is.s.u-tokyo.ac.jp
Project Research Associate, the University of Tokyo / U-Compare Project Lead
http://www-tsujii.is.s.u-tokyo.ac.jp/ http://u-compare.org/kano/
Re: Discussion of next UIMA release
Posted by Jörn Kottmann <ko...@gmail.com>.
Yoshinobu Kano wrote:
> Hi Thilo,
>
> A couple of wishes...
>
> * generics in the UIMA API
>
I am working on generics right now, will be in the next release.
For further details have a look at UIMA-1341.
> * "Map" type of configuration parameter
>
Also needed that in the past.
A type parameter could be nice for those who
specify parameter mappings in the descriptor. Maybe that's
the wrong way to do it since these mappings often apply to multiple
AEs and not only to one. On the other hand, these are usually grouped in
one AAE, so maybe that's the right place for it.
Jörn
Re: Discussion of next UIMA release
Posted by Yoshinobu Kano <ka...@is.s.u-tokyo.ac.jp>.
Hi Thilo,
>> * generics in the UIMA API
>
> Joern is working on that. If you have particular suggestions, please
> join the discussion on the developer's list.
Good news, thanks! I have been receiving only uima-user, so I was not
aware of these developments.
>> * collection framework (or similar sort of thing)
>
> Yes, I've heard that many times. The issue here is that this
> is bit difficult to do with the CAS. The CAS was designed on
> purpose to have only very simple data structures. This was so
> it would be maximally portable. So I don't see how we could,
> for example, add sets to the CAS without breaking the design
> philosophy. However, one might consider adding a whole host
> of utility functions to Java UIMA that allows users to treat
> an array as a set, a list, or whatever else you have in mind.
> Maps and trees could also be implemented like that. I assume
> many people have done this themselves, and it would make a lot
> of sense to have this kind of functionality in the core.
>
> Let me know if this is what you had in mind.
Yes, I would love to see this sort of functionality.
As for data structures in general,
IMHO UIMA should officially provide at least the very common data
structures such as maps, trees, DAGs, etc.
I am very happy to see the UIMA user community growing,
but I am quite afraid that everyone ends up with their own type system, which
results in effectively no compatibility.
>
>> * "Map" type of configuration parameter
>>
>> Probably not possible due to the architecture design...
>> * "memory leak" when hold CAS data outside CAS. GC is not enough?
>
> I don't know what you mean by that.
I meant this behaviour:
http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.aae.common_pitfalls
-Yoshinobu
>
> --Thilo
>
>>
>> It is very happy if we can see some of these in the next release.
>> Thank you very much for your efforts on the next release!
>>
>> -Yoshinobu
>>
>> On Wed, May 13, 2009 at 1:02 AM, Thilo Goetz <tw...@gmx.de> wrote:
>>> Hi UIMA users,
>>>
>>> just a quick note to let you know that I've kicked off
>>> discussions about the next release on uima-dev. If
>>> there's anything missing in UIMA that you'd *really* like
>>> to see in the next release, now would be a good time
>>> to let everybody know. Maybe you have patch up your
>>> sleeve?
>>>
>>> Thanks,
>>> Thilo
>>>
>>
>>
>>
>
>
--
Yoshinobu Kano (Given/Family)
kano@is.s.u-tokyo.ac.jp
Project Research Associate, the University of Tokyo / U-Compare Project Lead
http://www-tsujii.is.s.u-tokyo.ac.jp/ http://u-compare.org/kano/
Re: Discussion of next UIMA release
Posted by Thilo Goetz <tw...@gmx.de>.
Hi Yoshinobu,
Yoshinobu Kano wrote:
> Hi Thilo,
>
> A couple of wishes...
>
> * multiple inheritance in the type system, please!
> (I know this would be quite a large issue but...
> This is also required to be compliant with the UIMA spec, should be
> compatible with ECore)
Yes, that would be a bit of a change. I don't think
we should attempt that in a point release. I would
also like to hear from more people who have this
requirement...
>
> * generics in the UIMA API
Joern is working on that. If you have particular suggestions, please
join the discussion on the developer's list.
> * collection framework (or similar sort of thing)
Yes, I've heard that many times. The issue here is that this
is a bit difficult to do with the CAS. The CAS was designed on
purpose to have only very simple data structures. This was so
it would be maximally portable. So I don't see how we could,
for example, add sets to the CAS without breaking the design
philosophy. However, one might consider adding a whole host
of utility functions to Java UIMA that allows users to treat
an array as a set, a list, or whatever else you have in mind.
Maps and trees could also be implemented like that. I assume
many people have done this themselves, and it would make a lot
of sense to have this kind of functionality in the core.
Let me know if this is what you had in mind.
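To illustrate the kind of utility meant here, a minimal sketch of a set view layered over a plain array. A Python list stands in for the CAS array, the ArraySet name is invented, and the backing list is assumed to contain no duplicates to begin with:

```python
from collections.abc import MutableSet


class ArraySet(MutableSet):
    """Set semantics on top of a plain list that stays the only storage."""

    def __init__(self, array):
        self._array = array

    @classmethod
    def _from_iterable(cls, iterable):
        # Used by the inherited set operators; builds a fresh backing list.
        return cls(list(dict.fromkeys(iterable)))

    def __contains__(self, item):
        return item in self._array

    def __iter__(self):
        return iter(self._array)

    def __len__(self):
        return len(self._array)

    def add(self, item):
        if item not in self._array:
            self._array.append(item)

    def discard(self, item):
        if item in self._array:
            self._array.remove(item)


backing = ["a", "b"]
view = ArraySet(backing)
view.add("b")      # already present, nothing changes
view.add("c")
view.discard("a")
```

The data still lives in the simple array, so nothing new has to be stored or serialized; only the access code knows about sets.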
> * "Map" type of configuration parameter
>
> Probably not possible due to the architecture design...
> * "memory leak" when hold CAS data outside CAS. GC is not enough?
I don't know what you mean by that.
--Thilo
>
> It is very happy if we can see some of these in the next release.
> Thank you very much for your efforts on the next release!
>
> -Yoshinobu
>
> On Wed, May 13, 2009 at 1:02 AM, Thilo Goetz <tw...@gmx.de> wrote:
>> Hi UIMA users,
>>
>> just a quick note to let you know that I've kicked off
>> discussions about the next release on uima-dev. If
>> there's anything missing in UIMA that you'd *really* like
>> to see in the next release, now would be a good time
>> to let everybody know. Maybe you have patch up your
>> sleeve?
>>
>> Thanks,
>> Thilo
>>
>
>
>
Re: Discussion of next UIMA release
Posted by Yoshinobu Kano <ka...@is.s.u-tokyo.ac.jp>.
Hi Thilo,
A couple of wishes...
* multiple inheritance in the type system, please!
(I know this would be quite a large issue but...
This is also required to be compliant with the UIMA spec, should be
compatible with ECore)
* generics in the UIMA API
* collection framework (or similar sort of thing)
* "Map" type of configuration parameter
Probably not possible due to the architecture design...
* "memory leak" when holding CAS data outside the CAS. Is GC not enough?
I would be very happy to see some of these in the next release.
Thank you very much for your efforts on the next release!
-Yoshinobu
On Wed, May 13, 2009 at 1:02 AM, Thilo Goetz <tw...@gmx.de> wrote:
> Hi UIMA users,
>
> just a quick note to let you know that I've kicked off
> discussions about the next release on uima-dev. If
> there's anything missing in UIMA that you'd *really* like
> to see in the next release, now would be a good time
> to let everybody know. Maybe you have patch up your
> sleeve?
>
> Thanks,
> Thilo
>
--
Yoshinobu Kano (Given/Family)
kano@is.s.u-tokyo.ac.jp
Project Research Associate, the University of Tokyo / U-Compare Project Lead
http://www-tsujii.is.s.u-tokyo.ac.jp/ http://u-compare.org/kano/
Re: Discussion of next UIMA release
Posted by Detmar Meurers <dm...@sfs.uni-tuebingen.de>.
Hi Thilo,
> just a quick note to let you know that I've kicked off
> discussions about the next release on uima-dev. If
> there's anything missing in UIMA that you'd *really* like
> to see in the next release, now would be a good time
> to let everybody know. Maybe you have patch up your
> sleeve?
I'd like to see unification of ca-structures as a built-in - combining
information is so fundamental a process that it would be great to see
it supported.
Best,
Detmar
Re: Discussion of next UIMA release
Posted by Matthias Wendt <ma...@neofonie.de>.
Hello,
uimacpp currently has the restriction of only allowing one Annotator per
shared object file. Is it possible to relax this restriction in the next
release? In the descriptor <annotatorImplementationName/> for Java is
the name of the implementing class, while for C++ it is the name of the
shared object (or dll) file. I wonder whether there might be another way of
telling the framework which .so to load?
Regards
Matthias
Thilo Goetz schrieb:
> Hi UIMA users,
>
> just a quick note to let you know that I've kicked off
> discussions about the next release on uima-dev. If
> there's anything missing in UIMA that you'd *really* like
> to see in the next release, now would be a good time
> to let everybody know. Maybe you have patch up your
> sleeve?
>
> Thanks,
> Thilo
>
Re: Discussion of next UIMA release
Posted by Steven Bethard <st...@gmail.com>.
On Tue, May 12, 2009 at 9:02 AM, Thilo Goetz <tw...@gmx.de> wrote:
> just a quick note to let you know that I've kicked off
> discussions about the next release on uima-dev. If
> there's anything missing in UIMA that you'd *really* like
> to see in the next release, now would be a good time
> to let everybody know. Maybe you have patch up your
> sleeve?
I know this is probably a long shot, but I'd like to see the
programmatic API behind the descriptors documented. The APIs are
accessible now through all the .impl packages, but there currently
isn't any documentation for it that I know of, presumably because the
assumption is that everyone will want to use XML instead of Java code.
Steve
--
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
--- Bucky Katt, Get Fuzzy