Posted to dev@uima.apache.org by Peter Klügl <pk...@uni-wuerzburg.de> on 2010/12/14 15:55:28 UTC
TextMarker
Hello,
We would like to contribute our TextMarker system to Apache UIMA and
want to ask whether the development team is interested in this contribution.
The system is currently hosted on SourceForge
(http://sourceforge.net/projects/textmarker/) and there is some
documentation in the project wiki
(http://tmwiki.informatik.uni-wuerzburg.de/).
I think a summary of the current status of the system is a good starting
point for that discussion. TextMarker is an Eclipse-based tool
implemented in pure Java that can, among other things, be used to
prototype analysis engines or develop complex handcrafted text
processing applications. It consists of four major parts:
Language:
The rule language, or rather script language, can be compared to regular
expressions over annotations with additional conditions and actions.
There are currently 28 different conditions and 34 actions. Conditions range
from a test on a feature value to a test of whether the matched annotation is
contained in another annotation of a given type; actions range from
creating an annotation to applying an external dictionary or analysis
engine. A TextMarker script can import type systems or define new types
or variables. There are also some more complex control structures
for procedure calls, conditional statements or recursion. The TextMarker
language (and inference) is in active use in some production
applications here, but it lacks test cases. However, we are currently
writing uimaFIT-based component tests to improve quality management.
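To make the comparison with regular expressions a bit more concrete, here is a
minimal Java sketch of the general idea: a rule matches annotations of one
type, tests a condition on each match, and then runs an action that creates a
new annotation. This is not TextMarker syntax or TextMarker code, and it does
not use the UIMA API. The names Span, MarkRule and RuleSketch are hypothetical
and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** A typed span over the document text (hypothetical stand-in for an annotation). */
final class Span {
    final String type;
    final int begin;
    final int end;

    Span(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }

    String coveredText(String document) {
        return document.substring(begin, end);
    }

    @Override
    public String toString() {
        return type + "[" + begin + "," + end + "]";
    }
}

/** One rule: match spans of a type, test a condition, then create a new span (the action). */
final class MarkRule {
    private final String matchType;
    private final BiPredicate<Span, String> condition;
    private final String createType;

    MarkRule(String matchType, BiPredicate<Span, String> condition, String createType) {
        this.matchType = matchType;
        this.condition = condition;
        this.createType = createType;
    }

    /** Applies the rule to every span of matchType and appends the newly created spans. */
    void apply(String document, List<Span> spans) {
        List<Span> created = new ArrayList<>();
        for (Span span : spans) {
            if (span.type.equals(matchType) && condition.test(span, document)) {
                created.add(new Span(createType, span.begin, span.end));
            }
        }
        spans.addAll(created);
    }
}

final class RuleSketch {
    /** Seeds the span list with one "Token" span per whitespace-separated token. */
    static List<Span> tokenize(String document) {
        List<Span> spans = new ArrayList<>();
        Matcher matcher = Pattern.compile("\\S+").matcher(document);
        while (matcher.find()) {
            spans.add(new Span("Token", matcher.start(), matcher.end()));
        }
        return spans;
    }

    /** Illustrative rule: every Token whose text is a four-digit number is also marked as Year. */
    static List<Span> annotateYears(String document) {
        List<Span> spans = tokenize(document);
        MarkRule yearRule = new MarkRule("Token",
                (span, doc) -> span.coveredText(doc).matches("\\d{4}"), "Year");
        yearRule.apply(document, spans);
        return spans;
    }
}
```

Calling RuleSketch.annotateYears on a short text returns the token spans
followed by any Year spans the rule created. A real rule language adds
sequences of matches, quantifiers and many more conditions and actions on top
of this basic match/condition/action shape.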
Workbench:
The Eclipse-based tool for developing TextMarker scripts is
currently based on DLTK 1.0 (http://www.eclipse.org/dltk/), and its
editor supports syntax highlighting, syntax checks, context-sensitive
auto-completion, formatting, mark occurrences, open declaration and some
other useful features commonly known from IDEs. For each script file, a type
system and an executable analysis engine are created. Therefore, it's
quite simple and efficient to create an analysis engine with a few lines
of TextMarker rules. The workbench supports testing on annotated xmiCAS
while writing new rules and provides some minimal debugging
functionality that explains why and on what text a rule was executed.
CEV:
This plugin can be used to edit or visualize xmiCAS and is also able to
render HTML. It is heavily used by the testing and explanation components.
TextRuler:
This framework for rule learning is more of a playground and was mainly
implemented by students. There are currently more or less working
implementations of LP2, WHISK, WIEN, RAPIER and an algorithm of our own, and
three other algorithms are being implemented.
Overall, the system has been running stably for a year now, but it is lacking
in code quality, documentation and test cases. We are also willing to
change the name of the system if someone can think of a better one.
I'm looking forward to your comments.
Best regards,
Peter
--
---------------------------------------------------------------------
Dipl.-Inf. Peter Klügl
Universität Würzburg Tel.: +49-(0)931-31-86741
Am Hubland Fax.: +49-(0)931-31-86732
97074 Würzburg mail: pkluegl@informatik.uni-wuerzburg.de
http://www.is.informatik.uni-wuerzburg.de/en/staff/kluegl_peter/
---------------------------------------------------------------------
Re: TextMarker
Posted by Nicolas Hernandez <ni...@gmail.com>.
Hi everyone,
Just to let you know, since it is also a topic of interest at the LINA
Lab: we distribute an analysis engine that searches the CAS (an FStructure
in an index) with XPath expressions. It is based on the Apache JXPath
library, which allows XPath expressions to be applied to graphs of objects.
Here, the graph of objects is any instance of a type system in the CAS
index. Thanks to the java.lang.reflect API, the annotator is not
tied to a particular type system.
As a use case, it can be used for mapping annotations between two type systems.
More on http://code.google.com/p/uima-type-mapper
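As a rough illustration of how reflection keeps such a component independent
of a concrete type system, here is a small Java sketch that resolves a
slash-separated property path through public getters. It is a simplified
stand-in and not the uima-type-mapper code or the JXPath API. The classes
Token, Lemma and PathResolver are hypothetical, and real XPath support
(predicates, axes, functions) is far richer than this.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Hypothetical annotation-like bean; in a real CAS this would be a generated type class. */
class Lemma {
    private final String value;

    Lemma(String value) {
        this.value = value;
    }

    public String getValue() {
        return value;
    }
}

/** Hypothetical annotation-like bean that references another feature structure. */
class Token {
    private final int begin;
    private final Lemma lemma;

    Token(int begin, Lemma lemma) {
        this.begin = begin;
        this.lemma = lemma;
    }

    public int getBegin() {
        return begin;
    }

    public Lemma getLemma() {
        return lemma;
    }
}

final class PathResolver {
    /**
     * Resolves a path such as "lemma/value" by calling getLemma() and then getValue().
     * Nothing here names Token or Lemma, so any bean-style type system works.
     */
    static Object resolve(Object root, String path) {
        Object current = root;
        for (String step : path.split("/")) {
            if (current == null) {
                return null; // a missing intermediate object ends the walk
            }
            if (step.isEmpty()) {
                continue; // tolerate leading or doubled slashes
            }
            String getterName = "get" + Character.toUpperCase(step.charAt(0)) + step.substring(1);
            try {
                Method getter = current.getClass().getMethod(getterName);
                // Only needed because the sample beans above are not public classes.
                getter.setAccessible(true);
                current = getter.invoke(current);
            } catch (NoSuchMethodException | IllegalAccessException | InvocationTargetException e) {
                throw new IllegalArgumentException("Cannot resolve step '" + step + "'", e);
            }
        }
        return current;
    }
}
```

With these sample beans, PathResolver.resolve(token, "lemma/value") walks from
the token to its lemma and returns the lemma string. A mapping between two
type systems could then be described as pairs of source paths and target
features, without compiling against either type system.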
/N
# W3C XPath as a declarative language for expressing constraints on
the type structure of the source annotations
# Apache JXPath as the engine that implements the constraint processing
#
A use case can be type (annotation) mapping
http://code.google.com/p/uima-type-mapper/
On Wed, Dec 15, 2010 at 9:18 AM, Thilo Götz <tw...@gmx.de> wrote:
> On 14/12/10 19:57, Peter Klügl wrote:
> [...]
>> There is of course the LanguageWare platform, but as you know there are also
>> many differences.
>
> True, but LanguageWare is not open source, nor is there
> a language that is independent of the platform (last time
> I looked, anyway).
>
> --Thilo
>
--
nicolas.hernandez@univ-nantes.fr
#
http://enicolashernandez.blogspot.com
http://www.univ-nantes.fr/hernandez-n
#
Laboratoire LINA-TALN CNRS UMR 6241
tel. +33 (0)2 51 12 58 55
#
Université de Nantes - Institut Universitaire de Technologie -
Département Informatique
tel. +33 (0)2 40 30 60 67
Re: TextMarker
Posted by Thilo Götz <tw...@gmx.de>.
On 14/12/10 19:57, Peter Klügl wrote:
[...]
> There is of course the LanguageWare platform, but as you know there are also
> many differences.
True, but LanguageWare is not open source, nor is there
a language that is independent of the platform (last time
I looked, anyway).
--Thilo
Re: TextMarker
Posted by Tommaso Teofili <to...@gmail.com>.
Hi Peter,
TextMarker seems very interesting, and it looks like it would be a good
addition to the UIMA framework.
Personally, I'd need some more time to take a deeper look at it, but it sounds
like a nice contribution.
Hope to be able to get back to this soon :)
Cheers,
Tommaso
2010/12/14 Peter Klügl <pk...@uni-wuerzburg.de>
> Hi Thilo,
>
> Am 14.12.2010 18:32, schrieb Thilo Götz:
>
> Hi Peter,
>>
>> I was very impressed when you showed me a demo of TextMarker
>> last year, so I think it's great you're coming up with this
>> proposal. I will download and play with it over the coming
>> few weeks, but I'll probably be really busy before Xmas, so
>> it might take a while...
>>
>> Thanks :-)
>
> Just a short note: As TextMarker is still based DLTK 1.0, the correct
> plugins are maybe missing in the Helios update site. I think I'll add
> another link in the wiki and improve the feature dependencies. Besides that,
> no problems should occur.
>
>
> If we decide to accept TextMarker into UIMA, we will need a
>> code grant: http://www.apache.org/licenses/software-grant.txt
>> I assume your university owns the rights to all the code, so
>> you may want to bring this up with your legal department. I
>> know it's a bit early, but I'm bringing this up now because
>> there may be some lead time.
>>
>>
> Yes, I will contact our legal department.
>
>
>
>> So just to make sure I understand this correctly: the
>> language is completely independent of the Eclipse based
>> development environment. I could in principle write
>> rules with just a text editor, if I wanted to. Correct?
>>
>>
> Yes. And I assume that some people are even doing that. But in this case
> you need to configure the analysis engine descriptor correctly since it is
> not created by the workbench.
>
>
> I think such a language is a very important feature
>> that UIMA is currently missing. We have nothing that
>> compares with GATE's JAPE language, for example.
>>
>> There is of course the LanguageWare platform, but as you know there are
> also many differences.
>
>
> Peter
>
Re: TextMarker
Posted by Peter Klügl <pk...@uni-wuerzburg.de>.
Hi Thilo,
Am 14.12.2010 18:32, schrieb Thilo Götz:
> Hi Peter,
>
> I was very impressed when you showed me a demo of TextMarker
> last year, so I think it's great you're coming up with this
> proposal. I will download and play with it over the coming
> few weeks, but I'll probably be really busy before Xmas, so
> it might take a while...
>
Thanks :-)
Just a short note: as TextMarker is still based on DLTK 1.0, the correct
plugins may be missing from the Helios update site. I think I'll add
another link in the wiki and improve the feature dependencies. Besides
that, no problems should occur.
> If we decide to accept TextMarker into UIMA, we will need a
> code grant: http://www.apache.org/licenses/software-grant.txt
> I assume your university owns the rights to all the code, so
> you may want to bring this up with your legal department. I
> know it's a bit early, but I'm bringing this up now because
> there may be some lead time.
>
Yes, I will contact our legal department.
>
> So just to make sure I understand this correctly: the
> language is completely independent of the Eclipse based
> development environment. I could in principle write
> rules with just a text editor, if I wanted to. Correct?
>
Yes. And I assume that some people are even doing that. But in this case
you need to configure the analysis engine descriptor correctly since it
is not created by the workbench.
> I think such a language is a very important feature
> that UIMA is currently missing. We have nothing that
> compares with GATE's JAPE language, for example.
>
There is of course the LanguageWare platform, but as you know there are
also many differences.
Peter
Re: TextMarker
Posted by Thilo Götz <tw...@gmx.de>.
Hi Peter,
I was very impressed when you showed me a demo of TextMarker
last year, so I think it's great you're coming up with this
proposal. I will download and play with it over the coming
few weeks, but I'll probably be really busy before Xmas, so
it might take a while...
If we decide to accept TextMarker into UIMA, we will need a
code grant: http://www.apache.org/licenses/software-grant.txt
I assume your university owns the rights to all the code, so
you may want to bring this up with your legal department. I
know it's a bit early, but I'm bringing this up now because
there may be some lead time.
More questions and comments inline.
On 12/14/2010 15:55, Peter Klügl wrote:
> Hello,
>
> We would like to contribute our TextMarker system to Apache UIMA and want to
> ask, if the development team is interested in this contribution. The system is
> currently hosted on SourceForge (http://sourceforge.net/projects/textmarker/)
> and there is some documentation in the project wiki
> (http://tmwiki.informatik.uni-wuerzburg.de/).
>
> I think it's a good start for that discussion, if I summarize the current status
> of the system. TextMarker is an Eclipse-based tool implemented in pure Java that
> can among other things be used to prototype analysis engines or develop complex
> handcrafted text processing applications. It consists of four major parts:
>
> Language:
> The rule or rather script language can be compared to regular expressions over
> annotation with additional conditions and actions. There are currently 28
> different conditions and 34 actions. They range from a test on a feature value
> to a test, if the matched annotation is contained in another annotation of a
> given type, respectively from creating an annotation to applying an external
> dictionary or analysis engine. A TextMarker script can import type systems or
> define new types or variables. Then, there are also some more complex control
> structures for procedure calls, conditioned statements or recursion. The
> TextMarker language (and inference) is in active usage in some productive
> applications here, but it lacks of test cases. However, we are currently writing
> uimaFIT based component test to improve the quality management.
So just to make sure I understand this correctly: the
language is completely independent of the Eclipse based
development environment. I could in principle write
rules with just a text editor, if I wanted to. Correct?
I think such a language is a very important feature
that UIMA is currently missing. We have nothing that
compares with GATE's JAPE language, for example.
>
> Workbench:
> The Eclipse-based tool for developing the TextMarker scripts is currently based
> on DLTK 1.0 (http://www.eclipse.org/dltk/) and it's editor supports syntax
> highlighting, syntax checks, context-sensitive auto-completion, formatting, mark
> occurrences, open declaration and some other useful stuff commonly known in
> IDEs. For each script file, a type system and an executable analysis engine is
> created. Therefore, it's quite simple and efficient to create an analysis engine
> with a few lines of TextMarker rules. The workbench supports testing on
> annotated xmiCas while writing new rules and provides some minimal debugging
> functionality that explains why and on what text a rule was executed.
Cool, I've been looking into DLTK myself recently. Great stuff.
>
> CEV:
> This plugin can be used to edit or visualize xmiCAS and is also able to render
> HTML. It is heavily used by the testing and explanation components.
So here we'd have to figure out if it would make
sense to unify it with our CAS Editor.
>
> TextRuler:
> This framework for rule learning is rather a playground and mainly implemented
> by students. There are currently more or less working implementations of LP2,
> WHISK, WIEN, RAPIER and an own algorithm, and three other algorithms are being
> implemented.
Sounds interesting.
>
>
> Overall, the system is working stable for a year now, but lacks in code quality,
> documentation and test cases. Basically, we are also willing to change the name
> of the system, if someone can think of a better one.
>
> I'm looking forward to your comments.
>
> Best regards,
>
> Peter
>
>
--Thilo
Re: TextMarker
Posted by Marshall Schor <ms...@schor.com>.
+1 to accept TextMarker into the sandbox :-) -Marshall
On 7/28/2011 10:21 AM, Jörn Kottmann wrote:
> Can we assume lazy consensus, or do we want to vote to accept
> Text Marker as a contribution?
>
> Jörn
>
> On 7/19/11 6:08 PM, Jörn Kottmann wrote:
>> On 7/19/11 4:55 PM, Peter Klügl wrote:
>>> All three, that is the ICLA, the CCLA and the software grant.
>>
>> As far as I know only members can verify, if these papers have been
>> registered, right?
>>
>> Depending on how long it takes to process them, it would be nice if
>> someone could have a look.
>>
>> Jörn
>
>
Re: TextMarker
Posted by Jörn Kottmann <ko...@gmail.com>.
Peter can you please start a vote here?
Thanks,
Jörn
On 7/28/11 8:50 PM, Eddie Epstein wrote:
> +1 to having a vote. (And then +1 to accepting TextMarker into the sandbox.)
>
> Eddie
>
> On Thu, Jul 28, 2011 at 10:21 AM, Jörn Kottmann<ko...@gmail.com> wrote:
>> Can we assume lazy consensus, or do we want to vote to accept
>> Text Marker as a contribution?
>>
>> Jörn
>>
>> On 7/19/11 6:08 PM, Jörn Kottmann wrote:
>>> On 7/19/11 4:55 PM, Peter Klügl wrote:
>>>> All three, that is the ICLA, the CCLA and the software grant.
>>> As far as I know only members can verify, if these papers have been
>>> registered, right?
>>>
>>> Depending on how long it takes to process them, it would be nice if
>>> someone could have a look.
>>>
>>> Jörn
>>
Re: TextMarker
Posted by Eddie Epstein <ea...@gmail.com>.
+1 to having a vote. (And then +1 to accepting TextMarker into the sandbox.)
Eddie
On Thu, Jul 28, 2011 at 10:21 AM, Jörn Kottmann <ko...@gmail.com> wrote:
> Can we assume lazy consensus, or do we want to vote to accept
> Text Marker as a contribution?
>
> Jörn
>
> On 7/19/11 6:08 PM, Jörn Kottmann wrote:
>>
>> On 7/19/11 4:55 PM, Peter Klügl wrote:
>>>
>>> All three, that is the ICLA, the CCLA and the software grant.
>>
>> As far as I know only members can verify, if these papers have been
>> registered, right?
>>
>> Depending on how long it takes to process them, it would be nice if
>> someone could have a look.
>>
>> Jörn
>
>
Re: TextMarker
Posted by Jörn Kottmann <ko...@gmail.com>.
Can we assume lazy consensus, or do we want to vote to accept
Text Marker as a contribution?
Jörn
On 7/19/11 6:08 PM, Jörn Kottmann wrote:
> On 7/19/11 4:55 PM, Peter Klügl wrote:
>> All three, that is the ICLA, the CCLA and the software grant.
>
> As far as I know only members can verify, if these papers have been
> registered, right?
>
> Depending on how long it takes to process them, it would be nice if
> someone could have a look.
>
> Jörn
Re: TextMarker
Posted by Jörn Kottmann <ko...@gmail.com>.
On 7/19/11 4:55 PM, Peter Klügl wrote:
> All three, that is the ICLA, the CCLA and the software grant.
As far as I know, only members can verify whether these papers have been
registered, right?
Depending on how long it takes to process them, it would be nice if
someone could have a look.
Jörn
Re: TextMarker
Posted by Peter Klügl <pk...@uni-wuerzburg.de>.
All three, that is the ICLA, the CCLA and the software grant.
I just wanted to mention that the code quality isn't as good as that of, for
example, the CAS Editor, but I'll change that.
Am 19.07.2011 16:45, schrieb Jörn Kottmann:
> On 7/19/11 1:04 PM, Peter Klügl wrote:
>>
>> all documents are signed and sent.
>
> Which documents did you sign?
>
> Jörn
Re: TextMarker
Posted by Jörn Kottmann <ko...@gmail.com>.
On 7/19/11 1:04 PM, Peter Klügl wrote:
>
> all documents are signed and sent.
Which documents did you sign?
Jörn
Re: TextMarker
Posted by Jörn Kottmann <ko...@gmail.com>.
On 7/19/11 2:56 PM, Peter Klügl wrote:
> I'am just creating a "new feature" issue. Is that correct?
Yes, that is fine.
Jörn
Re: TextMarker
Posted by Peter Klügl <pk...@uni-wuerzburg.de>.
I'm just creating a "new feature" issue. Is that correct?
Am 19.07.2011 14:50, schrieb Jörn Kottmann:
> On 7/19/11 2:42 PM, Peter Klügl wrote:
>> Ok, so I just zip everything and attach it as it is, e.g., no maven
>> build process and with the current namespace de.uniwue... ?
> Usually you do that before the import, anyway for the one who imports
> it, its done after a few clicks in the IDE.
>
> Where do you attached it?
>
> Jörn
Re: TextMarker
Posted by Jörn Kottmann <ko...@gmail.com>.
On 7/19/11 2:42 PM, Peter Klügl wrote:
> Ok, so I just zip everything and attach it as it is, e.g., no maven
> build process and with the current namespace de.uniwue... ?
Usually you do that before the import; anyway, for whoever imports
it, it's done after a few clicks in the IDE.
Where did you attach it?
Jörn
Re: TextMarker
Posted by Peter Klügl <pk...@uni-wuerzburg.de>.
OK, so I just zip everything and attach it as it is, i.e., with no Maven
build process and with the current namespace de.uniwue... ?
Am 19.07.2011 13:29, schrieb Jörn Kottmann:
> On 7/19/11 1:04 PM, Peter Klügl wrote:
>> Maybe it would be prudent step to change the name of the system when
>> committing since I have to rename all packages and stuff anyway.
>>
>> What do you think?
>
> Lets keep the name for now, we can rename it after it is in subversion.
> It is also mentioned in the papers, a rename before we receive the
> contribution
> could be a legal issue.
>
> Next step is to create a jira for your contribution, and then attach
> it to it.
>
> Jörn
Re: TextMarker
Posted by Jörn Kottmann <ko...@gmail.com>.
On 7/19/11 1:04 PM, Peter Klügl wrote:
> Maybe it would be prudent step to change the name of the system when
> committing since I have to rename all packages and stuff anyway.
>
> What do you think?
Let's keep the name for now; we can rename it after it is in Subversion.
The name is also mentioned in the papers, so a rename before we receive the
contribution could be a legal issue.
The next step is to create a Jira issue for your contribution and then attach
the code to it.
Jörn
Re: TextMarker
Posted by Peter Klügl <pk...@uni-wuerzburg.de>.
Hi,
all documents are signed and sent.
The return mail says:
"If you have been voted in as a committer, please advise the project PMC
that your ICLA has been filed."
Another thing:
"textmarker" is an eponym for a (highlighter) brand in Germany, and
textmarker is therefore of course a protected term for the product. I
felt quite safe since the TextMarker system is just a small academic
project and (as far as I know) it's not so problematic to use a term in
a different domain in Germany. But I have no clue whether there might be any
problem in the future or under some other legislation.
Besides that, there are also two other projects I know of called
textmarker, e.g., a Firefox plugin.
Maybe it would be a prudent step to change the name of the system when
committing, since I have to rename all packages and stuff anyway.
What do you think?
Peter
Am 15.07.2011 16:37, schrieb Jörn Kottmann:
> On 7/15/11 4:31 PM, Peter Klügl wrote:
>> Hi,
>>
>> yes, you are both right and the sandbox is a good place for it.
>>
>> I'll take a look closer look at all the documents and information on
>> the webpage/wiki. Is there anything that I shouldn't miss?
>>
>> What would be the next step?
>
>
> You need to sign an ICLA and send it to the ASF:
> http://www.apache.org/licenses/icla.txt
>
> Beside that a CCLA would be good to have:
> http://www.apache.org/licenses/cla-corporate.txt
>
> As far as I know we also need a Software Grant Agreement from your
> employer, because you contribution is quite big, otherwise the
> CLAs might have been enough.
>
> Here is the link:
> http://www.apache.org/licenses/software-grant.txt
>
> Hope that helps,
> Jörn
Re: TextMarker
Posted by Jörn Kottmann <ko...@gmail.com>.
On 7/15/11 4:31 PM, Peter Klügl wrote:
> Hi,
>
> yes, you are both right and the sandbox is a good place for it.
>
> I'll take a look closer look at all the documents and information on
> the webpage/wiki. Is there anything that I shouldn't miss?
>
> What would be the next step?
You need to sign an ICLA and send it to the ASF:
http://www.apache.org/licenses/icla.txt
Besides that, a CCLA would be good to have:
http://www.apache.org/licenses/cla-corporate.txt
As far as I know we also need a Software Grant Agreement from your
employer, because your contribution is quite big; otherwise the
CLAs might have been enough.
Here is the link:
http://www.apache.org/licenses/software-grant.txt
Hope that helps,
Jörn
Re: TextMarker
Posted by Peter Klügl <pk...@uni-wuerzburg.de>.
Hi,
yes, you are both right and the sandbox is a good place for it.
I'll take a closer look at all the documents and information on the
webpage/wiki. Is there anything that I shouldn't miss?
What would be the next step?
Peter
Am 15.07.2011 13:38, schrieb Thilo Götz:
> On 14/07/11 18:48, Jörn Kottmann wrote:
>> On 7/14/11 4:20 PM, Peter Klügl wrote:
>>> Hi all,
>>>
>>> I got right now the OK of our legal department, after exactly 7 months...
>>> don't ask ;-)
>>>
>>> We would of course still like to contribute the system to Apache UIMA, if the
>>> development team is interested.
>>>
>>> It's maybe prudent to incrementally contribute the different parts of the
>>> TextMarker system, e.g., starting with the rule engine after I reimplemented
>>> it in August. Afterwards, if there is interest in the Eclipse IDE, then I'd
>>> switch to DLTK 2.0 or 3.0 before contributing it.
>>>
>>> I'm looking forward to your thoughts and opinions.
>> Why not contribute everything at once?
>>
>> We have a sandbox where it can stay until it (or pieces of it) are in a
>> release-able state.
>>
>> I believe it is actually a disadvantage when you decide to develop the eclipse
>> plugin outside
>> of Apache for a while, just to bring it into a state which you feel should be
>> contributed.
>> Because this way the knowledge about it also stays outside.
>>
>> Jörn
>>
>>
> A code contribution of that size is also a bit of work. You don't
> want to go through it several times.
>
> --Thilo
Re: TextMarker
Posted by Thilo Götz <tw...@gmx.de>.
On 14/07/11 18:48, Jörn Kottmann wrote:
> On 7/14/11 4:20 PM, Peter Klügl wrote:
>> Hi all,
>>
>> I got right now the OK of our legal department, after exactly 7 months...
>> don't ask ;-)
>>
>> We would of course still like to contribute the system to Apache UIMA, if the
>> development team is interested.
>>
>> It's maybe prudent to incrementally contribute the different parts of the
>> TextMarker system, e.g., starting with the rule engine after I reimplemented
>> it in August. Afterwards, if there is interest in the Eclipse IDE, then I'd
>> switch to DLTK 2.0 or 3.0 before contributing it.
>>
>> I'm looking forward to your thoughts and opinions.
>
> Why not contribute everything at once?
>
> We have a sandbox where it can stay until it (or pieces of it) are in a
> release-able state.
>
> I believe it is actually a disadvantage when you decide to develop the eclipse
> plugin outside
> of Apache for a while, just to bring it into a state which you feel should be
> contributed.
> Because this way the knowledge about it also stays outside.
>
> Jörn
>
>
A code contribution of that size is also a bit of work. You don't
want to go through it several times.
--Thilo
Re: TextMarker
Posted by Jörn Kottmann <ko...@gmail.com>.
On 7/14/11 4:20 PM, Peter Klügl wrote:
> Hi all,
>
> I got right now the OK of our legal department, after exactly 7
> months... don't ask ;-)
>
> We would of course still like to contribute the system to Apache UIMA,
> if the development team is interested.
>
> It's maybe prudent to incrementally contribute the different parts of
> the TextMarker system, e.g., starting with the rule engine after I
> reimplemented it in August. Afterwards, if there is interest in the
> Eclipse IDE, then I'd switch to DLTK 2.0 or 3.0 before contributing it.
>
> I'm looking forward to your thoughts and opinions.
Why not contribute everything at once?
We have a sandbox where it can stay until it (or pieces of it) are in a
release-able state.
I believe it is actually a disadvantage when you decide to develop the
Eclipse plugin outside of Apache for a while, just to bring it into a
state which you feel should be contributed.
Because this way the knowledge about it also stays outside.
Jörn
Re: TextMarker
Posted by Peter Klügl <pk...@uni-wuerzburg.de>.
Hi all,
I just got the OK from our legal department, after exactly 7
months... don't ask ;-)
We would of course still like to contribute the system to Apache UIMA,
if the development team is interested.
It may be prudent to contribute the different parts of the TextMarker
system incrementally, e.g., starting with the rule engine after I have
reimplemented it in August. Afterwards, if there is interest in the
Eclipse IDE, I'd switch to DLTK 2.0 or 3.0 before contributing it.
I'm looking forward to your thoughts and opinions.
Best regards,
Peter
Am 03.01.2011 16:00, schrieb Peter Klügl:
> Hi Thilo,
>
> Am 01.01.2011 13:41, schrieb Thilo Goetz:
>> Hi Peter,
>>
>> I downloaded the source trunk and got things mostly to compile
>> and run: I'm running Eclipse 3.5.2, RCP edition, and installed
>> the latest UIMA plugins and DLTK 1.0.2. I also had to find the
>> Mozilla xpcom plugin. The only thing not compiling for me are
>> references to com.sun.org.apache.apache.xpath.XPathAPI. The
>> internet tells me that those could be fixed by using Xalan
>> directly, but I haven't tried.
>>
>
> The XPCom plugin is only necessary for the HTML visualization of the
> CEV plugin. The XULRunner plugin provides the implementations of the
> interfaces for the manipulation of the DOM within Eclipse. Both
> plugins often cause problems, but I haven't found a better solution yet.
>
> About the XML problem: Which plugin has that reference? I've had a
> similar problem about three year ago, but that should be solved.
> However, I'm not an expert of the different XML integrations in Java.
> The only place in my code, if I'm not mistaken, where XML is actively
> used, is the engine project that is able to load dictionaries in
> trie-like structures. But that should work just fine without
> additional libraries. Can you give me more information about that
> problem?
>
>> My main issue right now is that the TextMarker wiki is down,
>> and that seems to be the only source of documentation (unless
>> I missed something).
>
> I'm sorry about that. My colleagues moved the wiki to a new server
> that is not as stable as expected. We will fix that ASAP. The wiki is
> still the only bit of documentation that currently exists.
>
>>
>> I noticed that TextMarker uses a lot of 3rd party libraries.
>> So we'll need to compile an exhaustive list of the the libs
>> that are being used, their licenses and provenance, and in
>> case the license is bad, possible alternatives.
>>
>
> I'm willing to reduce the usage or exchange any 3rd party library if
> possible.
>
> The most important dependencies are the UIMA-runtime plugin, the
> Eclipse-plugins (core, ui...), the plugins of the DLTK-Core framework
> and ANTLR (used for the AST in the IDE and for interpreting the rules
> in the analysis engines). The optional HTML extension of the CEV
> plugin uses an html-parser additional to the XPCom dependency.
>
> There are only historical reasons why some plugins were hosted on
> SourceForge and they are not part of the TextMarker system. I have
> removed them now:
> de.uniwue.tm.cas.converter
> de.uniwue.tm.old.OfficeConverter
> de.uniwue.tm.textmarker.uutuc
>
>
> Peter
>
>
>> --Thilo
>>
--
---------------------------------------------------------------------
Dipl.-Inf. Peter Klügl
Universität Würzburg Tel.: +49-(0)931-31-86741
Am Hubland Fax.: +49-(0)931-31-86732
97074 Würzburg mail: pkluegl@informatik.uni-wuerzburg.de
http://www.is.informatik.uni-wuerzburg.de/en/staff/kluegl_peter/
---------------------------------------------------------------------
Re: TextMarker
Posted by Peter Klügl <pk...@uni-wuerzburg.de>.
The wiki is running again...
Peter
On 03.01.2011 16:00, Peter Klügl wrote:
> Hi Thilo,
>
> On 01.01.2011 13:41, Thilo Goetz wrote:
>> Hi Peter,
>>
>> I downloaded the source trunk and got things mostly to compile
>> and run: I'm running Eclipse 3.5.2, RCP edition, and installed
>> the latest UIMA plugins and DLTK 1.0.2. I also had to find the
>> Mozilla xpcom plugin. The only things not compiling for me are
>> references to com.sun.org.apache.apache.xpath.XPathAPI. The
>> internet tells me that those could be fixed by using Xalan
>> directly, but I haven't tried.
>>
>
> The XPCom plugin is only necessary for the HTML visualization of the
> CEV plugin. The XULRunner plugin provides the implementations of the
> interfaces for the manipulation of the DOM within Eclipse. Both
> plugins often cause problems, but I haven't found a better solution yet.
>
> About the XML problem: Which plugin has that reference? I had a
> similar problem about three years ago, but that should be solved.
> However, I'm not an expert on the different XML integrations in Java.
> The only place in my code where XML is actively used, if I'm not
> mistaken, is the engine project, which can load dictionaries into
> trie-like structures. But that should work just fine without
> additional libraries. Can you give me more information about that
> problem?
>
>> My main issue right now is that the TextMarker wiki is down,
>> and that seems to be the only source of documentation (unless
>> I missed something).
>
> I'm sorry about that. My colleagues moved the wiki to a new server
> that is not as stable as expected. We will fix that ASAP. The wiki is
> still the only bit of documentation that currently exists.
>
>>
>> I noticed that TextMarker uses a lot of 3rd party libraries.
>> So we'll need to compile an exhaustive list of the libs
>> that are being used, their licenses and provenance, and in
>> case the license is bad, possible alternatives.
>>
>
> I'm willing to reduce the use of 3rd party libraries or to replace any
> of them where possible.
>
> The most important dependencies are the UIMA-runtime plugin, the
> Eclipse-plugins (core, ui...), the plugins of the DLTK-Core framework
> and ANTLR (used for the AST in the IDE and for interpreting the rules
> in the analysis engines). The optional HTML extension of the CEV
> plugin uses an HTML parser in addition to the XPCom dependency.
>
> Some plugins were hosted on SourceForge only for historical reasons;
> they are not part of the TextMarker system. I have removed them now:
> de.uniwue.tm.cas.converter
> de.uniwue.tm.old.OfficeConverter
> de.uniwue.tm.textmarker.uutuc
>
>
> Peter
>
>
>> --Thilo
>>
Re: TextMarker
Posted by Peter Klügl <pk...@uni-wuerzburg.de>.
Hi Thilo,
On 01.01.2011 13:41, Thilo Goetz wrote:
> Hi Peter,
>
> I downloaded the source trunk and got things mostly to compile
> and run: I'm running Eclipse 3.5.2, RCP edition, and installed
> the latest UIMA plugins and DLTK 1.0.2. I also had to find the
> Mozilla xpcom plugin. The only things not compiling for me are
> references to com.sun.org.apache.apache.xpath.XPathAPI. The
> internet tells me that those could be fixed by using Xalan
> directly, but I haven't tried.
>
The XPCom plugin is only necessary for the HTML visualization of the CEV
plugin. The XULRunner plugin provides the implementations of the
interfaces for the manipulation of the DOM within Eclipse. Both plugins
often cause problems, but I haven't found a better solution yet.
About the XML problem: Which plugin has that reference? I had a
similar problem about three years ago, but that should be solved.
However, I'm not an expert on the different XML integrations in Java.
The only place in my code where XML is actively used, if I'm not
mistaken, is the engine project, which can load dictionaries into
trie-like structures. But that should work just fine without additional
libraries. Can you give me more information about that problem?
> My main issue right now is that the TextMarker wiki is down,
> and that seems to be the only source of documentation (unless
> I missed something).
I'm sorry about that. My colleagues moved the wiki to a new server that
is not as stable as expected. We will fix that ASAP. The wiki is still
the only bit of documentation that currently exists.
>
> I noticed that TextMarker uses a lot of 3rd party libraries.
> So we'll need to compile an exhaustive list of the libs
> that are being used, their licenses and provenance, and in
> case the license is bad, possible alternatives.
>
I'm willing to reduce the use of 3rd party libraries or to replace any
of them where possible.
The most important dependencies are the UIMA-runtime plugin, the
Eclipse-plugins (core, ui...), the plugins of the DLTK-Core framework
and ANTLR (used for the AST in the IDE and for interpreting the rules in
the analysis engines). The optional HTML extension of the CEV plugin
uses an HTML parser in addition to the XPCom dependency.
Some plugins were hosted on SourceForge only for historical reasons;
they are not part of the TextMarker system. I have removed them now:
de.uniwue.tm.cas.converter
de.uniwue.tm.old.OfficeConverter
de.uniwue.tm.textmarker.uutuc
Peter
> --Thilo
>
Re: TextMarker
Posted by Thilo Goetz <tw...@gmx.de>.
Hi Peter,
I downloaded the source trunk and got things mostly to compile
and run: I'm running Eclipse 3.5.2, RCP edition, and installed
the latest UIMA plugins and DLTK 1.0.2. I also had to find the
Mozilla xpcom plugin. The only things not compiling for me are
references to com.sun.org.apache.apache.xpath.XPathAPI. The
internet tells me that those could be fixed by using Xalan
directly, but I haven't tried.
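For illustration only: besides using Xalan directly, another possible
way around a JDK-internal XPathAPI class is the standard javax.xml.xpath
API that ships with the JDK. The following is a minimal sketch under
that assumption and is not taken from the TextMarker sources; the class
name XPathSketch, the helper selectNodes and the sample XML are
hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of TextMarker: evaluates XPath expressions
// with the standard JDK API instead of an internal or Xalan-specific class.
public class XPathSketch {

    // Returns all nodes in the given XML string that match the expression.
    public static NodeList selectNodes(String xml, String expression)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        InputSource source = new InputSource(new StringReader(xml));
        return (NodeList) xpath.evaluate(expression, source, XPathConstants.NODESET);
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document, used only to exercise the helper.
        String xml = "<dictionary><entry>alpha</entry><entry>beta</entry></dictionary>";
        NodeList entries = selectNodes(xml, "/dictionary/entry");
        for (int i = 0; i < entries.getLength(); i++) {
            System.out.println(entries.item(i).getTextContent());
        }
    }
}
```

Whether this fits depends on which plugin holds the failing references
and on how the XPathAPI calls are used there, which the thread does not
say.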
My main issue right now is that the TextMarker wiki is down,
and that seems to be the only source of documentation (unless
I missed something).
I noticed that TextMarker uses a lot of 3rd party libraries.
So we'll need to compile an exhaustive list of the libs
that are being used, their licenses and provenance, and in
case the license is bad, possible alternatives.
--Thilo