Posted to user@uima.apache.org by "B. Li" <cs...@126.com> on 2019/07/05 01:51:28 UTC

a Problem about WORDTABLE

Hi All,


I am trying to use a WordTable to configure and give several different attribute values (with different columns) to some SINGLE (Chinese) characters, but I always fail to get the correct values from columns in the WordTable file, although the engine can correctly recognize and mark the SINGLE characters. I am using RUTA 2.4.0. How can I solve this problem? Any hint would be greatly appreciated!


Thanks a lot,


Baoli LI

Re: how to match patterns back from the end of an input string

Posted by "B. Li" <cs...@126.com>.
Thanks Peter.


That works for most cases, but if we need to call sub-scripts recursively (using something like {->CALL(ASPECIALSCRIPT)}), it seems difficult to detect the end of the document inside the script ASPECIALSCRIPT.


Kind regards,


Baoli
On 7/26/2019 18:27, Peter Klügl <pe...@averbis.com> wrote:
Hi,


there are no special language elements for this. However, there are many
other ways to do this (efficiently).

You could, for example, create an annotation on the last part of the
document with MARKLAST and then use that annotation as a starting anchor
"@" in an additional rule.


Best,


Peter


Am 26.07.2019 um 11:54 schrieb B. Li:
Hi All,


I would like to match patterns backwards from the end of an input string, which may not end at SENTENCEEND. I am wondering whether RUTA has special tokens like the "^" and "$" of normal regular expressions.


Thanks in advance,


Baoli

--
Dr. Peter Klügl
R&D Text Mining/Machine Learning

Averbis GmbH
Salzstr. 15
79098 Freiburg
Germany

Fon: +49 761 708 394 0
Fax: +49 761 708 394 10
Email: peter.kluegl@averbis.com
Web: https://averbis.com

Headquarters: Freiburg im Breisgau
Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó

Re: how to match patterns back from the end of an input string

Posted by Peter Klügl <pe...@averbis.com>.
Hi,


there are no special language elements for this. However, there are many
other ways to do this (efficiently).

You could, for example, create an annotation on the last part of the
document with MARKLAST and then use that annotation as a starting anchor
"@" in an additional rule.
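
A minimal sketch of this suggestion, added for illustration and not taken from the original message. The type names DocEnd and TrailingNumber and the pattern itself (a number directly before the last token) are assumptions; it also assumes the last token of the document is the visible token of interest.

```ruta
// Hypothetical type names, not from the thread.
DECLARE DocEnd, TrailingNumber;

// MARKLAST annotates the last token covered by the matched annotation,
// which here is the whole document.
Document{-> MARKLAST(DocEnd)};

// "@" makes DocEnd the starting anchor: matching begins at the end of the
// document, and the element to its left (NUM) is checked afterwards.
// MARK(Type, 1, 2) creates the annotation over rule elements 1 to 2.
NUM @DocEnd{-> MARK(TrailingNumber, 1, 2)};
```

Without the "@" the rule should match the same text, but it would start from every NUM in the document instead of from the single DocEnd annotation.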


Best,


Peter




how to match patterns back from the end of an input string

Posted by "B. Li" <cs...@126.com>.
Hi All,


I would like to match patterns backwards from the end of an input string, which may not end at SENTENCEEND. I am wondering whether RUTA has special tokens like the "^" and "$" of normal regular expressions.


Thanks in advance,


Baoli

Re: a Problem about WORDTABLE

Posted by Peter Klügl <pe...@averbis.com>.
Thanks, I will take a look at it today.


Best,


Peter


Am 05.07.2019 um 09:47 schrieb B. Li:
> Hi Peter,
>
>
> I sent it to your email box.
>
>
> Thanks a lot,
>
>
> Baoli
>
>
> On 7/5/2019 15:41, Peter Klügl <pe...@averbis.com> wrote:
> Hi,
>
>
> I think the attachment got lost. Can you either send it again to my
> email address or open a Jira ticket and attach the file there?
>
>
> Best,
>
>
> Peter
>
>
> Am 05.07.2019 um 09:37 schrieb B. Li:
> Thanks a lot Peter.
>
> Attached please find a CSV table encoded in UTF-8. Each row in the
> file contains a single Chinese digit character and its Latin
> / mathematical value. I failed to get the value in the second column
> with the following RUTA script:
>
> WORDTABLE CnDigitTable = 'gZdd.csv';
> DECLARE Annotation CnD(STRING DVal);
> Document{-> MARKTABLE(CnD, 1, CnDigitTable, "DVal" = 2)};
>
> The type CnD with a feature DVal has been defined in the type
> descriptor XML file.
>
> I have upgraded the engine to the newest version, 2.7.0, but the
> problem is not solved. Any suggestions? Thanks.
>
> Kind regards,
>
> Baoli
>
> On 7/5/2019 14:11, Peter Klügl <pe...@averbis.com> wrote:
>
> Hi,
>
>
> most problems with the WordTable are caused by whitespace in the
> dictionary. Can you test if this is your issue by removing all
> whitespace in the relevant column?
>
> If this is the source of the problem, there is a configuration parameter
> for automatically avoiding it, but I have to check in which version it
> was introduced. However, upgrading the Ruta version is recommended in
> any case.
>
>
> If this is not the source of your problem, do you have a minimal example
> for reproducing it?
>
>
> Best,
>
>
> Peter
>
>
>
> Am 05.07.2019 um 03:51 schrieb B. Li:
>
> Hi All,
>
>
> I am trying to use a WordTable to configure and give several
> different attribute values (with different columns) to some
> SINGLE (Chinese) characters, but I always fail to get the
> correct values from columns in the WordTable file, although
> the engine can correctly recognize and mark the SINGLE
> characters. I am using RUTA 2.4.0. How can I solve this
> problem? Any hint would be greatly appreciated!
>
>
> Thanks a lot,
>
>
> Baoli LI
>
>

Re: a Problem about WORDTABLE

Posted by "B. Li" <cs...@126.com>.
Hi Peter,


I sent it to your email box.


Thanks a lot,


Baoli



Re: a Problem about WORDTABLE

Posted by Peter Klügl <pe...@averbis.com>.
Hi,


I think the attachment got lost. Can you either send it again to my
email address or open a Jira ticket and attach the file there?


Best,


Peter



Re: a Problem about WORDTABLE

Posted by "B. Li" <cs...@126.com>.
Thanks a lot Peter.


Attached please find a CSV table encoded in UTF-8. Each row in the file contains a single Chinese digit character and its Latin / mathematical value. I failed to get the value in the second column with the following RUTA script:


WORDTABLE CnDigitTable = 'gZdd.csv';
DECLARE Annotation CnD(STRING DVal);
Document{-> MARKTABLE(CnD, 1, CnDigitTable, "DVal" = 2)};


The type CnD with a feature DVal has been defined in the type descriptor XML file.


I have upgraded the engine to the newest version, 2.7.0, but the problem is not solved. Any suggestions? Thanks.


Kind regards,


Baoli


Re: a Problem about WORDTABLE

Posted by Peter Klügl <pe...@averbis.com>.
Hi,


most problems with the WordTable are caused by whitespace in the
dictionary. Can you test if this is your issue by removing all
whitespace in the relevant column?

If this is the source of the problem, there is a configuration parameter
for automatically avoiding it, but I have to check in which version it
was introduced. However, upgrading the Ruta version is recommended in
any case.


If this is not the source of your problem, do you have a minimal example
for reproducing it?


Best,


Peter


