Posted to user@uima.apache.org by "Mahathi Aguvaveedi Aguvaveedi (US - Advisory)" <ma...@pwc.com> on 2016/02/16 10:56:04 UTC

matching words between breaks after a pincode

Hello,

I have the following sample text:

zip 20193
New York
USA

What I would like to do is match only "New York", i.e., the line after the
zipcode.

I tried the following code, but it does not work:

DECLARE heading;
pin BREAK #{-> MARK(heading)} BREAK;

(I have declared pin before this).

Please let me know how to go about this.

Thanks!

-- 
*Mahathi Aguvaveedi*


Re: matching words between breaks after a pincode

Posted by Peter Klügl <pe...@averbis.com>.
Hi,

The problem is probably the filtering setting. BREAK is not visible by
default, so a rule element that requires a BREAK can never match: Ruta
automatically skips the line breaks.

Try adding a rule that changes the filtering setting in front of your
rule:
RETAINTYPE(BREAK);
pin BREAK #{-> MARK(heading)} BREAK;

There could be another problem, because BREAK represents both \n and \r.
With Windows line endings (\r\n), a single BREAK after pin is not enough,
so the rule would not work. You would need something like:
pin BREAK[1,2] #{-> MARK(heading)} BREAK;
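
Put together, the two adjustments above might look like the following
sketch. The rule that creates pin is only a hypothetical stand-in, since
the original post does not show how pin is defined. The trailing
RETAINTYPE without arguments is an addition that resets the filter so
later rules skip line breaks again.

```
DECLARE pin, heading;

// Hypothetical stand-in for the existing pin rule (assumption):
// annotate "zip" plus the following number as pin.
"zip" NUM{-> MARK(pin, 1, 2)};

// Make line breaks visible so BREAK can be matched.
RETAINTYPE(BREAK);

// One or two BREAKs after pin (covers \n and \r\n), then everything
// up to the next BREAK is marked as heading.
pin BREAK[1,2] #{-> MARK(heading)} BREAK;

// Reset the filter to the default (BREAK is skipped again).
RETAINTYPE;
```

On the sample text, the intent is that heading covers "New York" only.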

Ruta also ships a utility analysis engine for annotating lines, the
PlainTextAnnotator. If you include it, you can write something like:
pin Line{-> heading};

(You may need to trim the Lines, e.g., with the TRIM action, if the
lines start or end with whitespace.)
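
The Line-based variant could be set up roughly as follows. This is a
sketch from memory of the Ruta guide, not a verified script: the
ENGINE/TYPESYSTEM names, the EXEC call and the RETAINTYPE around it
should be checked against the documentation of your Ruta version. pin
is assumed to be created by an earlier rule that is not shown here.

```
// Assumed import names for the bundled utils engine and its types.
ENGINE utils.PlainTextAnnotator;
TYPESYSTEM utils.PlainTextTypeSystem;

DECLARE heading;

// Run the engine so that Line annotations exist. Whitespace and line
// breaks are made visible for the call (assumption) and the filter is
// reset afterwards.
Document{-> RETAINTYPE(SPACE, BREAK)};
Document{-> EXEC(PlainTextAnnotator, {Line})};
Document{-> RETAINTYPE};

// Optional: drop leading and trailing spaces from each Line.
Line{-> TRIM(SPACE)};

// The Line that directly follows pin becomes the heading.
pin Line{-> heading};
```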

Best,

Peter

PS:
I recommend using the Explain view of the Ruta Workbench. There you can
see why a rule did not match. The Workbench can considerably speed up
the development of Ruta-based annotators.