Posted to dev@opennlp.apache.org by Umang <um...@newgen.co.in> on 2013/08/01 12:47:09 UTC

FW:

Hi Team,

I want to use OpenNLP to detect the invoice number in invoices. For example,
please find attached an invoice from which I need to extract the invoice number. How can I do this?

 

Regards,

Umang Sand
Newgen Software Technologies Ltd.
www.newgensoft.com <http://www.newgensoft.com/>
Phone: +91-120-6761000 Ext.: 931
Mobile No.: +91-9711135529

 


Disclaimer :- This e-mail and any attachment may contain confidential, proprietary or legally privileged information. If you are not the original intended recipient and have erroneously received this message, you are prohibited from using, copying, altering or disclosing the content of this message. Please delete it immediately and notify the sender. Newgen Software Technologies Ltd (NSTL)  accepts no responsibilities for loss or damage arising from the use of the information transmitted by this email including damages from virus and further acknowledges that no binding nature of the message shall be implied or assumed unless the sender does so expressly with due authority of NSTL.

Re: FW:

Posted by Lance Norskog <go...@gmail.com>.
Check out OpenRefine. It's a tool for designing data cleanup workflows.
http://openrefine.org/documentation



Re: FW:

Posted by G G <gi...@gmail.com>.
I agree that if your invoice numbers follow a consistent pattern or patterns,
regex is likely your best bet.
Furthermore, if the invoice "file" has a format something like:

text text text
inv #: 1234345TUY
more text more text

then you can be specific and pull the number from a particular part of the
text with something like this (in multiline mode, so ^ matches at each line start):
(^inv ?#: ?)(.*?)(\n|\r) and grab group 2 from the regex.
Otherwise, if your inv # is just floating in free text, then you will have
something like this:
(?:.*?)([0-9]{7}[A-Z]{3})(?:.*?) and grab all the matches from group 1
(this regex is based on my fake inv# above).
Don't take these examples literally; I'm just helping with some ideas.
Typically, for stuff like this, I construct a prioritized list of regexes,
run them in order, and the first match wins.
Mark G
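
As a rough illustration of the "prioritized list, first match wins" idea above, here is a minimal Python sketch. The pattern list, the function name find_invoice_number, and the sample text are assumptions made up for this example (based on the fake inv# in the message); they are not part of OpenNLP, and real invoices would need their own patterns.

```python
import re

# Hypothetical patterns, tried in priority order; the first one that matches wins.
PATTERNS = [
    # 1. A labelled line such as "inv #: 1234345TUY"; the value is group 1.
    re.compile(r"^inv[ \t]*#[ \t]*:?[ \t]*(\S+)", re.IGNORECASE | re.MULTILINE),
    # 2. Fallback: seven digits followed by three capital letters anywhere in the text.
    re.compile(r"\b([0-9]{7}[A-Z]{3})\b"),
]


def find_invoice_number(text):
    """Return the first invoice number found by the prioritized patterns, or None."""
    for pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(1)
    return None


sample = "text text text\ninv #: 1234345TUY\nmore text more text\n"
print(find_invoice_number(sample))
```

Putting the most specific pattern first means a labelled line is preferred over any look-alike number elsewhere in the text; the looser fallback only runs when no label is found.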



Re: FW:

Posted by Jim <ji...@gmail.com>.
If your invoices are all from the same source (and therefore in the same
format), then maybe OpenNLP is a bit of overkill. A simple regex
should do the job :)

Jim

