Posted to dev@stanbol.apache.org by Jayani Withanawasam <ja...@gmail.com> on 2013/11/25 12:00:28 UTC

STANBOL-1209: Temporal expression extraction engine for Stanbol

Hi,

I'm researching how to add a new enhancement engine for extracting date and
time (temporal extraction) to Stanbol, as suggested by Rupert.

I found that OpenNLP has an entity extraction unit for date and time.
I also noticed that OpenNLP is already integrated into Stanbol in the NER
engine.

So, as per my understanding, there are two options to extract date and time.

One is to have a separate enhancement engine for date and time information
extraction. The other is to add date/time extraction as a code
enhancement to the existing OpenNLP NER engine.

What is your opinion on this? Is there any other approach that you think
would be better?

Thank you
Jayani

Re: STANBOL-1209: Temporal expression extraction engine for Stanbol

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Jayani,

I think [1] has a good list of regex patterns to start from.

Note that day/month names are language-specific. So if we want to
support those, we would need to create a dictionary and select the
right entries based on the language detected for the text.
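
A minimal sketch of the dictionary idea above, written in Python for brevity
(an actual Stanbol engine would be Java). The MONTHS table, the function
names and the "day month year" layout are assumptions made for illustration,
not an existing Stanbol API. The language code is assumed to come from a
separate language detection step.

```python
import re

# Hypothetical per-language month dictionaries. A real engine would load
# these from configuration and also cover abbreviations and day names.
MONTHS = {
    "en": ["January", "February", "March", "April", "May", "June", "July",
           "August", "September", "October", "November", "December"],
    "de": ["Januar", "Februar", "März", "April", "Mai", "Juni", "Juli",
           "August", "September", "Oktober", "November", "Dezember"],
}


def month_number(name, lang):
    """Return 1-12 for a month name in the given language, else None."""
    names = [m.lower() for m in MONTHS.get(lang, [])]
    key = name.lower()
    return names.index(key) + 1 if key in names else None


def date_pattern(lang):
    """Build a 'day month year' pattern from the dictionary for `lang`."""
    # Longest names first so the alternation never stops at a shorter prefix.
    months = sorted(MONTHS[lang], key=len, reverse=True)
    alternation = "|".join(re.escape(m) for m in months)
    return re.compile(
        r"\b(\d{1,2})\.?\s+(" + alternation + r")\s+(\d{4})\b",
        re.IGNORECASE,
    )


def find_dates(text, lang):
    """Return (start, end, year, month, day) for each match in `text`."""
    results = []
    for m in date_pattern(lang).finditer(text):
        month = month_number(m.group(2), lang)
        results.append(
            (m.start(), m.end(), int(m.group(3)), month, int(m.group(1)))
        )
    return results
```

The character offsets are kept because an annotation on the text would need
to record where the matched expression starts and ends.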

best
Rupert

[1] http://regexlib.com/DisplayPatterns.aspx?cattabindex=4&categoryId=5&AspxAutoDetectCookieSupport=1

On Mon, Jan 27, 2014 at 1:50 PM, Jayani Withanawasam
<ja...@gmail.com> wrote:
> Thank you Antonio
>
> Hi all,
>
> I have done a bit of research on this task and I need your opinion on
> "recognizing" temporal expressions from plain text.
> As per my understanding, 3 options are available to perform this task.
>
>
>    1. Statistical approach (e.g., OpenNLP)
>    2. Rule-based approach (linguistic grammar-based APIs such as SUTime,
>    HeidelTime)
>    3. Simple regular expression engine (simple temporal patterns)
>
>
> We have already decided not to proceed with option 1. We will not go
> for option 2 either, due to the license issue.
>
> So, with regard to option 3, there are a few possible ways to identify
> whether a given expression is a temporal expression.
>
> Year - a 4-digit number within a specified range (e.g., 1100 - 2500)
> Month - Jan, January, ... (1-12)
> Date - 1-31
> Day - Monday, Tuesday, ...
> Time - a.m., p.m.
>
> Also, to some extent we can infer temporal expressions from time-related
> prepositions such as "on", "in", "at", "since", etc.
>
> Do you think the above approach will give us sufficient results for the
> baseline implementation? Or do we need a more advanced approach, for example
> our own rule engine/grammar for date/time extraction?
>
> On Mon, Jan 27, 2014 at 1:31 PM, Antonio David Perez Morales <
> aperez@zaizi.com> wrote:
>
>> Hi Jayani
>>
>> Perfect. If you want, I can help you with the implementation of this
>> engine, or with questions about the classes used in the Enhancement
>> Engine or about OSGi.
>>
>> Feel free to ask.
>>
>> Regards
>>
>>
>> On Mon, Jan 27, 2014 at 8:13 AM, Jayani Withanawasam <
>> jayaniwithanawasam@gmail.com> wrote:
>>
>> > Thank you Antonio and Rupert for your clarifications.
>> >
>> > So, we need to build a date/time extraction engine from scratch
>> > (without using any of the mentioned third-party libraries) as the
>> > baseline implementation.
>> >
>> > We will implement other possible approaches as advanced features later.
>> > Correct me if I'm wrong. I'm working on this and will keep you posted
>> > on the progress.
>> >
>> >
>> >
>> > On Mon, Jan 20, 2014 at 10:34 AM, Rupert Westenthaler <
>> > rupert.westenthaler@gmail.com> wrote:
>> >
>> > > Hi Jayani, Antonio,
>> > >
>> > > With "base-line" I mean that it is IMHO important to have this
>> > > functionality also present in the default distribution of Stanbol.
>> > > With a Regex-based solution this is possible. With implementations
>> > > based on GPL-licensed projects it is not.
>> > >
>> > > Having a "base-line" implementation would allow users to start with
>> > > the Regex-based DateExtractionEngine. If it does not fit their
>> > > requirements, they can look for alternatives and find advanced
>> > > options that require them to manually download and install
>> > > additional GPL-licensed software.
>> > >
>> > > best
>> > > Rupert
>> > >
>> > >
>> > > On Fri, Jan 17, 2014 at 9:46 AM, Antonio David Perez Morales
>> > > <ap...@zaizi.com> wrote:
>> > > > Hi Jayani
>> > > >
>> > > > What Rupert means is that it would be good to have a "RegEx"
>> > > > Enhancement Engine which extracts/creates TextAnnotations based on
>> > > > regular expressions configured in the engine.
>> > > > This way you can configure one engine of this type and provide a
>> > > > regular expression for extracting dates and times.
>> > > >
>> > > > After that, we can take a look at integrating the projects pointed
>> > > > out by Rupert into Stanbol.
>> > > >
>> > > > Regards
>> > > >
>> > > >
>> > > > On Fri, Jan 17, 2014 at 9:39 AM, Jayani Withanawasam <
>> > > > jayaniwithanawasam@gmail.com> wrote:
>> > > >
>> > > >> Thank you Rupert and Anuj for your suggestions. I'm going through
>> > > >> the links you have provided.
>> > > >>
>> > > >> Rupert,
>> > > >>
>> > > >> What did you mean by a base-line engine that is directly integrated
>> > > >> in Stanbol with a Regex-based approach?
>> > > >>
>> > > >> I would appreciate it if you could elaborate on this.
>> > > >>
>> > > >>
>> > > >> On Fri, Nov 29, 2013 at 11:35 AM, Rupert Westenthaler <
>> > > >> rupert.westenthaler@gmail.com> wrote:
>> > > >>
>> > > >> > Hi Anuj
>> > > >> >
>> > > >> > On Thu, Nov 28, 2013 at 1:51 PM, Anuj Kumar <an...@gmail.com> wrote:
>> > > >> > > I second that. Regex will work better w.r.t. the default trained
>> > > >> > > model of OpenNLP.
>> > > >> >
>> > > >> > Both projects do look interesting:
>> > > >> >
>> > > >> > > Also, take a look at this extractor-
>> > > >> > > https://code.google.com/p/heideltime/ and
>> > > >> >
>> > > >> > As this is GPLv3 you cannot directly use it to implement an
>> > > >> > EnhancementEngine that is part of the Stanbol codebase. Integrating
>> > > >> > it via a RESTful service would be an option.
>> > > >> >
>> > > >> > > Stanford's tagger- http://nlp.stanford.edu/downloads/sutime.shtml#!
>> > > >> >
>> > > >> > The same is true for SUTime as all Stanford NLP components are under
>> > > >> > GPL.
>> > > >> >
>> > > >> > If we want to integrate those projects I suggest extending the Stanbol
>> > > >> > RESTful NLP protocol [1] and service [2] so that they can represent
>> > > >> > date/time points and ranges. SUTime support could be added to the
>> > > >> > already existing Stanbol-Stanford integration [3]. For HeidelTime one
>> > > >> > would need to implement a similar component.
>> > > >> >
>> > > >> >
>> > > >> > But before integrating those I would prefer to have a base-line engine
>> > > >> > that is directly integrated in Stanbol. It looks like a Regex-based
>> > > >> > approach could be sufficient for that. WDYT Jayani?
>> > > >> >
>> > > >> > best
>> > > >> > Rupert
>> > > >> >
>> > > >> > [1] https://issues.apache.org/jira/browse/STANBOL-878
>> > > >> > [2] https://issues.apache.org/jira/browse/STANBOL-892
>> > > >> > [3] https://github.com/westei/stanbol-stanfordnlp
>> > > >> >
>> > > >> > >
>> > > >> > > It will be useful to have a similar temporal expression
>> > > >> > > enhancement engine in Stanbol.
>> > > >> > >
>> > > >> > > Regards,
>> > > >> > > Anuj
>> > > >> > >
>> > > >> > >
>> > > >> > > On Thu, Nov 28, 2013 at 11:05 AM, Rupert Westenthaler <
>> > > >> > > rupert.westenthaler@gmail.com> wrote:
>> > > >> > >
>> > > >> > >> Hi Jayani,
>> > > >> > >>
>> > > >> > >> I was not even aware that there exists a Time model for OpenNLP.
>> > > >> > >> The documentation shows that this uses a purely statistical model,
>> > > >> > >> so I am wondering about the quality. Note also that OpenNLP only
>> > > >> > >> provides a prebuilt model for English [1].
>> > > >> > >>
>> > > >> > >> AFAIK OpenNLP will only provide you with the information that some
>> > > >> > >> tokens represent a date. It will not provide you the parsed
>> > > >> > >> xsd:dateTime. So if you use this engine you will still need to
>> > > >> > >> implement this part on your own. So most likely you will end up
>> > > >> > >> using regex patterns to parse the actual time from the tokens
>> > > >> > >> marked by OpenNLP as time.
>> > > >> > >>
>> > > >> > >> So I am wondering if it is not better to start with Regex from the
>> > > >> > >> beginning. If you search for "Regex Date Time extraction" you can
>> > > >> > >> find a huge set of examples you could start from.
>> > > >> > >>
>> > > >> > >> best
>> > > >> > >> Rupert
>> > > >> > >>
>> > > >> > >>
>> > > >> > >> [1] http://opennlp.sourceforge.net/models-1.5/
>> > > >> > >>
>> > > >> > >>
>> > > >> > >>
>> > > >> > >> On Thu, Nov 28, 2013 at 5:15 AM, Jayani Withanawasam
>> > > >> > >> <ja...@gmail.com> wrote:
>> > > >> > >> > Hi Dileepa,
>> > > >> > >> >
>> > > >> > >> > Thank you so much for your valuable feedback. I'm working on this.
>> > > >> > >> >
>> > > >> > >> >
>> > > >> > >> > On Mon, Nov 25, 2013 at 9:00 PM, Dileepa Jayakody <
>> > > >> > >> > dileepajayakody@gmail.com> wrote:
>> > > >> > >> >
>> > > >> > >> >> Hi Jayani,
>> > > >> > >> >>
>> > > >> > >> >> There are several enhancement engines in Stanbol developed based on
>> > > >> > >> >> OpenNLP (opennlp-ner, opennlp-sentence, opennlp-pos... see [1]).
>> > > >> > >> >> Each of these engines focuses on a particular enhancement aspect
>> > > >> > >> >> using OpenNLP. Therefore I think it's better to write a new engine
>> > > >> > >> >> for temporal extraction rather than extending the OpenNLP-NER engine.
>> > > >> > >> >>
>> > > >> > >> >> Thanks,
>> > > >> > >> >> Dileepa
>> > > >> > >> >>
>> > > >> > >> >> [1] https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/opennlp
>> > > >> > >> >>
>> > > >> > >> >>
>> > > >> > >>
>> > > >> > >>
>> > > >> > >>
>> > > >> > >> --
>> > > >> > >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> > > >> > >> | Bodenlehenstraße 11                             ++43-699-11108907
>> > > >> > >> | A-5500 Bischofshofen
>> > > >> > >>
>> > > >> >
>> > > >> >
>> > > >> >
>> > > >> > --
>> > > >> > | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> > > >> > | Bodenlehenstraße 11                             ++43-699-11108907
>> > > >> > | A-5500 Bischofshofen
>> > > >> >
>> > > >>
>> > > >
>> > > > --
>> > > >
>> > > > ------------------------------
>> > > > This message should be regarded as confidential. If you have received
>> > > > this email in error please notify the sender and destroy it immediately.
>> > > > Statements of intent shall only become binding when confirmed in hard
>> > > > copy by an authorised signatory.
>> > > >
>> > > > Zaizi Ltd is registered in England and Wales with the registration
>> > > > number 6440931. The Registered Office is Brook House, 229 Shepherds Bush
>> > > > Road, London W6 7AN.
>> > >
>> > >
>> > >
>> > > --
>> > > | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> > > | Bodenlehenstraße 11                             ++43-699-11108907
>> > > | A-5500 Bischofshofen
>> > >
>> >
>>
>>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: STANBOL-1209: Temporal expression extraction engine for Stanbol

Posted by Jayani Withanawasam <ja...@gmail.com>.
Thank you Antonio

Hi all,

I have done a bit of research on this task and I need your opinion on
"recognizing" temporal expressions from plain text.
As per my understanding, 3 options are available to perform this task.


   1. Statistical approach (e.g., OpenNLP)
   2. Rule-based approach (linguistic grammar-based APIs such as SUTime,
   HeidelTime)
   3. Simple regular expression engine (simple temporal patterns)


We have already decided not to proceed with option 1. We will not go
for option 2 either, due to the license issue.

So, with regard to option 3, there are a few possible ways to identify
whether a given expression is a temporal expression.

Year - a 4-digit number within a specified range (e.g., 1100 - 2500)
Month - Jan, January, ... (1-12)
Date - 1-31
Day - Monday, Tuesday, ...
Time - a.m., p.m.

Also, to some extent we can infer temporal expressions from time-related
prepositions such as "on", "in", "at", "since", etc.

Do you think the above approach will give us sufficient results for the
baseline implementation? Or do we need a more advanced approach, for example
our own rule engine/grammar for date/time extraction?
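
The cues listed above can be sketched roughly as follows, in Python for
brevity (an actual Stanbol engine would be Java). This is only an
illustration under assumptions: the pattern fragments, the extract()
function and its min_year/max_year parameters are hypothetical names, only
English names are covered, and a bare 4-digit number is accepted as a year
only when a preposition ("in", "since") precedes it.

```python
import re

# Building blocks for the cues: month names, weekday names, day of month,
# 4-digit year, and a clock time followed by an a.m./p.m. marker.
MONTH = (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|"
         r"July?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|"
         r"Dec(?:ember)?)")
WEEKDAY = r"(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day"
DAY = r"(?:3[01]|[12]\d|0?[1-9])"
YEAR = r"(?P<year>\d{4})"
TIME = r"(?:1[0-2]|0?[1-9])(?::[0-5]\d)?\s?(?:a\.m\.|p\.m\.|am|pm)"

PATTERNS = [
    # "Monday, 25 November 2013" or "25 Nov 2013"
    ("date", re.compile(rf"\b(?:{WEEKDAY},?\s+)?{DAY}\s+{MONTH}\s+{YEAR}\b")),
    # "November 25, 2013"
    ("date", re.compile(rf"\b(?:{WEEKDAY},?\s+)?{MONTH}\s+{DAY},?\s+{YEAR}\b")),
    # "10:30 a.m." or "5 pm"; the lookahead replaces \b, which would fail
    # after the trailing dot of "a.m."
    ("time", re.compile(rf"\b{TIME}(?!\w)", re.IGNORECASE)),
    # A bare year is only trusted after a time-related preposition.
    ("year", re.compile(rf"\b(?:in|since)\s+{YEAR}\b", re.IGNORECASE)),
]


def extract(text, min_year=1100, max_year=2500):
    """Return sorted (start, end, kind, matched_text) tuples."""
    found = []
    for kind, pattern in PATTERNS:
        for m in pattern.finditer(text):
            year = m.groupdict().get("year")
            # Reject 4-digit numbers outside the plausible year range.
            if year is not None and not min_year <= int(year) <= max_year:
                continue
            # For the preposition pattern, report only the year itself.
            start, end = m.span("year") if kind == "year" else m.span()
            found.append((start, end, kind, text[start:end]))
    found.sort()
    return found
```

For example, extract("The workshop was held on Monday, 25 November 2013 at
10:30 a.m. and has run since 2011.") yields one date, one time and one year
span. Normalizing the matched text to a value such as xsd:dateTime would be
a separate step.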

Re: STANBOL-1209: Temporal expression extraction engine for Stanbol

Posted by Antonio David Perez Morales <ap...@zaizi.com>.
Hi Jayani

Perfect. If you want, I can help you with the implementation of this engine,
or with questions about the classes used in the Enhancement Engine or about
OSGi.

Feel free to ask.

Regards


> > ++43-699-11108907
> > >> > >> | A-5500 Bischofshofen
> > >> > >>
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > | Rupert Westenthaler             rupert.westenthaler@gmail.com
> > >> > | Bodenlehenstraße 11                             ++43-699-11108907
> > >> > | A-5500 Bischofshofen
> > >> >
> > >>
> > >
> > > --
> > >
> > > ------------------------------
> > > This message should be regarded as confidential. If you have received
> > this
> > > email in error please notify the sender and destroy it immediately.
> > > Statements of intent shall only become binding when confirmed in hard
> > copy
> > > by an authorised signatory.
> > >
> > > Zaizi Ltd is registered in England and Wales with the registration
> number
> > > 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> > > London W6 7AN.
> >
> >
> >
> > --
> > | Rupert Westenthaler             rupert.westenthaler@gmail.com
> > | Bodenlehenstraße 11                             ++43-699-11108907
> > | A-5500 Bischofshofen
> >
>


Re: STANBOL-1209: Temporal expression extraction engine for Stanbol

Posted by Jayani Withanawasam <ja...@gmail.com>.
Thank you Antonio and Rupert for your clarifications.

So, we need to build a date/time extraction engine from scratch (without
using any of the third-party libraries mentioned) as the base-line
implementation.

We will implement the other possible approaches as advanced features later.
Correct me if I'm wrong. I'm working on this and will keep you posted on the
progress.



On Mon, Jan 20, 2014 at 10:34 AM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi Jayani, Antonio,
>
> With "base-line" I mean, that it is IMHO important to have a
> functionality also present in the default distribution of Stanbol.
> With a Regex based solution this is possible. With implementations
> based on GPL licensed projects it is not.
>
> Having a "base-line" implementation would allow to start users with
> the Regex based DateExtractionEngine and if this one does not fit the
> requirements they would look for alternatives and find advanced
> options that would require them do manually download and install
> additional GPL licensed software.
>
> best
> Rupert

Re: STANBOL-1209: Temporal expression extraction engine for Stanbol

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Jayani, Antonio,

By "base-line" I mean that it is IMHO important to have this
functionality present in the default distribution of Stanbol as well.
With a Regex based solution this is possible. With implementations
based on GPL licensed projects it is not.

Having a "base-line" implementation would let users start with
the Regex based DateExtractionEngine. If that one does not fit their
requirements, they can look for alternatives and find the advanced
options, which would require them to manually download and install
additional GPL licensed software.
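
To make the base-line idea a bit more concrete, here is a minimal sketch
of regex based date extraction that also normalises each match to the
xsd:date lexical form (yyyy-MM-dd). It uses only the JDK. The class name
BaselineDateExtractor, the two patterns and the day/month/year reading of
slash dates are assumptions for illustration, not existing Stanbol code.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Stanbol: finds numeric dates in plain
// text and normalises them to the xsd:date lexical form (yyyy-MM-dd).
class BaselineDateExtractor {

    // ISO style dates such as 2014-01-20.
    private static final Pattern ISO =
            Pattern.compile("\\b(\\d{4})-(\\d{2})-(\\d{2})\\b");

    // Slash dates such as 27/01/2014; day/month/year order is an assumption.
    private static final Pattern DMY =
            Pattern.compile("\\b(\\d{1,2})/(\\d{1,2})/(\\d{4})\\b");

    // Returns all valid dates: ISO matches first, then slash matches.
    static List<String> extract(String text) {
        List<String> dates = new ArrayList<>();
        collect(ISO.matcher(text), 1, 2, 3, dates);
        collect(DMY.matcher(text), 3, 2, 1, dates);
        return dates;
    }

    // Reads year, month and day from the given capture groups.
    private static void collect(Matcher m, int yearGroup, int monthGroup,
                                int dayGroup, List<String> out) {
        while (m.find()) {
            try {
                LocalDate date = LocalDate.of(
                        Integer.parseInt(m.group(yearGroup)),
                        Integer.parseInt(m.group(monthGroup)),
                        Integer.parseInt(m.group(dayGroup)));
                out.add(date.toString()); // ISO-8601, e.g. 2014-01-20
            } catch (DateTimeException e) {
                // The regex matched digits that do not form a real
                // calendar date (e.g. 31/02/2014), so the match is skipped.
            }
        }
    }
}
```

Calling BaselineDateExtractor.extract(text) returns the normalised dates
as strings. Day and month names are deliberately left out here, since they
would need per-language handling.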

best
Rupert


On Fri, Jan 17, 2014 at 9:46 AM, Antonio David Perez Morales
<ap...@zaizi.com> wrote:
> Hi Jayani
>
> What Rupert means is that it would be good to have a "RegEx" Enhancement
> Engine which extracts/creates TextAnnotations based on regular expressions
> configured in the engine.
> This way you can configure one engine of this type and provide a regular
> expression for extract dates and times.
>
> After that, we can take a look at the projects pointed out by Rupert in
> order to be integrated in Stanbol.
>
> Regards



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: STANBOL-1209: Temporal expression extraction engine for Stanbol

Posted by Antonio David Perez Morales <ap...@zaizi.com>.
Hi Jayani

What Rupert means is that it would be good to have a "RegEx" Enhancement
Engine which extracts/creates TextAnnotations based on regular expressions
configured in the engine.
This way you can configure one engine of this type and provide a regular
expression to extract dates and times.
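
As a rough sketch of what such a configurable engine could look like:
the class below takes named regular expressions as configuration and
returns the matched spans with their offsets. RegexAnnotator, Match and
the configuration keys are hypothetical names for illustration only. The
sketch does not use the Stanbol EnhancementEngine or TextAnnotation APIs,
so mapping the matches to real TextAnnotations is left open.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Stanbol code: an annotator that is configured
// with named regular expressions and reports every match with its offsets.
class RegexAnnotator {

    // One matched span: the configured type name, offsets and covered text.
    static final class Match {
        final String type;
        final int start;
        final int end;
        final String text;

        Match(String type, int start, int end, String text) {
            this.type = type;
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    // Insertion order is kept so results follow the configuration order.
    private final Map<String, Pattern> patterns = new LinkedHashMap<>();

    // config maps a type name (e.g. "date") to a regular expression.
    RegexAnnotator(Map<String, String> config) {
        for (Map.Entry<String, String> entry : config.entrySet()) {
            patterns.put(entry.getKey(), Pattern.compile(entry.getValue()));
        }
    }

    // Runs every configured pattern over the text and collects all matches.
    List<Match> annotate(String text) {
        List<Match> result = new ArrayList<>();
        for (Map.Entry<String, Pattern> entry : patterns.entrySet()) {
            Matcher m = entry.getValue().matcher(text);
            while (m.find()) {
                result.add(new Match(entry.getKey(), m.start(), m.end(),
                        m.group()));
            }
        }
        return result;
    }
}
```

With this shape, supporting a new kind of expression only needs another
configuration entry rather than new code.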

After that, we can take a look at the projects pointed out by Rupert with
a view to integrating them into Stanbol.

Regards


On Fri, Jan 17, 2014 at 9:39 AM, Jayani Withanawasam <
jayaniwithanawasam@gmail.com> wrote:

> Thank you Rupert and Anuj for your suggestions. I'm going through the links
> you have provided.
>
> Rupert,
>
> What did you mean by base-line engine that is directly integrated in
> Stanbol with Regex based approach?
>
> Appreciate if you can further elaborate this.


Re: STANBOL-1209: Temporal expression extraction engine for Stanbol

Posted by Jayani Withanawasam <ja...@gmail.com>.
Thank you Rupert and Anuj for your suggestions. I'm going through the links
you have provided.

Rupert,

What did you mean by a base-line engine that is directly integrated in
Stanbol, using a Regex based approach?

I would appreciate it if you could elaborate on this.


On Fri, Nov 29, 2013 at 11:35 AM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi Anuj
>
> On Thu, Nov 28, 2013 at 1:51 PM, Anuj Kumar <an...@gmail.com> wrote:
> > I second that. Regex will work better w.r.t. the default trained model of
> > OpenNLP.
>
> Both such projects do look interesting:
>
> > Also, take a look at this extractor-
> https://code.google.com/p/heideltime/ and
>
> As this is GPLv3 you can not directly use it to implement an
> EnhancementEngine that is part of the Stanbol Codebase. Integrating it
> via a RESTful service would be an option.
>
> > Stanford's tagger- http://nlp.stanford.edu/downloads/sutime.shtml#!
>
> The same is true for SuTime as all Stanford NLP components are under GPL.
>
> If we want to integrate those projects I suggest to extend the Stanbol
> RESTful NLP protocol [1] and service [2] so that it can represent
> date/time points and ranges. SuTime support could be added to the
> already existing Stanbol-Stanford integration [3]. For HeidelTime one
> would need to implement a similar component.
>
>
> But before integrating those I would prefer to have a base-line engine
> that is directly integrated in Stanbol. Looks like a Regex based
> approach could be sufficient for that. WDYT Jayani?
>
> best
> Rupert
>
> [1] https://issues.apache.org/jira/browse/STANBOL-878
> [2] https://issues.apache.org/jira/browse/STANBOL-892
> [3] https://github.com/westei/stanbol-stanfordnlp

Re: STANBOL-1209: Temporal expression extraction engine for Stanbol

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Anuj

On Thu, Nov 28, 2013 at 1:51 PM, Anuj Kumar <an...@gmail.com> wrote:
> I second that. Regex will work better than the default trained model of
> OpenNLP.

Both of these projects look interesting:

> Also, take a look at this extractor- https://code.google.com/p/heideltime/ and

As this is GPLv3, you cannot use it directly to implement an
EnhancementEngine that is part of the Stanbol codebase. Integrating it
via a RESTful service would be an option.

> Stanford's tagger- http://nlp.stanford.edu/downloads/sutime.shtml#!

The same is true for SUTime, as all Stanford NLP components are under the
GPL.

If we want to integrate those projects, I suggest extending the Stanbol
RESTful NLP protocol [1] and service [2] so that they can represent
date/time points and ranges. SUTime support could be added to the
already existing Stanbol-Stanford integration [3]. For HeidelTime, one
would need to implement a similar component.


But before integrating those, I would prefer to have a baseline engine
that is directly integrated in Stanbol. It looks like a regex-based
approach could be sufficient for that. WDYT, Jayani?
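
A minimal sketch of what such a regex-based baseline could look like,
assuming only ISO-style dates (yyyy-MM-dd) need to be found. The class
RegexDateSketch, its Mention type, and the extract method are hypothetical
names used for illustration; they are not part of Stanbol, and a real
engine would still have to write its results into the enhancement metadata.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class, not part of the Stanbol codebase.
final class RegexDateSketch {

    // Matches yyyy-MM-dd surrounded by word boundaries.
    private static final Pattern ISO_DATE =
            Pattern.compile("\\b(\\d{4})-(\\d{2})-(\\d{2})\\b");

    // A date found in the text, with its character offsets.
    static final class Mention {
        final int start;
        final int end;
        final LocalDate date;

        Mention(int start, int end, LocalDate date) {
            this.start = start;
            this.end = end;
            this.date = date;
        }
    }

    // Returns every match of the pattern that is also a valid calendar date.
    static List<Mention> extract(String text) {
        List<Mention> mentions = new ArrayList<>();
        Matcher m = ISO_DATE.matcher(text);
        while (m.find()) {
            try {
                LocalDate date = LocalDate.of(
                        Integer.parseInt(m.group(1)),
                        Integer.parseInt(m.group(2)),
                        Integer.parseInt(m.group(3)));
                mentions.add(new Mention(m.start(), m.end(), date));
            } catch (DateTimeException e) {
                // The pattern matched, but the value is not a real date
                // (for example 2013-02-30), so it is skipped.
            }
        }
        return mentions;
    }
}
```

Validating each match with LocalDate.of keeps the pattern itself simple
while still rejecting impossible dates. Further formats would each need
their own pattern.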

best
Rupert

[1] https://issues.apache.org/jira/browse/STANBOL-878
[2] https://issues.apache.org/jira/browse/STANBOL-892
[3] https://github.com/westei/stanbol-stanfordnlp

>
> It will be useful to have a similar temporal expression enhancement engine in
> Stanbol.
>
> Regards,
> Anuj



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: STANBOL-1209: Temporal expression extraction engine for Stanbol

Posted by Anuj Kumar <an...@gmail.com>.
I second that. Regex will work better than the default trained model of
OpenNLP.
Also, take a look at this extractor: https://code.google.com/p/heideltime/ and
Stanford's tagger: http://nlp.stanford.edu/downloads/sutime.shtml#!

It will be useful to have a similar temporal expression enhancement engine in
Stanbol.

Regards,
Anuj


On Thu, Nov 28, 2013 at 11:05 AM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi Jayani,
>
> I was not even aware that a time model exists for OpenNLP.
> The documentation shows that it uses a purely statistical model, so I am
> wondering about the quality. Note also that OpenNLP only provides a
> prebuilt model for English [1].
>
> AFAIK, OpenNLP will only tell you that some tokens represent a date. It
> will not provide the parsed xsd:dateTime. So if you use this engine, you
> will still need to implement that part on your own, and most likely you
> will end up using regex patterns to parse the actual time from the
> tokens that OpenNLP marks as time.
>
> So I am wondering whether it would be better to start with regex from
> the beginning. If you search for "Regex Date Time extraction" you can
> find a huge set of examples to start from.
>
> best
> Rupert
>
>
> [1] http://opennlp.sourceforge.net/models-1.5/

Re: STANBOL-1209: Temporal expression extraction engine for Stanbol

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Jayani,

I was not even aware that a time model exists for OpenNLP.
The documentation shows that it uses a purely statistical model, so I am
wondering about the quality. Note also that OpenNLP only provides a
prebuilt model for English [1].

AFAIK, OpenNLP will only tell you that some tokens represent a date. It
will not provide the parsed xsd:dateTime. So if you use this engine, you
will still need to implement that part on your own, and most likely you
will end up using regex patterns to parse the actual time from the
tokens that OpenNLP marks as time.
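
To illustrate the missing step, here is a minimal sketch of turning an
already recognized date mention into an xsd:dateTime lexical value. It
assumes English mentions of the form "25 November 2013" and, because no
time of day or time zone is given, assumes midnight with no zone. The
class XsdDateTimeSketch and the method toXsdDateTime are hypothetical
names and not part of Stanbol or OpenNLP.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.Locale;
import java.util.Optional;

// Hypothetical illustration class, not part of Stanbol or OpenNLP.
final class XsdDateTimeSketch {

    // Day, full English month name, four-digit year. STRICT rejects
    // impossible dates instead of silently adjusting them.
    private static final DateTimeFormatter ENGLISH_LONG_DATE =
            DateTimeFormatter.ofPattern("d MMMM uuuu", Locale.ENGLISH)
                    .withResolverStyle(ResolverStyle.STRICT);

    // Returns the xsd:dateTime lexical form, or empty if the mention
    // does not fit the single supported format.
    static Optional<String> toXsdDateTime(String mention) {
        try {
            LocalDate date = LocalDate.parse(mention.trim(), ENGLISH_LONG_DATE);
            // Assumption: a bare date is mapped to the start of that day.
            return Optional.of(date.atStartOfDay()
                    .format(DateTimeFormatter.ISO_LOCAL_DATE_TIME));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }
}
```

With this sketch, toXsdDateTime("25 November 2013") yields
2013-11-25T00:00:00, while a relative expression such as "next Tuesday"
yields an empty result and would need separate handling.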

So I am wondering whether it would be better to start with regex from
the beginning. If you search for "Regex Date Time extraction" you can
find a huge set of examples to start from.

best
Rupert


[1] http://opennlp.sourceforge.net/models-1.5/



On Thu, Nov 28, 2013 at 5:15 AM, Jayani Withanawasam
<ja...@gmail.com> wrote:
> Hi Dileepa,
>
> Thank you so much for your valuable feedback. I'm working on this.



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: STANBOL-1209: Temporal expression extraction engine for Stanbol

Posted by Jayani Withanawasam <ja...@gmail.com>.
Hi Dileepa,

Thank you so much for your valuable feedback. I'm working on this.


On Mon, Nov 25, 2013 at 9:00 PM, Dileepa Jayakody <dileepajayakody@gmail.com> wrote:

> Hi Jayani,
>
> Stanbol has several enhancement engines based on OpenNLP (opennlp-ner,
> opennlp-sentence, opennlp-pos, ...; see [1]). Each of these engines
> focuses on a particular enhancement aspect using OpenNLP. Therefore, I
> think it is better to write a new engine for temporal extraction
> rather than extending the OpenNLP-NER engine.
>
> Thanks,
> Dileepa
>
> [1]
> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/opennlp

Re: STANBOL-1209: Temporal expression extraction engine for Stanbol

Posted by Dileepa Jayakody <di...@gmail.com>.
Hi Jayani,

Stanbol has several enhancement engines based on OpenNLP (opennlp-ner,
opennlp-sentence, opennlp-pos, ...; see [1]). Each of these engines
focuses on a particular enhancement aspect using OpenNLP. Therefore, I
think it is better to write a new engine for temporal extraction
rather than extending the OpenNLP-NER engine.

Thanks,
Dileepa

[1]
https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/opennlp


On Mon, Nov 25, 2013 at 4:30 PM, Jayani Withanawasam <
jayaniwithanawasam@gmail.com> wrote:

> Hi,
>
> I'm researching how to add a new enhancement engine for extracting date and
> time (temporal extraction) to Stanbol, as suggested by Rupert.
>
> I found that OpenNLP has an entity extraction unit for date and time.
> I also noticed that OpenNLP is already integrated into Stanbol in the NER
> engine.
>
> So, as I understand it, there are two options for extracting date and
> time.
>
> One is to have a separate enhancement engine for date and time information
> extraction. The other is to add date and time extraction as a code
> enhancement to the existing OpenNLP NER engine.
>
> What is your opinion on this? Is there any other approach that you think
> would be better?
>
> Thank you
> Jayani
>