Posted to user@uima.apache.org by Baker James D <JD...@mail.dstl.gov.uk> on 2015/10/05 13:14:23 UTC

RE: [UK OFFICIAL] Baleen - UIMA Based Text Analytics Framework

Classification: UK OFFICIAL

Afternoon everyone,

In response to Petr's comments, we have added some additional information to the Wiki section of the Baleen GitHub repo. We haven't added any new information (yet), but we have collated information that is already available into one place to make it more accessible. If there are any specific areas that people feel could do with more attention, please let us know and we'll see what we can do.

https://github.com/dstl/baleen/wiki

Thanks,
James


-----Original Message-----
From: Petr Baudis [mailto:pasky@ucw.cz]
Sent: 28 September 2015 21:23
To: Baker James D
Cc: user@uima.apache.org
Subject: Re: [UK OFFICIAL] Baleen - UIMA Based Text Analytics Framework

  Hi!

On Mon, Sep 28, 2015 at 02:31:03PM +0100, Baker James D wrote:
> I would like to draw your attention to a text analytics framework that has just been released by Dstl (part of the UK Ministry of Defence). It uses UIMA as part of its underlying architecture but provides additional functionality on top of that, and simplifies much of the user configuration and experience, as well as the development process. A number of collection readers, annotators and consumers are included as part of the framework.
> 
> The tool is called Baleen, and is released under Apache Software License 2.
> 
> There is more information about the tool on the press release
> (https://www.gov.uk/government/news/dstl-adds-to-open-source-software) 
> and on the GitHub page
> (https://github.com/dstl/baleen)

  Thanks for the heads up.  However, I haven't found any clear summary of what the framework is capable of right now.  I think you might want to expand the generic description a bit with some examples and use cases.  I have been looking around a bit, and it seems like e.g.

	https://github.com/dstl/baleen/blob/master/baleen/baleen-annotators/src/main/java/uk/gov/dstl/baleen/annotators/cleaners/MergeAdjacentQuantities.java

is something that could be pretty useful, but you might want to make it easier to discover the capabilities to get more users / contributors.

  Best,

				Petr Baudis

"This e-mail and any attachment(s) is intended for the recipient only.   Its unauthorised use, 
disclosure, storage or copying is not permitted.  Communications with Dstl are monitored and/or 
recorded for system efficiency and other lawful purposes, including business intelligence, business 
metrics and training.  Any views or opinions expressed in this e-mail do not necessarily reflect Dstl policy."

"If you are not the intended recipient, please remove it from your system and notify the author of 
the email and centralenq@dstl.gov.uk"

RE: [UK OFFICIAL] Baleen - UIMA Based Text Analytics Framework

Posted by Baker James D <JD...@mail.dstl.gov.uk>.
Classification: UK OFFICIAL

Hi Jens,

I haven't tried, but I suspect it wouldn't be as straightforward as taking a Baleen component and using it with another UIMA-based system. Whilst Baleen is UIMA-based, we did have to augment UIMA with a lot of additional functionality to get it to do what we wanted. As that additional functionality doesn't exist in those other pipelines (or, at least, not in the same form), it's unlikely that the components will work without modification.

It's more likely, though, that components could go the other way (e.g. a vanilla UIMA component working in Baleen), although again we haven't tried that.

James

-----Original Message-----
From: jens@grivolla.net [mailto:jens@grivolla.net] On Behalf Of Jens Grivolla
Sent: 05 October 2015 15:13
To: user@uima.apache.org
Subject: Re: [UK OFFICIAL] Baleen - UIMA Based Text Analytics Framework

Hi James, this looks interesting and there seem to be quite a few components available.

How interoperable is it with e.g. DKPro (or other UIMA components), i.e.
could I just take AEs from Baleen and use them within a DKPro pipeline?

Thanks,
Jens

Re: [UK OFFICIAL] Baleen - UIMA Based Text Analytics Framework

Posted by Jens Grivolla <j+...@grivolla.net>.
Hi James, this looks interesting and there seem to be quite a few
components available.

How interoperable is it with e.g. DKPro (or other UIMA components), i.e.
could I just take AEs from Baleen and use them within a DKPro pipeline?

Thanks,
Jens
