Posted to solr-user@lucene.apache.org by Philippe de Rochambeau <ph...@free.fr> on 2014/01/16 08:42:59 UTC

Alternatives to GATE?

Hello,

Can anyone suggest alternatives to GATE (http://gate.ac.uk/download/)? I would like to index place and person names in PDFs using gazetteers (i.e., dictionaries), normalize dates (e.g., December 1st, 2001 would be indexed as 20011201), and feed the indexes to Solr.
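
To make the requirement concrete, here is a minimal Python sketch of the two steps described above: dictionary (gazetteer) lookup and date normalization to YYYYMMDD. It assumes the PDF text has already been extracted to a plain string (for example with Apache Tika), handles only English dates of the form "December 1st, 2001", and uses made-up gazetteer entries and Solr field names (`place_ss`, `person_ss`, `date_ss`). It is an illustration of the idea, not how GATE or any other tool does it.

```python
import re
from datetime import date

# Hypothetical gazetteers; real ones would be loaded from dictionary files.
PLACES = {"Paris", "New York"}
PERSONS = {"Marie Curie", "Victor Hugo"}

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Matches dates such as "December 1st, 2001" (English month names only).
DATE_RE = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2})(?:st|nd|rd|th)?,\s*(\d{4})\b"
)


def normalize_dates(text):
    """Return every recognized date in text as a YYYYMMDD string."""
    found = []
    for month, day, year in DATE_RE.findall(text):
        try:
            parsed = date(int(year), MONTHS.index(month) + 1, int(day))
        except ValueError:
            continue  # skip impossible dates such as "February 30th, 2001"
        found.append(f"{parsed.year:04d}{parsed.month:02d}{parsed.day:02d}")
    return found


def find_entities(text, gazetteer):
    """Return the gazetteer entries that occur in text as whole phrases."""
    return sorted(
        name for name in gazetteer
        if re.search(r"\b" + re.escape(name) + r"\b", text)
    )


def to_solr_doc(doc_id, text):
    """Build a dict that could be posted to Solr; field names are assumptions."""
    return {
        "id": doc_id,
        "place_ss": find_entities(text, PLACES),
        "person_ss": find_entities(text, PERSONS),
        "date_ss": normalize_dates(text),
    }


doc = to_solr_doc("report.pdf", "Marie Curie arrived in Paris on December 1st, 2001.")
print(doc)
```

A plain set lookup like this will not scale to large dictionaries or handle spelling variants, which is where a dedicated gazetteer component earns its keep.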

GATE is a great tool, but its search engine, Mimir, is unfortunately not customizable enough (or not well documented enough) for my purposes, which are to return the found documents (PDFs) ordered by document name or entity name (e.g., {Date}, {Person}).
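
For what it's worth, the ordering I am after would look roughly like the sketch below on the Solr side. It only builds the request URL for the standard select handler using the `sort` parameter; the host, the core name (`pdfs`) and the field name (`person_sort_s`) are placeholders I made up, and the sort field is assumed to be single-valued.

```python
from urllib.parse import urlencode


def build_sorted_query(base_url, query, sort_field, descending=False):
    """Return a Solr select URL whose results are ordered by sort_field."""
    direction = "desc" if descending else "asc"
    params = {
        "q": query,
        "sort": f"{sort_field} {direction}",  # Solr syntax: "<field> asc|desc"
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core URL and field name, for illustration only.
url = build_sorted_query("http://localhost:8983/solr/pdfs", "*:*", "person_sort_s")
print(url)
```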

Many thanks.

Philippe



Re: Alternatives to GATE?

Posted by parnab kumar <pa...@gmail.com>.
Hi,

You can have a look at OpenNLP.
http://opennlp.apache.org/



Thanks,
Parnab



Re: Alternatives to GATE?

Posted by Tommaso Teofili <to...@gmail.com>.
If you need a framework to build your enhancement pipeline on, I think
Apache UIMA [1] is good, as it's also able to store annotated documents into
Lucene and Solr, so it may be a good fit for your needs. Just keep in mind that
you'll have to learn how to use and develop on top of it; it's not a big deal,
but it needs to be taken into account (especially because you're running
away from GATE).

My 2 cents,
Tommaso



Re: Alternatives to GATE?

Posted by Charlie Hull <ch...@flax.co.uk>.
Hi Philippe,

For entity extraction we often use the Stanford NLP libraries, which are
part of GATE but a lot simpler (GATE is a bit of a beast TBH). For
example, in a taxonomy editor/classifier prototype we built recently, we
use Stanford to pull out entities from classified documents as
suggestions for improving a node definition:
http://www.flax.co.uk/blog/2012/06/12/clade-a-freely-available-open-source-taxonomy-and-autoclassification-tool/

There's also an interesting European Commission-funded project in this
area, a 'marketplace' for text classification & extraction apps:
https://annomarket.com/

HTH

Cheers

Charlie

-- 
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk