Posted to solr-user@lucene.apache.org by Aman Tandon <am...@gmail.com> on 2014/09/10 12:16:21 UTC

Re: Integrate solr with openNLP

Hi,

What is the progress on integrating NLP with Solr? If you have got this
integration working, please share the approach with us.

With Regards
Aman Tandon

On Tue, Jun 10, 2014 at 11:04 AM, Vivekanand Ittigi <vi...@biginfolabs.com>
wrote:

> Hi Aman,
>
> Yeah, we are thinking the same: using UIMA is better. Thanks to
> everyone, you really showed us the way (UIMA).
>
> We'll work on it.
>
> Thanks,
> Vivek
>
>
> On Fri, Jun 6, 2014 at 5:54 PM, Aman Tandon <am...@gmail.com>
> wrote:
>
> > Hi Vivek,
> >
> > As everybody on the mailing list suggested UIMA, you should go for it.
> > OpenNLP issues are not being tracked properly, which could leave your
> > development stuck if a problem comes up later, so it is better to start
> > investigating UIMA.
> >
> >
> > With Regards
> > Aman Tandon
> >
> >
> > On Fri, Jun 6, 2014 at 11:00 AM, Vivekanand Ittigi <vivek@biginfolabs.com>
> > wrote:
> >
> > > Can anyone please reply?
> > >
> > > Thanks,
> > > Vivek
> > >
> > > ---------- Forwarded message ----------
> > > From: Vivekanand Ittigi <vi...@biginfolabs.com>
> > > Date: Wed, Jun 4, 2014 at 4:38 PM
> > > Subject: Re: Integrate solr with openNLP
> > > To: Tommaso Teofili <to...@gmail.com>
> > > Cc: "solr-user@lucene.apache.org" <so...@lucene.apache.org>, Ahmet
> > > Arslan <io...@yahoo.com>
> > >
> > >
> > > Hi Tommaso,
> > >
> > > Yes, you are right, version 4.4 works and I'm able to compile now. I'm
> > > trying to apply named-entity recognition (person names) to the tokens,
> > > but I'm not seeing any change. My schema.xml looks like this:
> > >
> > > <field name="text" type="text_opennlp_pos_ner" indexed="true"
> > >        stored="true" multiValued="true"/>
> > >
> > > <fieldType name="text_opennlp_pos_ner" class="solr.TextField"
> > > positionIncrementGap="100">
> > >       <analyzer>
> > >         <tokenizer class="solr.OpenNLPTokenizerFactory"
> > >           tokenizerModel="opennlp/en-token.bin"
> > >         />
> > >         <filter class="solr.OpenNLPFilterFactory"
> > >           nerTaggerModels="opennlp/en-ner-person.bin"
> > >         />
> > >         <filter class="solr.LowerCaseFilterFactory"/>
> > >       </analyzer>
> > >
> > >     </fieldType>
> > >
> > > Please guide me.
> > >
> > > Thanks,
> > > Vivek
> > >
> > >
> > > On Wed, Jun 4, 2014 at 1:27 PM, Tommaso Teofili <tommaso.teofili@gmail.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > Ahmet was suggesting that you could use the UIMA integration because
> > > > OpenNLP already has an integration with Apache UIMA, so you would just
> > > > have to use that [1].
> > > > That is one of the main reasons the UIMA integration was done: it's a
> > > > framework that you can easily hook into in order to plug in your NLP
> > > > algorithm.
> > > >
> > > > If you want to use just OpenNLP, then it's up to you: either write your
> > > > own UpdateRequestProcessor plugin [2] to add metadata extracted by
> > > > OpenNLP to your documents, or write a dedicated analyzer / tokenizer /
> > > > token filter.
> > > >
> > > > For the OpenNLP integration (LUCENE-2899), the patch is not up to date
> > > > with the latest APIs in trunk. However, you should be able to apply it
> > > > to (if I recall correctly) version 4.4 or so, and adapting it to the
> > > > latest API shouldn't be too hard.
> > > >
> > > > Regards,
> > > > Tommaso
> > > >
> > > > [1] : http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#org.apche.opennlp.uima
> > > > [2] : http://wiki.apache.org/solr/UpdateRequestProcessor
> > > >
> > > >
> > > >
> > > > 2014-06-03 15:34 GMT+02:00 Ahmet Arslan <io...@yahoo.com.invalid>:
> > > >
> > > > >> Can you extract names, locations, etc. using OpenNLP in a plain Java
> > > > >> program?
> > > > >>
> > > > >> If yes, here are two separate options:
> > > > >>
> > > > >> 1) Use http://searchhub.org/2012/02/14/indexing-with-solrj/ as an
> > > > >> example, integrate your NER code into it, and write your own indexing
> > > > >> code. You have full control here. No Solr plugins are involved.
> > > > >>
> > > > >> 2) Use 'Implementing a conditional copyField' given here:
> > > > >> http://wiki.apache.org/solr/UpdateRequestProcessor
> > > > >> as an example and integrate your NER code into it.
> > > > >>
> > > > >>
> > > > >> Please note that these are separate ways to enrich your incoming
> > > > >> documents; choose either (1) or (2).
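
For option (1), the shape of a client-side pipeline can be sketched without any Solr or OpenNLP dependency. The sketch below is only an illustration under stated assumptions: the `toyNer` function is a hypothetical stand-in for real OpenNLP name finding, the `person` field name is made up (only `content` comes from this thread), and instead of SolrJ it builds the JSON array of documents that Solr's update handler accepts, so the payload would still have to be sent to your own Solr instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Client-side enrichment sketch: extract entities first, then build the
// document that will be sent to Solr. No Solr plugin is involved.
final class ClientSideEnrichment {

    // Minimal JSON string escaping for the characters that must be escaped.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Builds one JSON document with the original content plus a
    // multi-valued "person" field (hypothetical field name).
    static String toJsonDoc(String id, String content, List<String> persons) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"id\":").append(quote(id))
          .append(",\"content\":").append(quote(content))
          .append(",\"person\":[");
        for (int i = 0; i < persons.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quote(persons.get(i)));
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        // Toy stand-in for NER: a fixed list of names, NOT OpenNLP.
        Function<String, List<String>> toyNer = text -> {
            List<String> hits = new ArrayList<>();
            for (String name : List.of("Ada Lovelace", "Alan Turing")) {
                if (text.contains(name)) {
                    hits.add(name);
                }
            }
            return hits;
        };

        String content = "Ada Lovelace wrote about the Analytical Engine.";
        String payload = "[" + toJsonDoc("doc-1", content, toyNer.apply(content)) + "]";
        System.out.println(payload);
    }
}
```

The point of this layout is that the NER step is an ordinary function call in your own indexing program, so it can be replaced or debugged without touching the Solr installation.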
> > > >>
> > > >>
> > > >>
> > > >> On Tuesday, June 3, 2014 3:30 PM, Vivekanand Ittigi <
> > > >> vivek@biginfolabs.com> wrote:
> > > > >> Okay, but I didn't understand what you said. Can you please elaborate?
> > > >>
> > > >> Thanks,
> > > >> Vivek
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > > >> On Tue, Jun 3, 2014 at 5:36 PM, Ahmet Arslan <io...@yahoo.com> wrote:
> > > >>
> > > >> > Hi Vivekanand,
> > > >> >
> > > > >> > I have never used UIMA+Solr before.
> > > > >> >
> > > > >> > Personally, I think it takes more time to learn how to configure and
> > > > >> > use the UIMA stuff.
> > > > >> >
> > > > >> >
> > > > >> > If you are familiar with Java, write a class that extends
> > > > >> > UpdateRequestProcessor(Factory). Use OpenNLP for NER and add the new
> > > > >> > fields (organisation, city, person name, etc.) to your document. This
> > > > >> > phase is usually called 'enrichment'.
> > > > >> >
> > > > >> > Does that make sense?
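
The enrichment step described above can be sketched in plain Java. This is a dependency-free illustration of the idea only, not the Solr API: `EntityFinder`, `DictionaryEntityFinder`, and `EnrichmentStep` are hypothetical names, the document is modelled as a simple map rather than Solr's own document class, and the dictionary lookup is a toy stand-in for where OpenNLP's name finder would be called. The `person` field name is also an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an NER component. In a real setup this is
// where OpenNLP would be invoked.
interface EntityFinder {
    List<String> find(String text);
}

// Toy implementation: reports every known name that occurs in the text.
final class DictionaryEntityFinder implements EntityFinder {
    private final List<String> knownNames;

    DictionaryEntityFinder(List<String> knownNames) {
        this.knownNames = knownNames;
    }

    @Override
    public List<String> find(String text) {
        List<String> hits = new ArrayList<>();
        for (String name : knownNames) {
            if (text.contains(name)) {
                hits.add(name);
            }
        }
        return hits;
    }
}

// Enrichment in the style of an update processor: read one field, run the
// finder on each value, and write the distinct results into a new field.
final class EnrichmentStep {
    private final EntityFinder finder;
    private final String sourceField;
    private final String targetField;

    EnrichmentStep(EntityFinder finder, String sourceField, String targetField) {
        this.finder = finder;
        this.sourceField = sourceField;
        this.targetField = targetField;
    }

    void process(Map<String, List<String>> doc) {
        List<String> found = new ArrayList<>();
        for (String value : doc.getOrDefault(sourceField, List.of())) {
            for (String entity : finder.find(value)) {
                if (!found.contains(entity)) {
                    found.add(entity);
                }
            }
        }
        // Only add the target field when something was actually found.
        if (!found.isEmpty()) {
            doc.put(targetField, found);
        }
    }
}

final class EnrichmentDemo {
    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("id", List.of("1"));
        doc.put("content", List.of("Ada Lovelace corresponded with Charles Babbage."));

        EntityFinder finder =
            new DictionaryEntityFinder(List.of("Ada Lovelace", "Alan Turing"));
        new EnrichmentStep(finder, "content", "person").process(doc);

        System.out.println(doc);
    }
}
```

In a real processor the same logic would run once per incoming document before it is indexed, so the extracted names end up as ordinary searchable fields.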
> > > >> >
> > > >> >
> > > >> >
> > > > >> > On Tuesday, June 3, 2014 2:57 PM, Vivekanand Ittigi <vivek@biginfolabs.com>
> > > > >> > wrote:
> > > > >> > Hi Ahmet,
> > > > >> >
> > > > >> > I followed what you said:
> > > > >> > https://cwiki.apache.org/confluence/display/solr/UIMA+Integration
> > > > >> > But how can I achieve my goal? I mean extracting only the names of
> > > > >> > organizations or persons from the content field.
> > > > >> >
> > > > >> > I guess I'm almost there but something is missing. Please guide me.
> > > >> > Thanks,
> > > >> > Vivek
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > > >> > On Tue, Jun 3, 2014 at 2:50 PM, Vivekanand Ittigi <vivek@biginfolabs.com>
> > > > >> > wrote:
> > > > >> >
> > > > >> > > I can't describe the entire goal, but one of the tasks is like this:
> > > > >> > > we have big documents (websites, PDFs, etc.) indexed into Solr.
> > > > >> > > Let's say <field name=content> stores the contents of a document.
> > > > >> > > All I want to do is pick the names of persons and places from it
> > > > >> > > using OpenNLP or some other means.
> > > > >> > >
> > > > >> > > Those names should be reflected in Solr itself.
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Vivek
> > > >> > >
> > > >> > >
> > > > >> > > On Tue, Jun 3, 2014 at 1:33 PM, Ahmet Arslan <iorixxx@yahoo.com> wrote:
> > > > >> > >
> > > > >> > >> Hi,
> > > > >> > >>
> > > > >> > >> Please tell us what you are trying to do in a new thread: your
> > > > >> > >> high-level goal. There may be other ways/tools, such as
> > > > >> > >> https://stanbol.apache.org, besides OpenNLP.
> > > >> > >>
> > > >> > >>
> > > >> > >>
> > > >> > >> On Tuesday, June 3, 2014 8:31 AM, Vivekanand Ittigi <
> > > >> > >> vivek@biginfolabs.com> wrote:
> > > >> > >>
> > > >> > >>
> > > >> > >>
> > > >> > >> We'll surely look into UIMA integration.
> > > >> > >>
> > > > >> > >> But before moving on, is this (https://wiki.apache.org/solr/OpenNLP)
> > > > >> > >> the only link we've got for the integration? Isn't there any other
> > > > >> > >> article or link which may help us fix this problem?
> > > >> > >>
> > > >> > >> Thanks,
> > > >> > >> Vivek
> > > >> > >>
> > > >> > >>
> > > >> > >>
> > > >> > >>
> > > > >> > >> On Tue, Jun 3, 2014 at 2:50 AM, Ahmet Arslan <iorixxx@yahoo.com> wrote:
> > > > >> > >>
> > > > >> > >> >Hi,
> > > > >> > >> >
> > > > >> > >> >I believe I answered it. Let me retry.
> > > > >> > >> >
> > > > >> > >> >There is no committed code for OpenNLP. There is an open ticket with
> > > > >> > >> >patches. They may not work with current trunk.
> > > > >> > >> >
> > > > >> > >> >Confluence is the official documentation. The wiki is maintained by
> > > > >> > >> >the community, meaning the wiki can talk about uncommitted
> > > > >> > >> >features/stuff, like this one: https://wiki.apache.org/solr/OpenNLP
> > > > >> > >> >
> > > > >> > >> >What I am suggesting is, have a look at
> > > > >> > >> >https://cwiki.apache.org/confluence/display/solr/UIMA+Integration
> > > > >> > >> >
> > > > >> > >> >
> > > > >> > >> >And search for how to use OpenNLP inside UIMA. Maybe LUCENE-2899 is
> > > > >> > >> >already doable with solr-uima. I am adding Tommaso (sorry for this,
> > > > >> > >> >but we need an authoritative answer here) to clarify this.
> > > > >> > >> >
> > > > >> > >> >
> > > > >> > >> >Also consider indexing with SolrJ and doing the OpenNLP enrichment
> > > > >> > >> >outside Solr. Use OpenNLP with plain Java, enrich your documents, and
> > > > >> > >> >index them with SolrJ. You don't have to do everything inside Solr as
> > > > >> > >> >Solr plugins.
> > > >> > >> >
> > > >> > >> >Hope this helps,
> > > >> > >> >
> > > >> > >> >Ahmet
> > > >> > >> >
> > > >> > >> >
> > > >> > >> >
> > > > >> > >> >On Monday, June 2, 2014 11:15 PM, Vivekanand Ittigi <vivek@biginfolabs.com> wrote:
> > > > >> > >> >Thanks, I will check the JIRA, but you didn't answer my first
> > > > >> > >> >question. Is there no way to integrate Solr with OpenNLP? Or is
> > > > >> > >> >there any committed code that I can go ahead with?
> > > >> > >> >
> > > >> > >> >Thanks,
> > > >> > >> >Vivek
> > > >> > >> >
> > > >> > >> >
> > > >> > >> >
> > > >> > >> >
> > > >> > >> >
> > > > >> > >> >On Mon, Jun 2, 2014 at 10:30 PM, Ahmet Arslan <iorixxx@yahoo.com> wrote:
> > > > >> > >> >
> > > > >> > >> >> Hi,
> > > > >> > >> >>
> > > > >> > >> >> Here is the JIRA issue:
> > > > >> > >> >> https://issues.apache.org/jira/browse/LUCENE-2899
> > > > >> > >> >>
> > > > >> > >> >>
> > > > >> > >> >> Anyone can create an account.
> > > > >> > >> >>
> > > > >> > >> >> I haven't used UIMA myself and I have little knowledge about it,
> > > > >> > >> >> but I believe it is possible to use OpenNLP inside UIMA.
> > > > >> > >> >> You need to dig into the UIMA documentation.
> > > > >> > >> >>
> > > > >> > >> >> Solr UIMA integration already exists; that's why I asked whether
> > > > >> > >> >> your requirement is possible with UIMA or not. I don't know the
> > > > >> > >> >> answer myself.
> > > >> > >> >>
> > > >> > >> >> Ahmet
> > > >> > >> >>
> > > >> > >> >>
> > > >> > >> >>
> > > > >> > >> >> On Monday, June 2, 2014 7:42 PM, Vivekanand Ittigi <vivek@biginfolabs.com>
> > > > >> > >> >> wrote:
> > > > >> > >> >> Hi Arslan,
> > > > >> > >> >>
> > > > >> > >> >> If not the uncommitted code, then which code should be used to
> > > > >> > >> >> integrate?
> > > > >> > >> >>
> > > > >> > >> >> If I have to comment on my problems, which JIRA should I use and
> > > > >> > >> >> how do I post it?
> > > > >> > >> >>
> > > > >> > >> >> And why are you suggesting UIMA integration? My requirement is
> > > > >> > >> >> integration with OpenNLP. Do you mean we can do everything through
> > > > >> > >> >> UIMA that we do with OpenNLP, like name and location finding?
> > > >> > >> >>
> > > >> > >> >> Thanks,
> > > >> > >> >> Vivek
> > > >> > >> >>
> > > >> > >> >>
> > > >> > >> >>
> > > >> > >> >>
> > > >> > >> >>
> > > > >> > >> >> On Mon, Jun 2, 2014 at 8:40 PM, Ahmet Arslan <iorixxx@yahoo.com.invalid>
> > > > >> > >> >> wrote:
> > > > >> > >> >>
> > > > >> > >> >> > Hi,
> > > > >> > >> >> >
> > > > >> > >> >> > Uncommitted code can have these kinds of problems. It is not
> > > > >> > >> >> > guaranteed to work with the latest trunk.
> > > > >> > >> >> >
> > > > >> > >> >> > You could comment on the JIRA ticket with the problem you face.
> > > > >> > >> >> >
> > > > >> > >> >> > By the way, maybe you are after something doable with the
> > > > >> > >> >> > already committed UIMA stuff?
> > > > >> > >> >> >
> > > > >> > >> >> > https://cwiki.apache.org/confluence/display/solr/UIMA+Integration
> > > >> > >> >> >
> > > >> > >> >> > Ahmet
> > > >> > >> >> >
> > > >> > >> >> >
> > > >> > >> >> >
> > > > >> > >> >> > On Monday, June 2, 2014 5:07 PM, Vivekanand Ittigi <vivek@biginfolabs.com>
> > > > >> > >> >> > wrote:
> > > > >> > >> >> > I followed this link to integrate:
> > > > >> > >> >> > https://wiki.apache.org/solr/OpenNLP
> > > > >> > >> >> >
> > > > >> > >> >> > Installation
> > > > >> > >> >> >
> > > > >> > >> >> > For English language testing: Until LUCENE-2899 is committed:
> > > > >> > >> >> >
> > > > >> > >> >> >     1. pull the latest trunk or 4.0 branch
> > > > >> > >> >> >     2. apply the latest LUCENE-2899 patch
> > > > >> > >> >> >     3. do 'ant compile'
> > > > >> > >> >> >     cd solr/contrib/opennlp/src/test-files/training
> > > > >> > >> >> >     .
> > > > >> > >> >> >     .
> > > > >> > >> >> >     .
> > > > >> > >> >> > I followed the first two steps but got the following error while
> > > > >> > >> >> > executing the 3rd step:
> > > >> > >> >> >
> > > > >> > >> >> > common.compile-core:
> > > > >> > >> >> >     [javac] Compiling 10 source files to /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/build/analysis/opennlp/classes/java
> > > > >> > >> >> >     [javac] warning: [path] bad path element "/home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/lib/jwnl-1.3.3.jar": no such file or directory
> > > > >> > >> >> >     [javac] /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/FilterPayloadsFilter.java:43: error: cannot find symbol
> > > > >> > >> >> >     [javac]     super(Version.LUCENE_44, input);
> > > > >> > >> >> >     [javac]                  ^
> > > > >> > >> >> >     [javac]   symbol:   variable LUCENE_44
> > > > >> > >> >> >     [javac]   location: class Version
> > > > >> > >> >> >     [javac] /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/OpenNLPTokenizer.java:56: error: no suitable constructor found for Tokenizer(Reader)
> > > > >> > >> >> >     [javac]     super(input);
> > > > >> > >> >> >     [javac]     ^
> > > > >> > >> >> >     [javac]     constructor Tokenizer.Tokenizer(AttributeFactory) is not applicable
> > > > >> > >> >> >     [javac]       (actual argument Reader cannot be converted to AttributeFactory by method invocation conversion)
> > > > >> > >> >> >     [javac]     constructor Tokenizer.Tokenizer() is not applicable
> > > > >> > >> >> >     [javac]       (actual and formal argument lists differ in length)
> > > > >> > >> >> >     [javac] 2 errors
> > > > >> > >> >> >     [javac] 1 warning
> > > >> > >> >> >
> > > > >> > >> >> > I'm really stuck on how to get past this step. I wasted my entire
> > > > >> > >> >> > day trying to fix this but couldn't make any progress. Can someone
> > > > >> > >> >> > please help me?
> > > >> > >> >> >
> > > >> > >> >> > Thanks,
> > > >> > >> >> > Vivek
> > > >> > >> >> >
> > > >> > >> >> >
> > > >> > >> >>
> > > >> > >> >>
> > > >> > >> >
> > > >> > >>
> > > >> > >
> > > >> > >
> > > >> >
> > > >> >
> > > >>
> > > >>
> > > >
> > >
> >
>

Re: Integrate solr with openNLP

Posted by Vivekanand Ittigi <vi...@biginfolabs.com>.
Actually, we dropped the idea of integrating NLP with Solr and instead took
two different approaches:

* we're using NLP separately, not with Solr
* we're using UIMA with Solr. It's more advanced.

If you have a specific question, you can ask me. I'll tell you if I know.

-Vivek

On Wed, Sep 10, 2014 at 3:46 PM, Aman Tandon <am...@gmail.com>
wrote:

> Hi,
>
> What is the progress on integrating NLP with Solr? If you have got this
> integration working, please share the approach with us.
>
> With Regards
> Aman Tandon