Posted to general@lucene.apache.org by Todd Hunt <To...@nisc.coop> on 2013/05/20 17:56:45 UTC

Existing Project using Hibernate, Spring and Lucene and Looking to Add Solr

Hi,

We have an existing Java-based enterprise application that is bundled as a WAR file, runs on Tomcat, and uses Spring 3.0.5, Hibernate 3.6.2, and Lucene 3.0.3. We are using annotations in Hibernate that nicely couple it to Lucene to index objects (documents, images, PDFs, etc.) based on key-value pairs. We use Hibernate Search to retrieve the results we are looking for.

We want to extend our indexing capability to use Tika to extract text and metadata out of documents that are uploaded to the server and index that content.

When I initially read about Solr I saw that it would provide extra functionality on top of Lucene. I was eager to get it integrated with our application. But now that I have fully read "Apache Solr 3 Enterprise Search Server" I feel that my initial impressions of Solr were wrong.

I saw that Solr uses web services to upload files for indexing, to perform searches, and to download content. I thought that was just a nice extra feature, but I was not interested in it because our application already has a web service interface, used by our own homegrown client application to communicate with the enterprise application above.

I've read about SolrJ / Solr Cell, EmbeddedSolrServer, BackendQueueProcessor, and DIH and researched them on the web. But none of them has given me the information I need to take a Hibernate-managed object, inside of a transaction, persist the binary data in the database (which we are already doing), extract the text / contents from the binary file via Tika (which is a separate issue for a separate thread), and index that text with either Java API code or Java annotations.

It seems like Solr forces one to expose access to its "Cores" (indexes) via its own WAR file. I don't want that. I just want to be able to use the Solr Java API to integrate with our current web services and Hibernate framework to index text-based documents, and then allow our users to perform open text searching and use Solr's advanced features like highlighting, MLT, spell checking, suggester and faceting. But I just don't see how to integrate what Solr has to offer with our existing web application. I get the feeling that I have to create a new Solr-based web application and then have the current application delegate indexing and searching to the Solr application, which I would rather avoid if possible.

I've looked through the Solr Javadocs and I haven't found anything substantial that would allow me to just use Java code, instead of creating HTTP connections, to index and search for data. Could someone let me know whether what I am looking for is outside the scope of Solr's functionality? If there is a way, please provide an example of how I can accomplish this.

Thank you,

Todd

Re: Existing Project using Hibernate, Spring and Lucene and Looking to Add Solr

Posted by Sanne Grinovero <sa...@gmail.com>.

Since you're already using Hibernate Search and now want to use Tika,
you should just upgrade to a more recent version of Hibernate Search,
which provides Apache Tika integration:

http://docs.jboss.org/hibernate/search/4.3/reference/en-US/html_single/#d0e4244

https://community.jboss.org/wiki/HibernateSearchMigrationGuide

Some other features from Solr are supported as well but as you say
it's all embedded without the need to "start cores" or deploy separate
web applications.

The downside is that you would also need to upgrade Hibernate ORM,
Spring, and Apache Lucene, as the versions you mentioned are too old
for Hibernate Search 4.3 or 4.2 (which is the first one sporting the
Tika integration).
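
For illustration only, a minimal mapping sketch of what the Tika
integration could look like on an entity, assuming Hibernate Search 4.2
or later and JPA annotations on the classpath. The entity name
UploadedDocument and its fields are hypothetical and not taken from the
original application. It is a mapping fragment, not a complete program.

```java
import java.sql.Blob;

import javax.persistence.Basic;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Lob;

import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.TikaBridge;

// Hypothetical entity; the class and field names are assumptions for this sketch.
@Entity
@Indexed
public class UploadedDocument {

    @Id
    @GeneratedValue
    private Long id;

    // The binary is persisted in the database as before.
    // @TikaBridge asks Hibernate Search to pass the value through Tika
    // and index the extracted text as the "content" field.
    @Lob
    @Basic(fetch = FetchType.LAZY)
    @Field
    @TikaBridge
    private Blob content;

    public Long getId() {
        return id;
    }

    public Blob getContent() {
        return content;
    }

    public void setContent(Blob content) {
        this.content = content;
    }
}
```

With a mapping along these lines, indexing would happen as part of the
normal Hibernate transaction, without a separate Solr web application.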

Sanne

On 20 May 2013 17:45, Upayavira <uv...@odoko.co.uk> wrote:
> Try using an embedded SolrServer:
>
> http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer
>
> That gets rid of the http part of the picture. I've not used it, so
> can't say how well it works.
>
> Note, you might meet more people with experience of it if you asked your
> question on solr-user@lucene.apache.org
>
> Upayavira

Re: Existing Project using Hibernate, Spring and Lucene and Looking to Add Solr

Posted by Upayavira <uv...@odoko.co.uk>.

Try using an embedded SolrServer:

http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer

That gets rid of the http part of the picture. I've not used it, so
can't say how well it works.

Note, you might meet more people with experience of it if you asked your
question on solr-user@lucene.apache.org
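
Whichever backend is chosen (embedded Solr, Solr over HTTP, or Hibernate
Search), one way to keep the existing web services independent of it is
to put a small application-owned interface in front of the index. The
sketch below is plain Java and uses no Solr API: TextIndex,
InMemoryTextIndex and DocumentService are hypothetical names, and the
in-memory implementation is only a stand-in for a real backend.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical application-owned interface; not part of Solr or Hibernate Search.
interface TextIndex {
    void index(String id, String text);

    List<String> search(String term);
}

// Stand-in backend: a naive in-memory inverted index (token -> sorted document ids).
// An embedded or HTTP-based implementation could replace it behind the same interface.
final class InMemoryTextIndex implements TextIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    @Override
    public void index(String id, String text) {
        // Lower-case the text and split on runs of non-word characters.
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
        }
    }

    @Override
    public List<String> search(String term) {
        Set<String> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.<String>emptyList() : new ArrayList<>(ids);
    }
}

// Hypothetical service used by the existing web services; it only sees TextIndex.
final class DocumentService {
    private final TextIndex index;

    DocumentService(TextIndex index) {
        this.index = index;
    }

    // Persisting the binary and extracting its text are out of scope here;
    // this method receives text that has already been extracted.
    void store(String id, String extractedText) {
        index.index(id, extractedText);
    }

    List<String> find(String term) {
        return index.search(term);
    }
}

class TextIndexDemo {
    public static void main(String[] args) {
        DocumentService service = new DocumentService(new InMemoryTextIndex());
        service.store("doc-1", "Tomcat deployment notes");
        service.store("doc-2", "Hibernate mapping guide");
        System.out.println(service.find("tomcat"));
    }
}
```

The point of the design is that swapping the backend later only means
writing another TextIndex implementation; the web service layer does not
change.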

Upayavira
