
Posted to solr-user@lucene.apache.org by "Kulkarni, Ajit Kamalakar" <aj...@ptc.com> on 2009/03/11 14:07:41 UTC

RE: Combination of EmbeddedSolrServer and CommonHttpSolrServer

Ryan,

If we index the documents using CommonsHttpSolrServer and search using
the same, we get the updated results.

That means we can search for the most recently added document as well,
even if it has not been committed to the file system.

So it looks like there is some kind of cache that is shared by the index
and search logic inside Solr for a given SolrServer component (e.g.
CommonsHttpSolrServer, EmbeddedSolrServer).

Is there any way to configure Solr so that the same cache is used by the
component that responds to HTTP requests through CommonsHttpSolrServer and
the component used by EmbeddedSolrServer?

I don't see any reason why the searcher and/or indexer for a given
SolrServer needs to maintain an exclusive cache.

Calling commit on the SolrServer to sync with the index data may not be
a good option, as I suppose it is an expensive operation.

The synchronization between the cache and the data on disk should be
independent of the SolrServer instances managed by the Solr web
application inside Tomcat.

The issue will still be that EmbeddedSolrServer may access the index
data on disk directly, since it may bypass the Solr web app entirely.

I am embedding Tomcat in my RMI server.

The RMI server is going to use EmbeddedSolrServer, and it also hosts the
Solr web app inside its Tomcat instance.

So I guess I should be able to manage a singleton cache that is given
to both the CommonsHttpSolrServer-related components managed inside the
Solr web app and the EmbeddedSolrServer components.

Please comment.

Thanks,

Ajit

-----Original Message-----
From: Ryan McKinley [mailto:ryantxu@gmail.com] 
Sent: Monday, February 09, 2009 9:23 PM
To: solr-user@lucene.apache.org
Subject: Re: Combination of EmbeddedSolrServer and CommonHttpSolrServer

Keep in mind that the way lucene/solr work is that the results are
constant from when you open the searcher.  If new documents are added
(without re-opening the searcher) they will not be seen.

<commit/>  tells solr to re-open the index and see the changes.

> 1. Does this mean that committing on the indexing (Embedded) server does
> not reflect the document changes when we fire a search through another
> (HTTP) server?

correct.  The HTTP server would still be open from before the indexing
happened.

> 2. What happens to the commit fired on the indexing server? Can I remove
> that and just commit on the "read only" server?

Call commit on the indexing server, then the read only server then you
can delete the Embedded server

> 3. Do we have to fire a Commit (on the HTTP server) before we try to
> search for a document?

Yes -- calling commit will re-open the index and reflect any changes
to it

> 4. Can we make any setting (perhaps using auto-commit) on the HTTP
> server to avoid this scenario?

Not really -- the HTTP core has no idea what is happening on the other
core.

ryan
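
The advice above (commit on the indexing server, then on the read-only
server) can be sketched from the client side. This is only a minimal
sketch, not code from the thread: it sends the <commit/> update message as
a plain HTTP POST using the JDK's java.net.http client (Java 11 or later)
instead of SolrJ. The base URL http://localhost:8080/solr and the class and
method names are assumptions made for illustration; adjust the URL to
wherever the Solr web app is actually deployed.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper (not part of Solr or SolrJ): builds and sends the
// XML <commit/> message to a Solr update handler over HTTP.
final class SolrCommitSketch {

    // Builds the POST request without sending it, so it can be inspected.
    static HttpRequest buildCommitRequest(String solrBaseUrl) {
        return HttpRequest.newBuilder(URI.create(solrBaseUrl + "/update"))
                .header("Content-Type", "text/xml; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString("<commit/>"))
                .build();
    }

    // Sends the commit and returns the HTTP status code of the response.
    static int sendCommit(String solrBaseUrl) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                buildCommitRequest(solrBaseUrl),
                HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        // Assumed URL of the read-only (HTTP) Solr instance.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:8080/solr";
        HttpRequest request = buildCommitRequest(baseUrl);
        System.out.println(request.method() + " " + request.uri());
        // Uncomment once a Solr instance is actually running at baseUrl:
        // System.out.println("status: " + sendCommit(baseUrl));
    }
}
```

In the setup discussed here, the indexing side would commit first (for
example through its EmbeddedSolrServer), and only then would something like
sendCommit be called against the HTTP instance so that its searcher is
re-opened.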

RE: Combination of EmbeddedSolrServer and CommonHttpSolrServer

Posted by "Kulkarni, Ajit Kamalakar" <aj...@ptc.com>.

Hi Shalin Shekhar Mangar,

Thanks for your inputs.

Please see my comments below.

I would like to know whether any user has used EmbeddedSolrServer for
indexing and CommonsHttpSolrServer for search.

I have found that this combination offers better indexing performance.
Searching becomes more flexible, as you can search from a larger number
of HTTP clients simultaneously.

Does anyone have any related performance data? 

Thanks,

Ajit

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinmangar@gmail.com] 
Sent: Wednesday, March 11, 2009 7:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Combination of EmbeddedSolrServer and CommonHttpSolrServer

On Wed, Mar 11, 2009 at 6:37 PM, Kulkarni, Ajit Kamalakar <
ajkulkarni@ptc.com> wrote:

> If we index the documents using CommonsHttpSolrServer and search using
> the same, we get the updated results
>
> That means we can search the latest added document as well even if it is
> not committed to the file system

That is not possible. Without calling commit, new documents will not be
visible to a searcher.

Ajit: When I tested using CommonsHttpSolrServer for indexing as well as
searching, I could find the most recently added document through the Solr
admin page.

I could also find the document through CommonsHttpSolrServer without
explicitly calling commit.

I was even more surprised to see the same result when using
EmbeddedSolrServer for indexing and CommonsHttpSolrServer for searching.

I used embeddedSolrServer = new
EmbeddedSolrServer(SolrCore.getSolrCore()); which is a deprecated API.

In this case I did not need to call commit on CommonsHttpSolrServer to
find the latest document, either on the Solr admin page or
programmatically through CommonsHttpSolrServer.

However, if I use

      CoreContainer multicore = new CoreContainer();
      File home = new File( getSolrHome() );
      File f = new File( home, "solr.xml" );
      multicore.load( getSolrHome(), f );
      embeddedSolrServer = new EmbeddedSolrServer( multicore,
            SolrIndexConstants.DEFAULT_CORE );

I had to call commit on CommonsHttpSolrServer to find the most recently
added documents, and a document became visible on the Solr admin page only
after I had programmatically searched following a commit on
CommonsHttpSolrServer.

This is consistent with what you mentioned above.

> So it looks like there is some kind of cache that is used by both index
> and search logic inside solr for a given SolrServer components (e. g.
> CommonsHttpSolrServer, EmbeddedSolrServer)

Indexing does not create any cache. The caching is done only by the
searcher. The old searcher/cache is discarded and a new searcher/cache is
created when you call commit. Setting autowarmCount on the caches in
solrconfig.xml makes the new searcher run some of the most recently used
queries on the old searcher to warm up the new cache.

> Calling commit on the SolrServer to synch with the index data may not be
> good option as I suppose it to be expensive operation.

It is the only option. But you may be able to make the operation cheaper by
tweaking the autowarmCount on the caches (this is specified in
solrconfig.xml). However, caches are important for good search performance.
Depending on your search traffic, you'll need to find a sweet spot.

> The cache and hard disk data synchronization should be independent of
> the SolrServer instances managed by Solr Web Application inside tomcat.

SolrServer is not really a server in itself. It is (a pointer to?) a server
being used by a solrj client. The CommonsHttpSolrServer refers to a remote
server url and makes calls through HTTP. SolrCore is the internal class
which manages the state of the server.

A SolrCore is created by the solr webapp. When you create another SolrCore
for use by EmbeddedSolrServer, they do not know about each other. Therefore
you need to notify it if you change the index through another core.

Ajit: If the same JVM manages the searchers that respond for
EmbeddedSolrServer as well as CommonsHttpSolrServer, then why can't the
responding searcher be the same? I understand that the EmbeddedSolrServer
and CommonsHttpSolrServer clients are separate, but if the searchers are
managed in the same JVM, theoretically we should be able to attach a
singleton searcher to every kind of SolrServer. This searcher should be a
listener for the indexer.

Since searching is a read operation, there won't be any threading or
scalability issue, but there should be only one indexer.

I don't have enough knowledge about Solr and Lucene, so I may be
totally wrong!

> The issue still will be that EmbeddedSolrServer may directly access hard
> index data as it may bypass the Solr web app totally
>
> I am embedding tomcat in my RMI server.
>
> The RMI Server is going to use EmbeddedSolrServer and it also hosts the
> Solr WebApp inside its tomcat instance
>
> So I guess I should be able to manage a singleton cache  that is given
> to both, CommonsHttpSolrServer related components managed inside Solr
> WebApp and EmbeddedSolrServer components

Why have two of them at all? Does the Solr deployed inside Tomcat serve HTTP
requests from external clients without going through your RMI server? You
can simplify things by keeping it either in Tomcat or in embedded mode.

Ajit: The outside HTTP search requests are served by the Solr web app
running under Tomcat embedded in the RMI server.

The RMI server is just a host.

I have multiple remote Java clients that can search simultaneously. HTTP
seems the better approach for searching.

Can you support this kind of searching through embedded mode? I guess
embedded mode is for a local client.

Hope that helps.

-- 
Regards,
Shalin Shekhar Mangar.

Re: Combination of EmbeddedSolrServer and CommonHttpSolrServer

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

On Wed, Mar 11, 2009 at 6:37 PM, Kulkarni, Ajit Kamalakar <
ajkulkarni@ptc.com> wrote:

>
> If we index the documents using CommonsHttpSolrServer and search using
> the same, we get the updated results
>
> That means we can search the latest added document as well even if it is
> not committed to the file system
>

That is not possible. Without calling commit, new documents will not be
visible to a searcher.

> So it looks like there is some kind of cache that is used by both index
> and search logic inside solr for a given SolrServer components (e. g.
> CommonsHttpSolrServer, EmbeddedSolrServer)
>

Indexing does not create any cache. The caching is done only by the
searcher. The old searcher/cache is discarded and a new searcher/cache is
created when you call commit. Setting autowarmCount on the caches in
solrconfig.xml makes the new searcher run some of the most recently used
queries on the old searcher to warm up the new cache.
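
The warming step described above can be illustrated with a small toy
model. This is only a sketch under stated assumptions, not Solr's cache
implementation: the LruCache and warm names are hypothetical, the "new
searcher" is modelled as a plain function from a query key to a result, and
the autowarmCount parameter simply limits how many of the most recently
used keys are re-run.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;

// Toy model of cache autowarming; this is NOT Solr's cache implementation.
final class AutowarmSketch {

    // Minimal LRU-ordered cache: an access-ordered LinkedHashMap iterates
    // from least recently used to most recently used.
    static final class LruCache<K, V> {
        private final LinkedHashMap<K, V> map = new LinkedHashMap<>(16, 0.75f, true);

        V get(K key) { return map.get(key); }
        void put(K key, V value) { map.put(key, value); }
        int size() { return map.size(); }
        boolean contains(K key) { return map.containsKey(key); }

        // Returns up to n keys, most recently used first.
        List<K> mostRecentKeys(int n) {
            List<K> keys = new ArrayList<>(map.keySet());
            List<K> result = new ArrayList<>();
            for (int i = keys.size() - 1; i >= 0 && result.size() < n; i--) {
                result.add(keys.get(i));
            }
            return result;
        }
    }

    // Builds the cache for a new searcher by re-running the autowarmCount
    // most recently used keys of the old cache against the new searcher.
    static <K, V> LruCache<K, V> warm(LruCache<K, V> oldCache,
                                      int autowarmCount,
                                      Function<K, V> newSearcher) {
        LruCache<K, V> fresh = new LruCache<>();
        List<K> keys = oldCache.mostRecentKeys(autowarmCount);
        // Insert oldest first so the warmed cache keeps the same recency order.
        for (int i = keys.size() - 1; i >= 0; i--) {
            K key = keys.get(i);
            fresh.put(key, newSearcher.apply(key));
        }
        return fresh;
    }

    public static void main(String[] args) {
        LruCache<String, String> old = new LruCache<>();
        old.put("q1", "old result 1");
        old.put("q2", "old result 2");
        old.put("q3", "old result 3");
        old.get("q1"); // q1 becomes the most recently used entry

        // Warm only the two most recently used queries (q1 and q3).
        LruCache<String, String> fresh = warm(old, 2, q -> "new result for " + q);
        System.out.println("warmed entries: " + fresh.size());
        System.out.println("q2 warmed: " + fresh.contains("q2"));
    }
}
```

A larger autowarmCount means more queries are re-run on every commit, which
is the cost being traded against a warmer cache.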

> Calling commit on the SolrServer to synch with the index data may not be
> good option as I suppose it to be expensive operation.
>

It is the only option. But you may be able to make the operation cheaper by
tweaking the autowarmCount on the caches (this is specified in
solrconfig.xml). However, caches are important for good search performance.
Depending on your search traffic, you'll need to find a sweet spot.

> The cache and hard disk data synchronization should be independent of
> the SolrServer instances managed by Solr Web Application inside tomcat.
>

SolrServer is not really a server in itself. It is (a pointer to?) a server
being used by a solrj client. The CommonsHttpSolrServer refers to a remote
server url and makes calls through HTTP. SolrCore is the internal class
which manages the state of the server.

A SolrCore is created by the solr webapp. When you create another SolrCore
for use by EmbeddedSolrServer, they do not know about each other. Therefore
you need to notify it if you change the index through another core.
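
Why two cores over the same index do not see each other's changes can be
shown with a toy model. None of the classes below are Solr or Lucene APIs;
SharedIndex, Core and their methods are hypothetical names, and the
"searcher" is reduced to an immutable snapshot taken when the core is
created or commits. It is a sketch of the behaviour described above, not of
how Solr implements it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model of "a searcher sees the index as of when it was opened".
final class SearcherSnapshotSketch {

    // Stands in for the index directory on disk shared by both cores.
    static final class SharedIndex {
        private final List<String> committedDocs = new ArrayList<>();

        synchronized void write(List<String> docs) { committedDocs.addAll(docs); }

        synchronized List<String> snapshot() {
            return Collections.unmodifiableList(new ArrayList<>(committedDocs));
        }
    }

    // Stands in for one SolrCore: it buffers its own additions and holds a
    // searcher view (a snapshot) that is replaced only on commit.
    static final class Core {
        private final SharedIndex index;
        private final List<String> pending = new ArrayList<>();
        private List<String> searcherView;

        Core(SharedIndex index) {
            this.index = index;
            this.searcherView = index.snapshot();
        }

        void add(String doc) { pending.add(doc); }

        // Flushes this core's pending documents and re-opens its searcher.
        void commit() {
            index.write(pending);
            pending.clear();
            searcherView = index.snapshot();
        }

        boolean search(String doc) { return searcherView.contains(doc); }
    }

    public static void main(String[] args) {
        SharedIndex index = new SharedIndex();
        Core embedded = new Core(index); // used for indexing
        Core http = new Core(index);     // used for searching

        embedded.add("doc1");
        embedded.commit();
        // The HTTP core's searcher was opened before doc1 was written.
        System.out.println("before http commit: " + http.search("doc1")); // false
        http.commit(); // the "notification": re-open the searcher
        System.out.println("after http commit: " + http.search("doc1"));  // true
    }
}
```

In this model the second commit is what plays the role of notifying the
other core: nothing else tells it that the shared index has changed.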

> The issue still will be that EmbeddedSolrServer may directly access hard
> index data as it may bypass the Solr web app totally
>
> I am embedding tomcat in my RMI server.
>
> The RMI Server is going to use EmbeddedSolrServer and it also hosts the
> Solr WebApp inside its tomcat instance
>
> So I guess I should be able to manage a singleton cache  that is given
> to both, CommonsHttpSolrServer related components managed inside Solr
> WebApp and EmbeddedSolrServer components
>
>
Why have two of them at all? Does the Solr deployed inside Tomcat serve HTTP
requests from external clients without going through your RMI server? You
can simplify things by keeping it either in Tomcat or in embedded mode.

Hope that helps.

-- 
Regards,
Shalin Shekhar Mangar.