Posted to solr-user@lucene.apache.org by Shawn Heisey <so...@elyograg.org> on 2011/08/12 21:49:37 UTC

Some questions about SolrJ

I currently have a build system for my Solr index written in Perl.  I am 
in the process of rewriting it in Java.  I've reached the part of the 
project where I'm using SolrJ, and I have a bunch of questions.  All of 
the SolrJ examples I can find are too simple to answer them.

A note before I launch into the questions.  The wiki page for SolrJ says 
that a static instance of CommonsHttpSolrServer is recommended, but NONE 
of the examples I have been able to find actually use it that way.  I've 
since learned that our webapp is creating a new object for every query.  
I've brought it to the attention of our development team, and they'll be 
fixing it.

1) I can't find any examples of using CoreAdmin with SolrJ.  There seems 
to be a general lack of examples of doing anything complicated at all.  
Can anyone point me at comprehensive and detailed examples of using 
SolrJ that do everything in accordance with SolrJ recommendations?

2) When constructing and using HTTP requests that you make yourself, you 
can use a POST request to issue a query.  I use this method in my Perl 
build system to check for the existence of a large quantity of 
documents, and if any of them do exist, I use the same query to delete 
those documents with another POST request.  Can I do the same thing with 
SolrJ, or is it limited to queries using GET requests only?

3) I'll need to access CoreAdmin as well as individual cores for 
updates, queries, etc.  The former uses a /solr/ URL, the latter 
/solr/corename/.  Will I need two CommonsHttpSolrServer instances to do 
this, or is there a way to specify a core through a parameter?

I am sure that I have more questions, but I may be able to answer a lot 
of them myself if I can see better examples.

Thanks,
Shawn

Re: Some questions about SolrJ

Posted by Erick Erickson <er...@gmail.com>.

About updating the Wiki, just create your login and have at it. Anything
people think is wrong, they can edit <G>....

Best
Erick

On Sun, Aug 14, 2011 at 3:39 PM, Shawn Heisey <so...@elyograg.org> wrote:
> On 8/13/2011 9:59 AM, Michael Sokolov wrote:
>>
>>> Shawn, my experience with SolrJ in that configuration (no autoCommit) is
>>> that you have control over commits: if you don't issue an explicit commit,
>>> it won't happen.  Re lifecycle: we don't use a static instance; rather our
>>> app maintains a small pool of CommonsHttpSolrServer instances that we re-use
>>> across requests.  I think that will be preferable since I don't think the
>>> underlying HttpClient is thread safe?
>>
>> Hmm, I just checked and actually CommonsHttpSolrServer uses
>> MultiThreadedHttpConnectionManager so it should be thread-safe, and OK to
>> use a static instance as per documentation.  Sorry for the misinformation.
>
> Thanks for the help!
>
> I've been able to muddle my way through part of my implementation on my own.
>  There doesn't seem to be any way to point to the base /solr/ URL and then
> ask SolrJ to add a core when creating requests.  I did see that you can set
> the URL for the server object after it's created, but if I ever make this
> thing multithreaded, I fear doing so will cause problems.  I'm going with
> one server object (solrServer) for CoreAdmin and another object (solrCore)
> for requests against the core.
>
> This new build system has an object representing one complete index, which
> uses a container of seven objects representing each of the shards.  Each of
> the shard objects has two objects representing a build core and a live core.
>  Each of the core objects contains the solrServer and solrCore already
> mentioned.  Since I have two complete indexes, this means that the final
> product will initialize 56 server objects.
>
> I couldn't use static server objects as recommended by the docs, because I
> have so many instances that all need different URLs.  They are private class
> members that get created only once, so I think it will be OK.  A static
> object would be a good idea for a search application, because it likely only
> needs to deal with one URL.  Our webapp developers told me that they will be
> putting the server object into a bean in the application context.
>
> When I've got everything done and debugged, I will use what I've learned to
> augment the SolrJ wiki page.  Who is the best community person to coordinate
> with on that to make sure I put up good information?
>
> Thanks,
> Shawn
>
>

Re: Some questions about SolrJ

Posted by Shawn Heisey <so...@elyograg.org>.

On 8/13/2011 9:59 AM, Michael Sokolov wrote:
>
>> Shawn, my experience with SolrJ in that configuration (no autoCommit) 
>> is that you have control over commits: if you don't issue an explicit 
>> commit, it won't happen.  Re lifecycle: we don't use a static 
>> instance; rather our app maintains a small pool of 
>> CommonsHttpSolrServer instances that we re-use across requests.  I 
>> think that will be preferable since I don't think the underlying 
>> HttpClient is thread safe?
> Hmm, I just checked and actually CommonsHttpSolrServer uses 
> MultiThreadedHttpConnectionManager so it should be thread-safe, and OK 
> to use a static instance as per documentation.  Sorry for the 
> misinformation.

Thanks for the help!

I've been able to muddle my way through part of my implementation on my 
own.  There doesn't seem to be any way to point to the base /solr/ URL 
and then ask SolrJ to add a core when creating requests.  I did see that 
you can set the URL for the server object after it's created, but if I 
ever make this thing multithreaded, I fear doing so will cause 
problems.  I'm going with one server object (solrServer) for CoreAdmin 
and another object (solrCore) for requests against the core.
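
A minimal sketch of that arrangement, in plain Java with no SolrJ dependency: each core gets a small immutable holder whose two URL strings are computed once and never changed, so there is nothing to reset if the code later becomes multithreaded. The class name, host, port, and core name are hypothetical; in the real build system each string would presumably be handed to its own CommonsHttpSolrServer.

```java
// Hypothetical holder for the two URLs that one core needs.
final class CoreEndpoints {
    private final String adminUrl; // base URL (/solr), used for CoreAdmin
    private final String coreUrl;  // base URL plus core name, used for updates and queries

    CoreEndpoints(String baseUrl, String coreName) {
        // Strip trailing slashes so ".../solr" and ".../solr/" behave the same.
        String base = baseUrl.replaceAll("/+$", "");
        this.adminUrl = base;
        this.coreUrl = base + "/" + coreName;
    }

    String adminUrl() { return adminUrl; }
    String coreUrl() { return coreUrl; }

    public static void main(String[] args) {
        // Host, port, and core name are made up for illustration.
        CoreEndpoints live = new CoreEndpoints("http://localhost:8983/solr/", "live");
        System.out.println(live.adminUrl());
        System.out.println(live.coreUrl());
    }
}
```

Because both fields are final, an instance can be shared between threads without anyone calling a setter on it later.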

This new build system has an object representing one complete index, 
which uses a container of seven objects representing each of the 
shards.  Each of the shard objects has two objects representing a build 
core and a live core.  Each of the core objects contains the solrServer 
and solrCore already mentioned.  Since I have two complete indexes, this 
means that the final product will initialize 56 server objects.
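
The count of 56 follows from 2 indexes x 7 shards x 2 cores (build and live) x 2 server objects (solrServer and solrCore). A small plain-Java sketch that enumerates them, with hypothetical label names chosen only for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Enumerates one label per server object in the layout described above.
final class ServerObjectCount {
    static List<String> serverObjectLabels(int indexes, int shardsPerIndex) {
        String[] coreRoles = {"build", "live"};             // two cores per shard
        String[] serverKinds = {"solrServer", "solrCore"};  // two server objects per core
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < indexes; i++) {
            for (int s = 0; s < shardsPerIndex; s++) {
                for (String role : coreRoles) {
                    for (String kind : serverKinds) {
                        labels.add("index" + i + "/shard" + s + "/" + role + "/" + kind);
                    }
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Two complete indexes of seven shards each.
        System.out.println(serverObjectLabels(2, 7).size()); // prints 56
    }
}
```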

I couldn't use static server objects as recommended by the docs, because 
I have so many instances that all need different URLs.  They are private 
class members that get created only once, so I think it will be OK.  A 
static object would be a good idea for a search application, because it 
likely only needs to deal with one URL.  Our webapp developers told me 
that they will be putting the server object into a bean in the 
application context.

When I've got everything done and debugged, I will use what I've learned 
to augment the SolrJ wiki page.  Who is the best community person to 
coordinate with on that to make sure I put up good information?

Thanks,
Shawn

Re: Some questions about SolrJ

Posted by Michael Sokolov <so...@ifactory.com>.

> Shawn, my experience with SolrJ in that configuration (no autoCommit) 
> is that you have control over commits: if you don't issue an explicit 
> commit, it won't happen.  Re lifecycle: we don't use a static 
> instance; rather our app maintains a small pool of 
> CommonsHttpSolrServer instances that we re-use across requests.  I 
> think that will be preferable since I don't think the underlying 
> HttpClient is thread safe?
Hmm, I just checked and actually CommonsHttpSolrServer uses 
MultiThreadedHttpConnectionManager so it should be thread-safe, and OK 
to use a static instance as per documentation.  Sorry for the 
misinformation.

-Mike

Re: Some questions about SolrJ

Posted by Michael Sokolov <so...@ifactory.com>.

On 8/12/2011 4:18 PM, Shawn Heisey wrote:
> On 8/12/2011 1:49 PM, Shawn Heisey wrote:
>> I am sure that I have more questions, but I may be able to answer a 
>> lot of them myself if I can see better examples.
>
> Thought of another question.  My Perl build system uses DIH for all 
> indexing, but with the Java rewrite I am planning to do all actions 
> other than a full index rebuild using the /update handler.  I have 
> autoCommit completely turned off in solrconfig.xml. Do I need to set 
> any parameters to ensure that nothing gets committed until I do a 
> server.commit() myself?
>
> Thanks,
> Shawn
>
Shawn, my experience with SolrJ in that configuration (no autoCommit) is 
that you have control over commits: if you don't issue an explicit 
commit, it won't happen.  Re lifecycle: we don't use a static instance; 
rather our app maintains a small pool of CommonsHttpSolrServer instances 
that we re-use across requests.  I think that will be preferable since I 
don't think the underlying HttpClient is thread safe?

I haven't used CoreAdmin features, nor HTTP POST w/SolrJ, but I do see 
an option to request that the server operate w/multipart post:

  public CommonsHttpSolrServer(URL baseURL, HttpClient client,
      ResponseParser parser, boolean useMultiPartPost)
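
For comparison, the raw-HTTP approach from the original question (query with POST, then delete with the same query) comes down to two request bodies. This is a rough sketch in plain Java with no SolrJ involved; the class name, the `id` field, and the sample values are assumptions, and IDs containing query-syntax characters would need escaping that this sketch omits.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

final class PostBodies {
    // Builds a query such as id:(1 OR 2 OR 3). The field name is an assumption.
    static String idQuery(String field, List<String> ids) {
        return field + ":(" + String.join(" OR ", ids) + ")";
    }

    // Form-encoded body for a query sent with POST instead of GET.
    static String selectBody(String query) {
        try {
            return "q=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // XML delete-by-query message for the /update handler.
    static String deleteBody(String query) {
        String escaped = query.replace("&", "&amp;")
                              .replace("<", "&lt;")
                              .replace(">", "&gt;");
        return "<delete><query>" + escaped + "</query></delete>";
    }

    public static void main(String[] args) {
        String q = idQuery("id", Arrays.asList("1", "2", "3"));
        System.out.println(selectBody(q));
        System.out.println(deleteBody(q));
    }
}
```

The same query string feeds both bodies, which mirrors the check-then-delete pattern described in the question.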


-Mike

Re: Some questions about SolrJ

Posted by Shawn Heisey <so...@elyograg.org>.

On 8/12/2011 1:49 PM, Shawn Heisey wrote:
> I am sure that I have more questions, but I may be able to answer a 
> lot of them myself if I can see better examples.

Thought of another question.  My Perl build system uses DIH for all 
indexing, but with the Java rewrite I am planning to do all actions 
other than a full index rebuild using the /update handler.  I have 
autoCommit completely turned off in solrconfig.xml. Do I need to set any 
parameters to ensure that nothing gets committed until I do a 
server.commit() myself?

Thanks,
Shawn