Posted to solr-user@lucene.apache.org by Shawn Heisey <so...@elyograg.org> on 2011/08/12 21:49:37 UTC
Some questions about SolrJ
I currently have a build system for my Solr index written in Perl. I am
in the process of rewriting it in Java. I've reached the part of the
project where I'm using SolrJ, and I have a bunch of questions. All of
the SolrJ examples I can find are too simple to answer them.
A note before I launch into the questions. The wiki page for SolrJ says
that a static instance of CommonsHttpSolrServer is recommended, but NONE
of the examples I have been able to find actually use it that way. I've
since learned that our webapp is creating a new object for every query.
I've brought it to the attention of our development team, and they will
fix it.
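To make the wiki's recommendation concrete, here is a minimal sketch of a single long-lived client that every query goes through, instead of a new object per query. It uses the JDK's own java.net.http.HttpClient (Java 11 or later) rather than SolrJ so that it has no external dependencies. The class name SearchGateway and the core URL are hypothetical. In a SolrJ 3.x application the static field would presumably hold a CommonsHttpSolrServer instead.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical gateway: one client for the whole application, created once
// when the class is loaded and reused by every query.
final class SearchGateway {
    // Assumption: a single Solr core at this URL (illustrative only).
    static final String CORE_URL = "http://localhost:8983/solr/corename";

    // Built exactly once; an HttpClient is immutable after it is built
    // and can send many requests.
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    private SearchGateway() {}

    // Builds a /select URI for the given query string.
    static URI selectUri(String q) {
        String encoded = URLEncoder.encode(q, StandardCharsets.UTF_8);
        return URI.create(CORE_URL + "/select?q=" + encoded + "&wt=json");
    }

    // Sends the query through the shared client instead of creating a new one.
    static String query(String q) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(selectUri(q)).GET().build();
        HttpResponse<String> response =
            CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

The point of the design is that construction cost and connection state live in one place; callers only ever see the static methods.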
1) I can't find any examples of using CoreAdmin with SolrJ. There seems
to be a general lack of examples of doing anything complicated at all.
Can anyone point me at comprehensive and detailed examples of using
SolrJ that do everything in accordance with SolrJ recommendations?
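As background for question 1: CoreAdmin is an ordinary HTTP handler under the base /solr/ URL (its path is the adminPath set in solr.xml, commonly /admin/cores), driven by an action parameter plus a core parameter. The sketch below only builds those request URLs with the JDK; the host, port and the core names build and live are assumptions. SolrJ wraps the same handler in its CoreAdminRequest class, whose helpers vary by version, so check its Javadoc before relying on any particular method.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds CoreAdmin request URLs against the base /solr/ URL.
// Assumes adminPath="/admin/cores" in solr.xml (a common setting).
final class CoreAdminUrls {
    private final String solrBase;

    CoreAdminUrls(String solrBase) {
        // Drop trailing slashes so the paths join cleanly.
        this.solrBase = solrBase.replaceAll("/+$", "");
    }

    // Encodes parameters in insertion order and appends them to the handler path.
    private String build(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(p.getKey() + "="
                + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return solrBase + "/admin/cores?" + query;
    }

    // action=STATUS: report on one core.
    String status(String core) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "STATUS");
        p.put("core", core);
        return build(p);
    }

    // action=RELOAD: reload an existing core.
    String reload(String core) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "RELOAD");
        p.put("core", core);
        return build(p);
    }

    // action=SWAP: exchange the names of two cores, e.g. to put a
    // freshly built core live.
    String swap(String core, String other) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "SWAP");
        p.put("core", core);
        p.put("other", other);
        return build(p);
    }
}
```

For example, swap("build", "live") yields .../admin/cores?action=SWAP&core=build&other=live.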
2) When you build HTTP requests yourself, you can send a query to Solr
as a POST request. I use this method in my Perl
build system to check for the existence of a large quantity of
documents, and if any of them do exist, I use the same query to delete
those documents with another POST request. Can I do the same thing with
SolrJ, or is it limited to queries using GET requests only?
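Solr's /select handler accepts its parameters as a form-encoded POST body, which is what makes the Perl approach in question 2 work. This sketch builds such a POST request with the JDK's java.net.http classes, plus the matching XML delete-by-query body for /update. The helper names and the id field are hypothetical. On the SolrJ side, SolrServer appears to offer a query(SolrParams, SolrRequest.METHOD) overload that accepts METHOD.POST, and a deleteByQuery(String) method, but treat that as something to confirm against the Javadoc for your version.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers for querying with POST and deleting with the same query.
final class PostQuerySketch {
    private PostQuerySketch() {}

    // A query matching any of the given ids, e.g. id:(1 OR 2 OR 3).
    static String idQuery(String... ids) {
        return "id:(" + String.join(" OR ", ids) + ")";
    }

    // Form-encoded parameters for /select; fl=id keeps the response small.
    static String selectBody(String q, int rows) {
        return "q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
            + "&rows=" + rows + "&fl=id";
    }

    // A POST to <core>/select carrying the parameters in the body, so a long
    // query is not limited by URL length.
    static HttpRequest selectRequest(String coreUrl, String q, int rows) {
        return HttpRequest.newBuilder(URI.create(coreUrl + "/select"))
            .header("Content-Type", "application/x-www-form-urlencoded")
            .POST(HttpRequest.BodyPublishers.ofString(selectBody(q, rows)))
            .build();
    }

    // XML message for POST <core>/update that deletes everything the query matches.
    static String deleteByQueryXml(String q) {
        return "<delete><query>" + escapeXml(q) + "</query></delete>";
    }

    // Minimal escaping for XML text content.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Reusing one query string for both the existence check and the delete keeps the two requests consistent.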
3) I'll need to access CoreAdmin as well as individual cores for
updates, queries, etc. The former uses a /solr/ URL, the latter
/solr/corename/. Will I need two CommonsHttpSolrServer instances to do
this, or is there a way to specify a core through a parameter?
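In Solr's URL layout a core is selected by its path (/solr/corename/select, /solr/corename/update), while CoreAdmin sits under the base /solr/ URL and names cores through a core parameter. One way to handle both, and the one Shawn settles on later in the thread, is to keep two base URLs per core. A hypothetical sketch that derives both once from the base URL and a core name; the fields are final, so the object never changes after construction and can be shared between threads.

```java
// Hypothetical per-core holder: both base URLs are derived once and never
// change, so one instance can be shared safely between threads.
final class CoreEndpoints {
    final String adminBase; // base /solr/ URL; CoreAdmin requests go under here
    final String coreBase;  // /solr/<core>; select and update go under here
    final String coreName;

    CoreEndpoints(String solrBase, String coreName) {
        // Drop trailing slashes so the paths join cleanly.
        this.adminBase = solrBase.replaceAll("/+$", "");
        this.coreBase = adminBase + "/" + coreName;
        this.coreName = coreName;
    }

    String selectUrl() {
        return coreBase + "/select";
    }

    String updateUrl() {
        return coreBase + "/update";
    }
}
```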
I am sure that I have more questions, but I may be able to answer a lot
of them myself if I can see better examples.
Thanks,
Shawn
Re: Some questions about SolrJ
Posted by Erick Erickson <er...@gmail.com>.
About updating the Wiki, just create your login and have at it. Anything
people think is wrong, they can edit <G>....
Best
Erick
On Sun, Aug 14, 2011 at 3:39 PM, Shawn Heisey <so...@elyograg.org> wrote:
> When I've got everything done and debugged, I will use what I've learned to
> augment the SolrJ wiki page. Who is the best community person to coordinate
> with on that to make sure I put up good information?
>
Re: Some questions about SolrJ
Posted by Shawn Heisey <so...@elyograg.org>.
On 8/13/2011 9:59 AM, Michael Sokolov wrote:
> Hmm, I just checked and actually CommonsHttpSolrServer uses
> MultiThreadedHttpConnectionManager so it should be thread-safe, and OK
> to use a static instance as per documentation. Sorry for the
> misinformation.
Thanks for the help!
I've been able to muddle my way through part of my implementation on my
own. There doesn't seem to be any way to point to the base /solr/ URL
and then ask SolrJ to add a core when creating requests. I did see that
you can set the URL for the server object after it's created, but if I
ever make this thing multithreaded, I fear doing so will cause
problems. I'm going with one server object (solrServer) for CoreAdmin
and another object (solrCore) for requests against the core.
This new build system has an object representing one complete index,
which uses a container of seven objects representing each of the
shards. Each of the shard objects has two objects representing a build
core and a live core. Each of the core objects contains the solrServer
and solrCore already mentioned. Since I have two complete indexes, this
means that the final product will initialize 56 server objects.
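The count of 56 is 2 indexes × 7 shards × 2 cores (build and live) × 2 server objects (solrServer and solrCore). The sketch below spells out that hierarchy as plain URL strings; the host and core naming scheme (idxA-s0, s0build and so on) is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the hierarchy described above:
// index -> 7 shards -> {build, live} cores -> {CoreAdmin, core} URLs.
final class IndexLayout {
    static final int SHARDS_PER_INDEX = 7;
    static final String[] CORE_ROLES = {"build", "live"};

    private IndexLayout() {}

    // One entry per server object; here each object is just its URL.
    static List<String> serverUrls(String indexName) {
        List<String> urls = new ArrayList<>();
        for (int shard = 0; shard < SHARDS_PER_INDEX; shard++) {
            // Assumption: one host per shard, named after the index.
            String solrBase = "http://" + indexName + "-s" + shard + ":8983/solr";
            for (String role : CORE_ROLES) {
                urls.add(solrBase);                        // solrServer (CoreAdmin)
                urls.add(solrBase + "/s" + shard + role);  // solrCore (the core)
            }
        }
        return urls;
    }

    // Total server objects across all indexes.
    static int totalServers(String... indexNames) {
        int total = 0;
        for (String name : indexNames) {
            total += serverUrls(name).size();
        }
        return total;
    }
}
```

In this sketch both cores on a shard point their solrServer at the same /solr/ URL, so sharing one CoreAdmin object per shard would bring the total down to 42; whether that is worthwhile depends on how the objects are used.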
I couldn't use static server objects as recommended by the docs, because
I have so many instances that all need different URLs. They are private
class members that get created only once, so I think it will be OK. A
static object would be a good idea for a search application, because it
likely only needs to deal with one URL. Our webapp developers told me
that they will be putting the server object into a bean in the
application context.
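When an application needs many URLs, as here, a middle ground between one static instance and per-request objects is a registry that creates at most one client per distinct URL and hands the same instance to every caller. A generic, hypothetical sketch using ConcurrentHashMap.computeIfAbsent, which the JDK documents as applying the factory at most once per key. With SolrJ 3.x the factory might wrap new CommonsHttpSolrServer(url); its String constructor declares a checked MalformedURLException, which a lambda would have to catch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical registry: at most one client per distinct URL, created
// lazily on first use and then shared by every caller and thread.
final class ClientRegistry<C> {
    private final ConcurrentMap<String, C> clients = new ConcurrentHashMap<>();
    private final Function<String, C> factory;

    ClientRegistry(Function<String, C> factory) {
        this.factory = factory;
    }

    // computeIfAbsent on ConcurrentHashMap runs the factory at most once per key.
    C get(String url) {
        return clients.computeIfAbsent(url, factory);
    }

    int size() {
        return clients.size();
    }
}
```

A bean in the application context, as the webapp developers plan, gives the same "create once, share everywhere" effect for the single-URL case.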
When I've got everything done and debugged, I will use what I've learned
to augment the SolrJ wiki page. Who is the best community person to
coordinate with on that to make sure I put up good information?
Thanks,
Shawn
Re: Some questions about SolrJ
Posted by Michael Sokolov <so...@ifactory.com>.
> Shawn, my experience with SolrJ in that configuration (no autoCommit)
> is that you have control over commits: if you don't issue an explicit
> commit, it won't happen. Re lifecycle: we don't use a static
> instance; rather our app maintains a small pool of
> CommonsHttpSolrServer instances that we re-use across requests. I
> think that will be preferable since I don't think the underlying
> HttpClient is thread safe?
Hmm, I just checked and actually CommonsHttpSolrServer uses
MultiThreadedHttpConnectionManager so it should be thread-safe, and OK
to use a static instance as per documentation. Sorry for the
misinformation.
-Mike
Re: Some questions about SolrJ
Posted by Michael Sokolov <so...@ifactory.com>.
On 8/12/2011 4:18 PM, Shawn Heisey wrote:
> On 8/12/2011 1:49 PM, Shawn Heisey wrote:
>> I am sure that I have more questions, but I may be able to answer a
>> lot of them myself if I can see better examples.
>
> Thought of another question. My Perl build system uses DIH for all
> indexing, but with the Java rewrite I am planning to do all actions
> other than a full index rebuild using the /update handler. I have
> autoCommit completely turned off in solrconfig.xml. Do I need to set
> any parameters to ensure that nothing gets committed until I do a
> server.commit() myself?
>
> Thanks,
> Shawn
>
Shawn, my experience with SolrJ in that configuration (no autoCommit) is
that you have control over commits: if you don't issue an explicit
commit, it won't happen. Re lifecycle: we don't use a static instance;
rather our app maintains a small pool of CommonsHttpSolrServer instances
that we re-use across requests. I think that will be preferable since I
don't think the underlying HttpClient is thread safe?
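For comparison, the pooling approach described here could look like the sketch below: a fixed number of instances kept in a BlockingQueue, each borrowed by one request at a time. It is generic and hypothetical (SimplePool is not a SolrJ class). Mike later finds that CommonsHttpSolrServer uses MultiThreadedHttpConnectionManager and is safe to share, so a pool like this mainly matters for client objects that are not thread-safe.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical fixed-size pool: each request borrows one instance,
// uses it exclusively, and hands it back for the next request.
final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        this.idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Waits for a free instance, runs the work with it, then returns it.
    <R> R withInstance(Function<T, R> work) {
        T instance;
        try {
            instance = idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted waiting for a pooled instance", e);
        }
        try {
            return work.apply(instance);
        } finally {
            // offer cannot fail here: the slot this instance came from is still free.
            idle.offer(instance);
        }
    }

    int available() {
        return idle.size();
    }
}
```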
I haven't used CoreAdmin features, nor HTTP POST with SolrJ, but I do see
an option to request that the server use multipart POST:
    public CommonsHttpSolrServer(URL baseURL, HttpClient client,
        ResponseParser parser, boolean useMultiPartPost)
-Mike
Re: Some questions about SolrJ
Posted by Shawn Heisey <so...@elyograg.org>.
On 8/12/2011 1:49 PM, Shawn Heisey wrote:
> I am sure that I have more questions, but I may be able to answer a
> lot of them myself if I can see better examples.
Thought of another question. My Perl build system uses DIH for all
indexing, but with the Java rewrite I am planning to do all actions
other than a full index rebuild using the /update handler. I have
autoCommit completely turned off in solrconfig.xml. Do I need to set any
parameters to ensure that nothing gets committed until I do a
server.commit() myself?
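With autoCommit disabled, added documents only become searchable after a commit, and SolrJ's add(...) and commit() are sent as separate requests to /update. A JDK-only sketch of the same sequence as raw XML update messages: one <add> per document and a single <commit/> at the end. It assumes the standard XML update format, a hypothetical id field, and that no commit or commitWithin parameter accompanies the adds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Builds the XML bodies a batch update would POST to /update,
// with a single explicit <commit/> only at the end.
final class UpdateBatch {
    static final String COMMIT_XML = "<commit/>";

    private UpdateBatch() {}

    // One <add> message for a document given as field -> value pairs.
    static String addXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(f.getKey())).append("\">")
              .append(escape(f.getValue())).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    // The adds carry no commit flag; assuming autoCommit is off in
    // solrconfig.xml, only the final <commit/> makes them visible.
    static List<String> plan(List<Map<String, String>> docs) {
        List<String> bodies = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            bodies.add(addXml(doc));
        }
        bodies.add(COMMIT_XML);
        return bodies;
    }

    // Minimal escaping for XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```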
Thanks,
Shawn