Posted to solr-user@lucene.apache.org by neerajp <ne...@yahoo.com> on 2014/02/14 11:57:35 UTC

update in SolrCloud through C++ client

Hello All,
I am using Solr to index my data. My client is written in C++, so I send
curl requests to the Solr server for indexing.
Now I want to index in SolrCloud mode, using ZooKeeper for HA. I have read
the SolrCloud wiki page (http://wiki.apache.org/solr/SolrCloud).

What I understand from the wiki is that we should always check that a Solr
instance is up and running in SolrCloud before sending it an update request.
Can I not send the update request to ZooKeeper and let ZooKeeper forward it
to the appropriate replica/leader? In the latter case I would not need to
worry about which servers are up and running before making an indexing
request.




--
View this message in context: http://lucene.472066.n3.nabble.com/update-in-SolrCloud-through-C-client-tp4117340.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: update in SolrCloud through C++ client

Posted by "Ramkumar R. Aiyengar" <an...@gmail.com>.
If availability is your only concern, you can keep a list of servers
to which your C++ clients send requests, and round-robin amongst them.
If one of the servers goes down, you will either be unable to reach it or
get a 5xx error in the HTTP response. You can then take it out of circulation
(and retry in the background, pinging the down servers every minute
or so to see whether they have come back, and then
add them to the list again). This is what SolrJ does currently. It doesn't
technically need any ZooKeeper interaction.
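
To make the idea concrete, here is a minimal C++ sketch of such a server list. It is only an illustration under stated assumptions: `ServerPool`, `SendFn` and `aliveCount` are hypothetical names, not part of Solr or SolrJ, and the actual HTTP call is left to a caller-supplied function (for example a thin wrapper around curl). Instead of a background ping, the sketch simply lets a sidelined server be tried again with a real request once a retry interval has passed.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of Solr or SolrJ: a round-robin pool of
// Solr base URLs that sidelines a server after a failed request.
class ServerPool {
public:
    // Sends one update to baseUrl and returns the HTTP status code,
    // or 0 if the server could not be reached. Supplied by the caller.
    using SendFn = std::function<int(const std::string& baseUrl,
                                     const std::string& body)>;
    using Clock = std::chrono::steady_clock;

    ServerPool(std::vector<std::string> urls, SendFn send,
               std::chrono::seconds retryAfter = std::chrono::seconds(60))
        : send_(std::move(send)), retryAfter_(retryAfter) {
        if (urls.empty()) throw std::invalid_argument("no servers given");
        for (auto& u : urls)
            servers_.push_back({std::move(u), true, Clock::time_point{}});
    }

    // Tries each server at most once, continuing round-robin from the
    // last call. Returns the status from the first server that answers
    // with a status below 500, or 0 if no server could take the request.
    // A 4xx status means the server is up but rejected the request, so
    // it is handed back to the caller rather than retried elsewhere.
    int update(const std::string& body) {
        const auto now = Clock::now();
        for (std::size_t i = 0; i < servers_.size(); ++i) {
            Server& s = servers_[next_];
            next_ = (next_ + 1) % servers_.size();
            // Skip a sidelined server until its retry time has passed.
            if (!s.alive && now < s.retryAt) continue;
            const int status = send_(s.url, body);
            if (status != 0 && status < 500) {
                s.alive = true;
                return status;
            }
            // Unreachable or 5xx: take the server out of circulation.
            s.alive = false;
            s.retryAt = now + retryAfter_;
        }
        return 0;
    }

    // Number of servers currently considered usable.
    std::size_t aliveCount() const {
        std::size_t n = 0;
        for (const Server& s : servers_)
            if (s.alive) ++n;
        return n;
    }

private:
    struct Server {
        std::string url;
        bool alive;
        Clock::time_point retryAt;
    };

    std::vector<Server> servers_;
    SendFn send_;
    std::chrono::seconds retryAfter_;
    std::size_t next_ = 0;
};
```

A client would construct the pool once with its list of Solr base URLs and call `update()` for every indexing request; a return value of 0 means none of the servers could take the request at that moment.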

The biggest benefit that SolrJ provides (since 4.6, I think), though, is that
it uses ZK to find the shard leader to send an update to, which saves a hop.
You can technically do this yourself by retrieving the cluster state and
listening for updates to it using a C++ ZK client (one is available) and
doing what SolrJ currently does. This would work well; the only drawback,
apart from the effort, is that improvements are still being made to how
clusters are managed and how their state is stored in ZK. These changes
might not break your code, but you might not be able to take advantage of
them without additional effort.
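
As a rough sketch of the last step of that approach, assume the cluster state has already been read from ZK and parsed into the hypothetical structs below, and that the routing hash of the document has already been computed the same way Solr computes it (neither step is shown here). Picking the leader is then a simple lookup. `Replica`, `Shard` and `leaderUrlForHash` are illustrative names, not an existing API.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory view of one collection's cluster state, filled
// in by whatever code reads and parses the state stored in ZooKeeper.
struct Replica {
    std::string baseUrl;  // e.g. "http://host1:8983/solr"
    bool active;          // replica is currently reported as active
    bool leader;          // replica is currently the shard leader
};

struct Shard {
    std::string name;       // e.g. "shard1"
    std::int32_t rangeMin;  // inclusive lower bound of the hash range
    std::int32_t rangeMax;  // inclusive upper bound of the hash range
    std::vector<Replica> replicas;
};

// Returns the base URL of the active leader of the shard whose hash
// range covers docHash, or an empty string if no such leader is known.
std::string leaderUrlForHash(const std::vector<Shard>& shards,
                             std::int32_t docHash) {
    for (const Shard& shard : shards) {
        if (docHash < shard.rangeMin || docHash > shard.rangeMax) continue;
        for (const Replica& r : shard.replicas) {
            if (r.leader && r.active) return r.baseUrl;
        }
        return std::string();  // shard found, but no usable leader
    }
    return std::string();      // no shard covers this hash
}
```

When the function returns an empty string, a client could fall back to sending the update to any live server and let Solr forward it, at the cost of the extra hop.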

An alternative approach is to link SolrJ into your C++ client using JNI.
This has the added benefit of using the Javabin format for requests, which
should bring some performance benefit.

In short, it comes down to what your performance requirements are. If
indexing speed and throughput are not that big a deal, just keep a list of
servers and load-balance amongst the active ones. I would suggest you try
this anyway before assuming that you need the optimization.

If that turns out not to be enough, I would probably try the JNI route,
and if that fails, use a C ZK client to read the cluster state and use that
knowledge to decide where to send requests.

Re: update in SolrCloud through C++ client

Posted by Erick Erickson <er...@gmail.com>.
OK, I'll go waaaaay out on a limb here since I know so
little about the guts of the ZK/Solr interactions on the
theory that if I'm wrong someone will jump in and I'll
remember the data due to being embarrassed.

ZK doesn't really know much about Solr. It knows there
are a bunch of nodes out there running this thing called
"Solr". It knows the state/roles of those nodes mostly
because the nodes _tell_ ZK about themselves. It knows
a node is down because it hasn't heard from it in
a while.

What it doesn't know is what those nodes _do_. To forward an
update to the appropriate replica/leader, ZK would have to be
able to pull apart a (potentially) many-document packet of
SolrInputDocuments, grok that there's this <uniqueKey> field
(which would mean it would have to understand the schema), and
understand the routing mechanism used to
identify the right leader and forward each doc appropriately.
IOW it would have to implement much of CloudSolrServer.
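
To give a feel for just the "pull apart the packet" step, here is a small C++ sketch that splits a batch by target shard. It is an illustration, not how CloudSolrServer is written: `Doc`, `ShardFor` and `splitByShard` are hypothetical names, and the function that maps a uniqueKey value to a shard is assumed to be supplied by the caller.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal document: only the uniqueKey value and an opaque
// payload matter for routing in this sketch.
struct Doc {
    std::string id;    // value of the <uniqueKey> field
    std::string body;  // serialized document, opaque here
};

// Maps a uniqueKey value to a shard name. Supplied by the caller; in a
// real client it would have to match the routing the collection uses.
using ShardFor = std::function<std::string(const std::string& id)>;

// Splits one many-document batch into one sub-batch per shard, so that
// each sub-batch can be sent to that shard's leader. Document order is
// preserved within each sub-batch.
std::map<std::string, std::vector<Doc>> splitByShard(
        const std::vector<Doc>& batch, const ShardFor& shardFor) {
    std::map<std::string, std::vector<Doc>> perShard;
    for (const Doc& d : batch) {
        perShard[shardFor(d.id)].push_back(d);
    }
    return perShard;
}
```

Even this toy version needs to know which field is the key and how keys map to shards, which is exactly the knowledge ZK doesn't have.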

And even if it did all that (which IMO would be architecturally
_very_ bad), it would be bad for throughput. Now my 3
ZK nodes have to handle _all_ the routing traffic for my 100
node cluster, introducing a potential bottleneck.

I know you're using C++ so the Java version may not apply,
but the CloudSolrServer class hides all of this and _does_ send
the docs to the right leader all without burdening the ZK nodes.
I know there has been a C++ port of SolrJ, but don't
know whether it has been kept up to date with the more recent
SolrJ improvements.

Whew! Occasionally I write these in order to make myself think
about things, what did I mess up? (Mark, Shalin and Noble may
jump all over this, won't be the first time)...

Erick

