
Posted to solr-user@lucene.apache.org by Luis Cappa Banda <lu...@gmail.com> on 2012/12/14 21:09:06 UTC

Solrcloud and Node.js

Hello!

I've always used Java as the backend language to program search modules,
and I know that the CloudSolrServer implementation is the way to interact with
SolrCloud. However, I'm starting to love Node.js and I was wondering if
there is a way to launch queries against SolrCloud with the "old
fashioned" sharding syntax.

Thank you in advance!

Best regards.

Re: Solrcloud and Node.js

Posted by Per Steffensen <st...@designware.dk>.

Luis Cappa Banda wrote:
> Thanks a lot, Per. Now I understand the whole scenario. One last question:
> I've been searching for some kind of request handler that
> retrieves cluster status information, but no luck. I know that there is
> a JSON document called clusterstate.json, but I don't know how to get it
> in raw JSON format.
If you want the cluster state in raw JSON format, I believe there is
currently no other way than to fetch it yourself from ZK. Or maybe
something in the admin console under /zookeeper will help you.
> Do you know how to get its status? Any request handler or Solr
> query? Maybe checking directly from ZooKeeper?
>   
Yes, if you want it in raw JSON format. If you want the "information"
parsed as a Java object hierarchy, you can access it through the
ClusterState object. The best way to get a ClusterState (that keeps
itself up to date with changing states) is probably to use the
ZkStateReader:
    ZkStateReader zk = new ZkStateReader(<ZK-connection-string>, <zk-connection-timeout>, <zk-client-timeout>);
    zk.createClusterStateWatchersAndUpdate();
Then whenever you want an updated "picture" of the cluster state:
    zk.getClusterState();
You can also use a CloudSolrServer, which carries a ZkStateReader, if you
are already using that one. But I guess not, since it didn't sound like
you would try the node-java bridge to be able to use SolrJ stuff in Node.js.
> Best regards,
>
> - Luis Cappa.

Re: Solrcloud and Node.js

Posted by Mark Miller <ma...@gmail.com>.

There is a /zookeeper servlet that the admin UI uses for the Cloud tab. I don't know much about it; I think Ryan wrote it.

The other option is to talk to zk directly.

I also plan on adding an admin handler for ZooKeeper at some point.
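
A minimal sketch of the servlet route, in Java to match the snippets elsewhere in this thread. The class and method names are made up for illustration. The "detail" and "path" query parameters are an assumption about what the admin UI sends to /zookeeper, so check them against your Solr version. The response is probably the servlet's own JSON wrapper around the node data rather than the bare clusterstate.json, so expect to unwrap it.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Sketch: read a ZooKeeper node through Solr's /zookeeper servlet over plain HTTP. */
final class ZkServletClient {

    // Builds the servlet URL for /clusterstate.json.
    // ASSUMPTION: the "detail" and "path" parameter names are not confirmed
    // in this thread; verify them against your Solr version.
    static URI clusterStateUri(String solrBaseUrl) {
        String path = URLEncoder.encode("/clusterstate.json", StandardCharsets.UTF_8);
        return URI.create(solrBaseUrl + "/zookeeper?detail=true&path=" + path);
    }

    // Performs the GET and returns the raw response body as a string.
    static String fetchClusterState(String solrBaseUrl) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(clusterStateUri(solrBaseUrl)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Calling ZkServletClient.fetchClusterState("http://host1:8000/solr") would issue one GET against that node. A Node.js client could send the same request with any HTTP library.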

- Mark

On Dec 15, 2012, at 12:33 PM, Luis Cappa Banda <lu...@gmail.com> wrote:

> Thanks a lot, Per. Now I understand the whole scenario. One last question:
> I've been searching for some kind of request handler that
> retrieves cluster status information, but no luck. I know that there is
> a JSON document called clusterstate.json, but I don't know how to get it
> in raw JSON format. Do you know how to get its status? Any request handler
> or Solr query? Maybe checking directly from ZooKeeper?
> 
> Best regards,
> 
> - Luis Cappa.

Re: Solrcloud and Node.js

Posted by Luis Cappa Banda <lu...@gmail.com>.

Thanks a lot, Per. Now I understand the whole scenario. One last question:
I've been searching for some kind of request handler that
retrieves cluster status information, but no luck. I know that there is
a JSON document called clusterstate.json, but I don't know how to get it
in raw JSON format. Do you know how to get its status? Any request handler
or Solr query? Maybe checking directly from ZooKeeper?

Best regards,

- Luis Cappa.

2012/12/15 Per Steffensen <st...@designware.dk>

> Luis Cappa Banda wrote:
>
>> Do you know if SolrCloud replica shards have 100% the same data as the
>> leader ones every time? Probably there is a delay when synchronizing
>> with leaders, so executing queries against replicas won't be a good idea.
>
> As long as the replica is in state "active" it will be 100% up to date
> with the leader. Updates go to the leader, which dispatches a similar
> request to the replica and does not respond (positively) to your update
> request before it has received positive answers from the replica (and of
> course also stored the update locally). If the replica is in state
> "recovering" or "down" or something similar, it is (potentially) not up
> to date with the leader.
>
> Remember that even though updates are made on both leader and replica
> synchronously, they might not be available for (non-real-time) search on
> leader and replica at exactly the same time if you do not also make sure
> to commit as part of your update. If you update a lot, you probably do not
> want to commit every time. If you use (soft) auto-commit on the leader and
> replica, it is possible that leader and replica do not respond
> identically to the same request at the same time - but the leader can just
> as well as the replica be the one that is "behind". If you use low values
> for (soft) auto-commit, in practice leader and replica will have the same
> documents available for search at any time.
>
>> Thank you very much in advance.
>>
>> Best regards,


-- 

- Luis Cappa

Re: Solrcloud and Node.js

Posted by Per Steffensen <st...@designware.dk>.

Luis Cappa Banda wrote:
> Do you know if SolrCloud replica shards have 100% the same data as the
> leader ones every time? Probably there is a delay when synchronizing
> with leaders, so executing queries against replicas won't be a good idea.
>   
As long as the replica is in state "active" it will be 100% up to date
with the leader. Updates go to the leader, which dispatches a similar
request to the replica and does not respond (positively) to your update
request before it has received positive answers from the replica (and of
course also stored the update locally). If the replica is in state
"recovering" or "down" or something similar, it is (potentially) not up
to date with the leader.

Remember that even though updates are made on both leader and replica
synchronously, they might not be available for (non-real-time) search on
leader and replica at exactly the same time if you do not also make sure
to commit as part of your update. If you update a lot, you probably do
not want to commit every time. If you use (soft) auto-commit on the
leader and replica, it is possible that leader and replica do not
respond identically to the same request at the same time - but the leader
can just as well as the replica be the one that is "behind". If you use
low values for (soft) auto-commit, in practice leader and replica will
have the same documents available for search at any time.
> Thank you very much in advance.
>
> Best regards,
>
>
>

Re: Solrcloud and Node.js

Posted by Luis Cappa Banda <lu...@gmail.com>.

Hello, Per.

Thanks for your answer! I have worked a lot with SolrJ, and in the last two
months also with the new SolrJ 4.0, specifically with ZooKeeper and the
CloudSolrServer implementation. I've developed a search engine wrapper that
dispatches queries to SolrCloud using a CloudSolrServer pool. The whole
web app is built in Java and works fine, but some months ago I met Node.js
and asked myself: "Hey, Node.js is awesome at dispatching queries, with up
to 250K requests on a single machine. Why not try to make it work with Solr?"

And then I started to build a Node.js Solr client to execute queries against
just one Solr server instance. You can't imagine how damn good that
combination is. Just a simple example: Node.js with an Express.js back end
dispatching queries to just one Solr server instance with a simple and very
basic Solr.js client that I've developed. The server host has just *1 core*
and *1GB RAM*. I indexed *3 million* documents and tested the whole system
with an ab test. *Result*: ab -c 1000 -n 100000 + CPU: 24% RAM: 768M. So
good.

However, the power of Solr 4.0 resides in SolrCloud, and I would like to
build a smarter Node.js client that runs sharded queries over collection
shards. I don't think I have enough time to build a complete Node.js
version of CloudSolrServer, and compiling Java code into JavaScript
doesn't sound good in terms of performance and best practices.

Maybe if I can somehow read the ZooKeeper status data frequently, I can
update this Solr.js client's state so that it only queries those
collections that have shards alive, balancing between leader and replica
shards. It won't be as smart as CloudSolrServer.java, but not as dumb as
executing distributed queries without any knowledge of the cluster state.
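
The selection step described above could look roughly like the following sketch. It is written in Java to match the other snippets in this thread, but the logic ports directly to a Node.js client. The ReplicaPicker and Replica names and the simplified shard map are hypothetical. A real client would fill the map from clusterstate.json and refresh it periodically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

/** Sketch: choose one live replica per shard from a locally cached cluster state. */
final class ReplicaPicker {

    /** Hypothetical, simplified view of one replica entry in the cluster state. */
    static final class Replica {
        final String baseUrl;
        final String state;

        Replica(String baseUrl, String state) {
            this.baseUrl = baseUrl;
            this.state = state;
        }
    }

    private final AtomicInteger counter = new AtomicInteger();

    // Returns one "active" replica URL per shard, rotating through the active
    // replicas (leader included) on successive calls. Shards without any
    // active replica are skipped, so the caller can detect partial coverage.
    List<String> pick(Map<String, List<Replica>> shards) {
        int turn = counter.getAndIncrement();
        List<String> chosen = new ArrayList<>();
        for (List<Replica> replicas : shards.values()) {
            List<Replica> active = replicas.stream()
                    .filter(r -> "active".equals(r.state))
                    .collect(Collectors.toList());
            if (!active.isEmpty()) {
                chosen.add(active.get(Math.floorMod(turn, active.size())).baseUrl);
            }
        }
        return chosen;
    }
}
```

The returned URLs could then be joined with commas and passed as the shards parameter of a distributed query.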

Do you know if SolrCloud replica shards have 100% the same data as the
leader ones every time? Probably there is a delay when synchronizing
with leaders, so executing queries against replicas won't be a good idea.

Thank you very much in advance.

Best regards,


2012/12/15 Per Steffensen <st...@designware.dk>

> As Mark mentioned, Solr(Cloud) can be accessed through HTTP and can return
> e.g. JSON, which should be easy to handle in JavaScript. But the client
> part (SolrJ) of Solr is not just a dumb client interface - it provides a
> lot of client-side functionality, e.g. some intelligent decision making
> based on ZK state. I would probably try to see if I could make SolrJ, and
> in particular CloudSolrServer (yes, it's a client, even though the name
> does not indicate it), work. Maybe you will be successful using one of:
> * https://github.com/nearinfinity/node-java to embed CloudSolrServer in
>   node.js
> * use GWT to compile CloudSolrServer to JavaScript (I would imagine it
>   will be hard to make it work though)
>
> Regards, Per Steffensen


-- 

- Luis Cappa

Re: Solrcloud and Node.js

Posted by Per Steffensen <st...@designware.dk>.

As Mark mentioned, Solr(Cloud) can be accessed through HTTP and can return
e.g. JSON, which should be easy to handle in JavaScript. But the client
part (SolrJ) of Solr is not just a dumb client interface - it provides a
lot of client-side functionality, e.g. some intelligent decision making
based on ZK state. I would probably try to see if I could make SolrJ, and
in particular CloudSolrServer (yes, it's a client, even though the name
does not indicate it), work. Maybe you will be successful using one of:
* https://github.com/nearinfinity/node-java to embed CloudSolrServer in
  node.js
* use GWT to compile CloudSolrServer to JavaScript (I would imagine it
  will be hard to make it work though)

Regards, Per Steffensen

Luis Cappa Banda wrote:
> Hello!
>
> I've always used Java as the backend language to program search modules,
> and I know that the CloudSolrServer implementation is the way to interact with
> SolrCloud. However, I'm starting to love Node.js and I was wondering if
> there is a way to launch queries against SolrCloud with the "old
> fashioned" sharding syntax.
>
> Thank you in advance!
>
> Best regards.
>
>

Re: Solrcloud and Node.js

Posted by Luis Cappa Banda <lu...@gmail.com>.

I think that Node.js is extremely powerful for developing very light and
simple REST API modules, so combining it with Solr sounds good; that's why
I'm obsessed with combining them.

So then, with a SolrCloud example of numShards=2, it is possible to execute
queries like:

http://host1:8000/solr/collection1/select?shards=host2:8000/solr/collection1,host3:8000/solr/collection1,host4:8000/solr/collection1&indent=true&q=title:(Indiana Jones)

Where all those are leaders.
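
As a rough sketch, a query URL of that shape could be assembled like this. It is shown in Java to match the other snippets in the thread. ShardQueryUrl and its parameters are illustrative names only. The q value is percent-encoded because the raw form contains a space and parentheses.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Sketch: build a distributed select URL using the "shards" request parameter. */
final class ShardQueryUrl {

    // entryPoint is the core that receives the request (host:port/solr/collection),
    // shards lists the cores it should fan the query out to.
    static String build(String entryPoint, List<String> shards, String query) {
        // Shard addresses are joined as-is; commas and slashes are left unencoded.
        String shardList = String.join(",", shards);
        // Encode the query text so spaces, colons and parentheses survive transport.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return "http://" + entryPoint + "/select?shards=" + shardList + "&indent=true&q=" + q;
    }
}
```

For example, build("host1:8000/solr/collection1", List.of("host2:8000/solr/collection1", "host3:8000/solr/collection1"), "title:(Indiana Jones)") yields the same kind of URL as above, with the query part encoded as title%3A%28Indiana+Jones%29.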

What I was thinking is to use just Node.js to dispatch queries without (at
least at first - it's just a personal test) any balancer right now. Maybe
it would be interesting to return the cluster state with each query
response, to make alternative Solr clients such as the one I want to try
"smarter".

That would make it possible to create some kind of CloudSolrServer Node.js
object and update its status with each cluster state response (embedded in
each query response). That CloudSolrServer object would just store the
shard status and map queries only to those shards that are "alive",
executing them against the leaders.

I don't know how SolrCloud is implemented, but does it also execute
queries against replicas? It sounds reasonable for balancing, but I'm not
sure whether replicas are always 100% data-synchronized with their leaders,
so maybe it won't be a good idea.

2012/12/14 Mark Miller <ma...@gmail.com>

> Yes, you can access SolrCloud in any std way you can access Solr.
>
> The main difference when using a client that does not know how to talk to
> ZooKeeper about the cluster state:
>
> You have to specify a particular machine's address or set up a load balancer
> when using a 'dumb' client.
>
> A dumb client will not know about additions or removals from the cluster -
> if you are using a load balancer you will have to update it with the new
> state.
>
> A dumb client won't be able to optimize some updates to leaders.
>
> It's still a perfectly reasonable option to not use a 'smart' client
> though.
>
> - Mark

-- 

- Luis Cappa

Re: Solrcloud and Node.js

Posted by Mark Miller <ma...@gmail.com>.

Yes, you can access SolrCloud in any std way you can access Solr.

The main difference when using a client that does not know how to talk to ZooKeeper about the cluster state:

You have to specify a particular machine's address or set up a load balancer when using a 'dumb' client.

A dumb client will not know about additions or removals from the cluster - if you are using a load balancer you will have to update it with the new state.

A dumb client won't be able to optimize some updates to leaders.

It's still a perfectly reasonable option to not use a 'smart' client though.
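
To illustrate the trade-off, here is a minimal sketch of what a 'dumb' client amounts to when it does its own balancing. It is written in Java like the other snippets in this thread, and the DumbClientBalancer name and its methods are made up. The node list is static, and nothing updates it when the cluster changes unless someone calls addNode or removeNode.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: round-robin over a hand-maintained list of Solr node URLs. */
final class DumbClientBalancer {

    private final List<String> nodes = new CopyOnWriteArrayList<>();
    private final AtomicInteger next = new AtomicInteger();

    DumbClientBalancer(List<String> initialNodes) {
        nodes.addAll(initialNodes);
    }

    // There is no ZooKeeper watcher here: cluster additions and removals
    // only take effect when an operator (or a script) calls these methods.
    void addNode(String url) {
        nodes.add(url);
    }

    void removeNode(String url) {
        nodes.remove(url);
    }

    // Returns the node that should receive the next request.
    String nextNode() {
        List<String> snapshot = List.copyOf(nodes);
        if (snapshot.isEmpty()) {
            throw new IllegalStateException("no Solr nodes configured");
        }
        return snapshot.get(Math.floorMod(next.getAndIncrement(), snapshot.size()));
    }
}
```

Any node returned by nextNode() can serve a query for the whole collection, since SolrCloud distributes the request itself. What is lost compared to a 'smart' client is automatic membership tracking and leader-aware routing of updates.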

- Mark

On Dec 14, 2012, at 3:09 PM, Luis Cappa Banda <lu...@gmail.com> wrote:

> Hello!
> 
> I've always used Java as the backend language to program search modules,
> and I know that the CloudSolrServer implementation is the way to interact with
> SolrCloud. However, I'm starting to love Node.js and I was wondering if
> there is a way to launch queries against SolrCloud with the "old
> fashioned" sharding syntax.
> 
> Thank you in advance!
> 
> Best regards.