Posted to solr-user@lucene.apache.org by Rallavagu <ra...@gmail.com> on 2016/10/20 16:38:34 UTC
indexing - offline
Solr 5.4.1 cloud with embedded jetty
Looking for some ideas around offline indexing, where an independent node
is indexed offline (outside the cloud) and then added to the cloud to
become the leader, so the other cloud nodes replicate from it. I wonder whether
this is possible without interrupting the live service. Thanks.
Re: indexing - offline
Posted by Erick Erickson <er...@gmail.com>.
bq: So, a node is part of the cluster but no collections? How can we
add a node to cloud without active participation?
See the Collections API CREATE command, in particular the
createNodeSet parameter. You can specify exactly which Solr instances the
collection is created on, so you can have two collections using the
same ZooKeeper running on totally different nodes. If you use the
"EMPTY" value here, you can then use ADDREPLICA to place replicas at the
precise locations you want.
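A rough sketch of the EMPTY-then-place approach in Python. It only builds the Collections API URLs and sends nothing. The base URL, the config name "foo_conf", the node names (SolrCloud node names usually look like "host:8983_solr"), and the default shard names "shard1", "shard2" are assumptions for illustration; check the Collections API reference for your Solr version before relying on the exact parameters.

```python
from urllib.parse import urlencode

# Hypothetical base URL: any node in the cluster can accept Collections API calls.
SOLR = "http://localhost:8983/solr"

def collections_url(action, params):
    """Build a Collections API URL, e.g. .../admin/collections?action=CREATE&name=..."""
    query = urlencode({"action": action, **params})
    return f"{SOLR}/admin/collections?{query}"

def create_empty_then_place(collection, config, placements):
    """CREATE with createNodeSet=EMPTY, then one ADDREPLICA per (shard, node) pair."""
    num_shards = len({shard for shard, _ in placements})
    urls = [collections_url("CREATE", {
        "name": collection,
        "numShards": num_shards,
        "createNodeSet": "EMPTY",  # create the collection with no replicas yet
        "collection.configName": config,
    })]
    # Place each replica on exactly the node we want.
    for shard, node in placements:
        urls.append(collections_url(
            "ADDREPLICA", {"collection": collection, "shard": shard, "node": node}))
    return urls

for url in create_empty_then_place("foo_2", "foo_conf",
                                   [("shard1", "node_a:8983_solr"),
                                    ("shard2", "node_b:8983_solr")]):
    print(url)
```

Building the URLs separately from sending them makes it easy to review the placement before running anything against a live cluster.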
A slight variant of Tom's process: instead of deleting the
collection every time, just delete all documents from the "old"
collection once you've made the switch (delete by query on *:*).
Either way works fine; use whichever is more comfortable.
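A minimal Python sketch of that delete-by-query, assuming Solr's JSON update format and a hypothetical base URL. It builds the request without sending it; commit=true is included so the deletion is visible once the request completes.

```python
import json
from urllib.request import Request

SOLR = "http://localhost:8983/solr"  # hypothetical base URL

def clear_collection_request(collection):
    """Build (but do not send) a POST that deletes every document in `collection`."""
    body = json.dumps({"delete": {"query": "*:*"}}).encode("utf-8")
    return Request(f"{SOLR}/{collection}/update?commit=true",
                   data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")

req = clear_collection_request("foo_1")
print(req.get_method(), req.full_url)
```

Sending it would be `urllib.request.urlopen(req)`; only do that against the collection the alias no longer points to.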
The CREATEALIAS command is the one that switches your aliases back and
forth; you use it both to create a new alias and to change an existing
one.
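Switching "back and forth" between two collections can be sketched as follows. This is Python with hypothetical names; it only builds the CREATEALIAS URL that repoints the alias at whichever of the two collections it does not currently target.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # hypothetical base URL

def swap_alias_url(alias, current, pair):
    """CREATEALIAS call repointing `alias` from `current` to the other collection in `pair`."""
    # Unpacking fails loudly unless exactly one collection in `pair` differs from `current`.
    (target,) = [c for c in pair if c != current]
    query = urlencode({"action": "CREATEALIAS", "name": alias, "collections": target})
    return f"{SOLR}/admin/collections?{query}"

print(swap_alias_url("foo", "foo_1", ("foo_1", "foo_2")))
```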
Best,
Erick
On Thu, Oct 20, 2016 at 2:29 PM, Rallavagu <ra...@gmail.com> wrote:
> Thanks Evan for quick response.
Re: indexing - offline
Posted by Rallavagu <ra...@gmail.com>.
Thanks Evan for quick response.
On 10/20/16 10:19 AM, Tom Evans wrote:
> How we do this, to reindex collection "foo":
>
> 1) First, collection "foo" should be an alias to the real collection,
> e.g. "foo_1" aliased to "foo"
> 2) Have a node "node_i" in the cluster that is used for indexing. It
> doesn't hold any shards of any collections
So the node is part of the cluster but holds no collections? How can we add a
node to the cloud without it actively participating?
> 3) Use collections API to create collection "foo_2", with however many
> shards required, but all placed on "node_i"
> 4) Index "foo_2" with new data with DIH or direct indexing to "node_i".
> 5) Use collections API to expand "foo_2" to all the nodes/replicas
> that it should be on
Could you please point me to documentation on how to do this? I am
looking at this doc
https://cwiki.apache.org/confluence/display/solr/Collections+API, but
it has many options and, honestly, I'm not sure which ones would be useful in
this case.
Thanks
> 6) Remove "foo_2" from "node_i"
> 7) Verify contents of "foo_2" are correct
> 8) Use collections API to change alias for "foo" to "foo_2"
> 9) Remove "foo_1" collection once happy
>
> This avoids indexing overwhelming the performance of the cluster (or
> any nodes in the cluster that receive queries), and can be performed
> with zero downtime or config changes on the clients.
>
> Cheers
>
> Tom
>
Re: indexing - offline
Posted by Tom Evans <te...@googlemail.com>.
How we do this, to reindex collection "foo":
1) First, collection "foo" should be an alias to the real collection,
e.g. "foo_1" aliased to "foo"
2) Have a node "node_i" in the cluster that is used for indexing. It
doesn't hold any shards of any collections
3) Use collections API to create collection "foo_2", with however many
shards required, but all placed on "node_i"
4) Index "foo_2" with new data with DIH or direct indexing to "node_i".
5) Use collections API to expand "foo_2" to all the nodes/replicas
that it should be on
6) Remove "foo_2" from "node_i"
7) Verify contents of "foo_2" are correct
8) Use collections API to change alias for "foo" to "foo_2"
9) Remove "foo_1" collection once happy
This keeps indexing from overwhelming the cluster (or any nodes in the
cluster that receive queries), and it can be performed with zero downtime
and no config changes on the clients.
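The Collections API part of the procedure (steps 3, 5, 6, 8 and 9) can be sketched as an ordered list of calls. This is a Python sketch under several assumptions: a hypothetical base URL and config name, default shard names "shard1".."shardN", node names like "host:8983_solr", and replica (core_node) names on node_i that you would look up with CLUSTERSTATUS after the CREATE call. Steps 4 and 7 are not Collections API calls, so they appear only as comments.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # hypothetical entry point into the cluster

def collections_url(action, params):
    query = urlencode({"action": action, **params})
    return f"{SOLR}/admin/collections?{query}"

def reindex_plan(alias, old, new, num_shards, index_node, serving_nodes,
                 index_replicas, config):
    """Ordered Collections API calls for the alias-swap reindex.

    `index_replicas` maps shard name -> replica name on `index_node`
    (e.g. from CLUSTERSTATUS once the CREATE call has finished).
    """
    shards = [f"shard{i}" for i in range(1, num_shards + 1)]
    # Step 3: every shard of the new collection lives on the indexing node only.
    plan = [collections_url("CREATE", {
        "name": new, "numShards": num_shards, "replicationFactor": 1,
        "maxShardsPerNode": num_shards,  # allow one node to hold all shards
        "createNodeSet": index_node, "collection.configName": config,
    })]
    # Step 4 (not an API call): index into `new` via index_node here.
    # Step 5: add a replica of each shard on every serving node; the new
    # replicas pull the index from the shard leader during recovery.
    for node in serving_nodes:
        for shard in shards:
            plan.append(collections_url(
                "ADDREPLICA", {"collection": new, "shard": shard, "node": node}))
    # Step 6: once the new replicas are active, drop the ones on index_node.
    for shard in shards:
        plan.append(collections_url("DELETEREPLICA", {
            "collection": new, "shard": shard, "replica": index_replicas[shard]}))
    # Step 7 (not an API call): verify the contents of `new` here.
    # Step 8: repoint the alias; step 9: drop the old collection.
    plan.append(collections_url("CREATEALIAS", {"name": alias, "collections": new}))
    plan.append(collections_url("DELETE", {"name": old}))
    return plan

for url in reindex_plan("foo", "foo_1", "foo_2", 2, "node_i:8983_solr",
                        ["node_a:8983_solr", "node_b:8983_solr"],
                        {"shard1": "core_node1", "shard2": "core_node2"}, "foo_conf"):
    print(url)
```

Producing the plan as data rather than firing requests directly leaves room to pause for indexing and verification between the calls.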
Cheers
Tom
Re: (solrcloud) Importing documents into "implicit" router
Posted by ha...@yahoo.com.INVALID.
Which link are you talking about?
On Friday, October 21, 2016 8:09 PM, Customer <ma...@gmail.com> wrote:
Useless shit which should be deleted from the Internet, because this
confuses people instead of helping them.
Re: (solrcloud) Importing documents into "implicit" router
Posted by Customer <ma...@gmail.com>.
Useless shit which should be deleted from the Internet, because this
confuses people instead of helping them.
On 21/10/16 09:46, hairymcclarey@yahoo.com.INVALID wrote:
> Couple more good links for this:
> https://lucidworks.com/blog/2013/06/13/solr-cloud-document-routing/
>
> and
> http://stackoverflow.com/questions/15678142/how-to-add-shards-dynamically-to-collection-in-solr
>
> (see Jay's answer about implicit routers - it's a better explanation than the docs in my view!)
Re: (solrcloud) Importing documents into "implicit" router
Posted by ha...@yahoo.com.INVALID.
A couple more good links for this:
https://lucidworks.com/blog/2013/06/13/solr-cloud-document-routing/
and
http://stackoverflow.com/questions/15678142/how-to-add-shards-dynamically-to-collection-in-solr
(see Jay's answer about implicit routers - it's a better explanation than the docs in my view!)
Re: (solrcloud) Importing documents into "implicit" router
Posted by Customer <ma...@gmail.com>.
Thanks John. I got it sorted, but the part you pointed out still looks
confusing. IMHO it should read: "You could also use the _route_ parameter
to name a specific shard *when ingesting documents, so SolrCloud will
route your documents to that specific shard.*"
Cheers.
On 20/10/16 19:14, John Bickerstaff wrote:
> more specifically, this bit from that page seems like it might be of
> interest:
>
> If you created the collection and defined the "implicit" router at the time
> of creation, you can additionally define a router.field parameter to use a
> field from each document to identify a shard where the document belongs. If
> the field specified is missing in the document, however, the document will
> be rejected. You could also use the _route_ parameter to name a specific
> shard.
Re: (solrcloud) Importing documents into "implicit" router
Posted by John Bickerstaff <jo...@johnbickerstaff.com>.
more specifically, this bit from that page seems like it might be of
interest:
If you created the collection and defined the "implicit" router at the time
of creation, you can additionally define a router.field parameter to use a
field from each document to identify a shard where the document belongs. If
the field specified is missing in the document, however, the document will
be rejected. You could also use the _route_ parameter to name a specific
shard.
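A Python sketch of the two options in that passage: creating a collection with the implicit router (optionally naming a router.field), and building an update URL that names the target shard with _route_. It builds URLs only. The base URL, config name, and the field name "shard_s" are hypothetical.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # hypothetical base URL

def create_implicit_url(name, shards, config, router_field=None):
    """CREATE a collection whose shards are named explicitly (implicit router)."""
    params = {"action": "CREATE", "name": name, "router.name": "implicit",
              "shards": ",".join(shards), "collection.configName": config}
    if router_field:
        # Per the docs quoted above, documents missing this field are rejected.
        params["router.field"] = router_field
    return f"{SOLR}/admin/collections?{urlencode(params)}"

def routed_update_url(collection, shard):
    """Update URL whose _route_ parameter names the shard for the posted documents."""
    query = urlencode({"_route_": shard, "commit": "true"})
    return f"{SOLR}/{collection}/update?{query}"

print(create_implicit_url("testIMPLICIT2", ["shardA", "shardB"], "conf1"))
print(routed_update_url("testIMPLICIT2", "shardA"))
```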
On Thu, Oct 20, 2016 at 12:12 PM, John Bickerstaff <john@johnbickerstaff.com> wrote:
> This may help?
> https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
Re: (solrcloud) Importing documents into "implicit" router
Posted by John Bickerstaff <jo...@johnbickerstaff.com>.
This may help?
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
On Thu, Oct 20, 2016 at 12:09 PM, Customer <ma...@gmail.com> wrote:
> Hey,
>
> I hope you all are doing well..
>
> I got a router with "router.name=implicit" with couple of shards (lets
> call them shardA and shardB) and got a mysql table ready to import for
> testing purposes. So for example I want to load half of the data to shardA
> and the rest - to the shardB. Question is - how I can do that ? I thought
> this is something to add to the RESTful call when doing import for example
> like curl -m 99999
> "http://localhost:8983/solr/testIMPLICIT2/dataimport?=command=full-import&implicit=shardA
> , but looks like I was wrong.
>
> Thanks
>
(solrcloud) Importing documents into "implicit" router
Posted by Customer <ma...@gmail.com>.
Hey,
I hope you all are doing well.
I have a collection using "router.name=implicit" with a couple of shards (let's
call them shardA and shardB), and a MySQL table ready to import for
testing purposes. For example, I want to load half of the data into
shardA and the rest into shardB. The question is: how can I do that? I
thought this was something to add to the RESTful call when doing the import,
for example curl -m 99999
"http://localhost:8983/solr/testIMPLICIT2/dataimport?=command=full-import&implicit=shardA",
but it looks like I was wrong.
Thanks
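One way to get the half-and-half split, sketched in Python under stated assumptions: the collection was created with router.field pointing at a hypothetical field named "shard_s", and the rows below stand in for the MySQL result set. Each row is tagged with its target shard before being sent to /update (for example as a JSON array); this bypasses DIH rather than configuring it.

```python
def assign_shards(rows, shards, field="shard_s"):
    """Tag rows with contiguous shard blocks: the first 1/len(shards) go to shards[0], etc."""
    n = len(rows)
    # i * len(shards) // n maps row positions 0..n-1 onto shard indexes 0..len(shards)-1.
    return [{**row, field: shards[i * len(shards) // n]} for i, row in enumerate(rows)]

docs = assign_shards([{"id": str(i)} for i in range(4)], ["shardA", "shardB"])
print([d["shard_s"] for d in docs])
```

Alternatively, without router.field, each half could be posted in a separate update request that names its shard with the _route_ parameter.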