Posted to solr-user@lucene.apache.org by Steve <ab...@gmail.com> on 2015/10/06 15:58:40 UTC

indexing data to solrcloud with "implicit" is not distributing across cluster.

I’ve been unable to get SolrCloud to distribute data across 4 Solr nodes
with the “router.name=implicit” feature of the Collections API.

The nodes are live, and the graphs are green.  All the data (the “Films”
example data) shows up on one node, the node that received the CREATE
command.





My CREATE command is:

curl 'http://host-192-168-0-60.openstacklocal:8081/solr/admin/collections?action=CREATE&name=CollectionFilms&replicationFactor=2&router.name=implicit&shards=shard-1,shard-2,shard-3,shard-4&maxShardsPerNode=2&collection.configName=configAlpha'



solr version 5.3.1

zookeeper version 3.4.6

indexing with:

   cd /opt/solr/example/films;

    /opt/solr/bin/post -c CollectionFilms -port 8081  films.json





Thanks,

strick

Re: indexing data to solrcloud with "implicit" is not distributing across cluster.

Posted by hemanth <k....@gmail.com>.
As I understand it, distrib=false is added to a select query to restrict
document selection to a particular shard. What I still need to know is how
to route documents to a particular shard in the first place.

Thanks
Hemanth



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: indexing data to solrcloud with "implicit" is not distributing across cluster.

Posted by Erick Erickson <er...@gmail.com>.
I suspect that when you create your collections, somehow you're not
doing it like you expect.

The red flag is:

> I tried creating a collection with compositeId routing which
> created shard1,shard2,shard3 , but when I indexed , all the documents went
> to one shard only

This simply shouldn't be happening. What is your evidence that all the docs
went to one shard? You can tell by adding &distrib=false to your query and
sending it to a particular core, something like:

solr_server/solr/collection1_shard1_replica1/query?q=*:*&distrib=false
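
A minimal sketch of that per-core check, assuming a node at localhost:8081
and the core names shown (both are hypothetical; core naming also differs
between Solr versions, so copy the real names from the Cloud graph). The
script only prints the curl commands, so nothing is sent until you run them:

```shell
#!/bin/sh
# Hypothetical node address and core names; substitute your own.
SOLR_BASE="http://localhost:8081/solr"
CORES="CollectionFilms_shard1_replica1 CollectionFilms_shard2_replica1"

# Print the URL of a match-all query that stays on one core (distrib=false)
# and asks for the hit count only (rows=0).
core_count_url() {
  printf '%s/%s/select?q=*:*&rows=0&wt=json&distrib=false\n' "$SOLR_BASE" "$1"
}

# Dry run: print one curl command per core.
for core in $CORES; do
  echo "curl '$(core_count_url "$core")'"
done
```

Comparing the numFound value in each response shows how the documents are
actually spread across the shards.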

Best,
Erick

On Mon, Dec 25, 2017 at 4:15 AM, hemanth <k....@gmail.com> wrote:
> Hi Erik,
> Thanks for your reply. I have no issues of using either Implicit or
> Composite routing but I want to insert the documents to a particular shard,
> so that when I want to query the data , I can hit a particular shard, which
> gives me the results in lesser time as it hits only particular shard. So,
> for eg: I am creating a collection with status as Active, Inactive and
> Terminated. Let me think that my data at present is equally distributed ,
> i.e Active 400 records, Inactive 300 records and Terminated also 300
> records. I tried creating a collection with compositeId routing which
> created shard1,shard2,shard3 , but when I indexed , all the documents went
> to one shard only. I also created a collection with Implicit routing
> mechanism with Active,Inactive and Terminated shard with routing key as
> status. When I indexed the documents , again all went to single shard. I
> want to route the documents based on some input value (with out based on the
> hash value of the field , I specified, because both values may always lead
> to same hash value and may point to store in same shard).  So , Please let
> me know, how to route the documents to a particular shard based on composite
> id or implicit mechanism, by using one of the existing field value or
> extracting the content of the field before ! parameter. eg: if my field
> value is "Active!otherfieldvalue" should go to Active shard and if my field
> value is  "Inactive!othercontent" should go to Inactive shard.
>
> Thanks
> Hemanth
>
> -Happy Christmas
>
>
>

Re: indexing data to solrcloud with "implicit" is not distributing across cluster.

Posted by hemanth <k....@gmail.com>.
Hi Erick,
Thanks for your reply. I have no problem using either implicit or
compositeId routing, but I want to insert documents into a particular shard
so that a query can hit only that shard and return results faster. For
example, I am creating a collection whose documents have a status of Active,
Inactive, or Terminated. Assume the data is currently split roughly evenly:
400 Active records, 300 Inactive, and 300 Terminated. I tried creating a
collection with compositeId routing, which created shard1, shard2, and
shard3, but when I indexed, all the documents went to one shard. I also
created a collection with implicit routing, with shards named Active,
Inactive, and Terminated and status as the routing key. When I indexed the
documents, again they all went to a single shard. I want to route documents
by the input value itself, not by the hash of the field I specified, because
two values may hash into the same range and end up on the same shard. So
please let me know how to route documents to a particular shard, with either
compositeId or implicit routing, using the value of an existing field or the
part of the field value before the "!" separator. For example, a field value
of "Active!otherfieldvalue" should go to the Active shard and
"Inactive!othercontent" should go to the Inactive shard.
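
For reference, a hedged sketch of what the "prefix!" convention does under
compositeId. The host, port, and document ids are hypothetical. The prefix is
hashed, so documents sharing a prefix land on the same shard, but two
different prefixes can still hash to the same shard, and no shard is named
after the prefix. The script prints the commands and sends nothing:

```shell
#!/bin/sh
# Hypothetical node address; myCollection is the name used in this thread.
SOLR_BASE="http://localhost:8983/solr"
COLLECTION="myCollection"

# Example documents whose ids carry the status as a compositeId prefix.
docs_json() {
  cat <<'EOF'
[
  {"id": "Active!1001", "Status": "Active"},
  {"id": "Inactive!2001", "Status": "Inactive"}
]
EOF
}

# Print a query URL limited to the shard that holds the given prefix.
route_query_url() {
  printf '%s/%s/select?q=*:*&_route_=%s!\n' "$SOLR_BASE" "$COLLECTION" "$1"
}

# Dry run: print the index and query commands.
echo "curl '$SOLR_BASE/$COLLECTION/update?commit=true' -H 'Content-Type: application/json' --data-binary '$(docs_json)'"
echo "curl '$(route_query_url Active)'"
```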

Thanks
Hemanth

-Happy Christmas




Re: indexing data to solrcloud with "implicit" is not distributing across cluster.

Posted by Erick Erickson <er...@gmail.com>.
You're misinterpreting the docs. _route_ is used to
tell _queries_ where to go, or to route a document
as part of the parameters when you send the doc,
not a field in the doc.

So when you added the _route_ field to the doc, you
didn't have it in the schema in the first place.

So you could add a _route_ field to your schema
and work that way, but then you also have to define
router.field=_route_ when you create the collection.
I'd advise instead just specifying router.field=Status
to avoid confusion.
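
A sketch of such a CREATE call, assuming a node at localhost:8983
(hypothetical) and the shard names from the question. Depending on the
cluster, collection.configName and maxShardsPerNode may also be needed. The
script only prints the command:

```shell
#!/bin/sh
# Hypothetical node address; adjust to a live node in your cluster.
SOLR_BASE="http://localhost:8983/solr"

# Print a Collections API CREATE URL for an implicit-router collection.
# $1 = collection name, $2 = comma-separated shard names, $3 = routing field
implicit_create_url() {
  printf '%s/admin/collections?action=CREATE&name=%s&router.name=implicit&shards=%s&router.field=%s\n' \
    "$SOLR_BASE" "$1" "$2" "$3"
}

# Dry run: print the command instead of sending it.
echo "curl '$(implicit_create_url myCollection Active,Terminated Status)'"
```

With this setup each document's Status value has to match one of the shard
names exactly.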

Now, that said I really question whether this is a good
way to set up your collection. I'd just use compositeId
and when you want to restrict searches to one type
or the other add
&fq=Status:Active
or
&fq=Status:Terminated

that way you can't forget to delete the doc from one
shard or the other when the status changes. You won't
have lopsided doc counts on your shards because you
have 10,000,000 active docs and 10 terminated docs.
And whatever ratio you start with, it'll change as the
collection ages.
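
As a sketch, the filter-query approach looks like this (host and port are
hypothetical; the script prints the commands only):

```shell
#!/bin/sh
# Hypothetical node address; myCollection is the name used in this thread.
SOLR_BASE="http://localhost:8983/solr"
COLLECTION="myCollection"

# Print a query URL that restricts results to one status with a filter query.
status_query_url() {
  printf '%s/%s/select?q=*:*&fq=Status:%s\n' "$SOLR_BASE" "$COLLECTION" "$1"
}

# Dry run: one command per status value.
for status in Active Terminated; do
  echo "curl '$(status_query_url "$status")'"
done
```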

FWIW,
Erick

On Fri, Dec 15, 2017 at 11:17 AM, hemanth <k....@gmail.com> wrote:
> I created a collection with implicit routing mechanism and my shared names
> are Active and Disabled , these are the values of one of my collection
> field: Status.  But when I am trying to upload the document using Solr UI
> documents section : Upload using JSON format with all the fields including
> field with value for Status as either Terminated or Active. It is going to
> only one default shard. I tried to insert _route_ field with the value as
> "Terminated" and when I try to insert the document , I am getting
>
> *unknown field '_route_' Error from server*. Am I trying in correct way?
> Does the implicit routing works on the hash value of routing field and it
> does not go to the shard based on the value of the routing field?
>
> I want to store the document with status field value : Active to
> myCollectionn_Active shard and document with status field value: Terminated
> to myCollection_Terminated shard automatically based on the value of my
> status field in the document. I used implicit routing while creating
> collection and given shard names as Active,Terminated. Plz help. I am using
> Solr 6.6 version.
>
>
>

Re: indexing data to solrcloud with "implicit" is not distributing across cluster.

Posted by hemanth <k....@gmail.com>.
I created a collection with the implicit routing mechanism, and my shard
names are Active and Disabled; these are the values of one of my collection's
fields, Status. But when I upload a document through the Documents section of
the Solr UI, in JSON format with all the fields including Status set to
either Terminated or Active, it goes to only one default shard. I tried
adding a _route_ field with the value "Terminated", and when I try to insert
the document I get

*unknown field '_route_' Error from server*. Am I going about this the right
way? Does implicit routing work on the hash of the routing field rather than
sending the document to the shard named by the field's value?

I want a document with Status: Active to be stored in the
myCollection_Active shard and a document with Status: Terminated in the
myCollection_Terminated shard, automatically, based on the value of the
Status field in the document. I used implicit routing when creating the
collection and gave the shard names as Active,Terminated. Please help. I am
using Solr 6.6.




Re: indexing data to solrcloud with "implicit" is not distributing across cluster.

Posted by Erick Erickson <er...@gmail.com>.
Did you try setting the "magic" field _route_ in your docs to the
shard? Something like
doc.addField("_route_", "shard1")?

Best,
Erick

On Wed, Jun 15, 2016 at 10:31 AM, nikosmarinos <ni...@hotmail.com> wrote:
> Is it possible to give an example? I want doc1 to be explicitly routed to
> "shard1" of my "implicit" collection and doc2 to "shard4". How can I do
> that?
>
> Creating an implicit collection with one of the example configurations of
> the solr package, defining the "id" field as the router.field (not sure if
> necessary) and indexing id:shard1 id:shard2 id:shard3 takes all documents to
> the same (random) shard.
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/indexing-data-to-solrcloud-with-implicit-is-not-distributing-across-cluster-tp4232956p4282428.html
> Sent from the Solr - User mailing list archive at Nabble.com.

Re: indexing data to solrcloud with "implicit" is not distributing across cluster.

Posted by nikosmarinos <ni...@hotmail.com>.
Is it possible to give an example? I want doc1 to be explicitly routed to
"shard1" of my "implicit" collection and doc2 to "shard4". How can I do
that? 

Creating an implicit collection with one of the example configurations of
the solr package, defining the "id" field as the router.field (not sure if
necessary) and indexing id:shard1 id:shard2 id:shard3 takes all documents to
the same (random) shard.
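
One way to do this, sketched here under assumptions: pass the shard name in
the _route_ request parameter of each update. The host, port, and collection
name are hypothetical, and the collection is assumed to use the implicit
router with no router.field defined. The script prints the commands only:

```shell
#!/bin/sh
# Hypothetical node address and collection name.
SOLR_BASE="http://localhost:8983/solr"
COLLECTION="implicitCollection"

# Print an update URL that names the target shard via _route_.
route_update_url() {
  printf '%s/%s/update?_route_=%s&commit=true\n' "$SOLR_BASE" "$COLLECTION" "$1"
}

# Print a curl command that indexes one document into one shard.
# $1 = target shard, $2 = document id
print_index_cmd() {
  printf "curl '%s' -H 'Content-Type: application/json' -d '[{\"id\": \"%s\"}]'\n" \
    "$(route_update_url "$1")" "$2"
}

print_index_cmd shard1 doc1
print_index_cmd shard4 doc2
```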




Re: indexing data to solrcloud with "implicit" is not distributing across cluster.

Posted by Chris Hostetter <ho...@fucit.org>.
: The documentation in the Collections API says  "The value can be ...
: *implicit*, which uses an internal default hash".
: I think most people would assume the "hash" would be used to route the
: data.
: Meanwhile the description of compositeId in the "Document Routing" section
: only discusses how to modify your document IDs, which I did not want to do.

Hmmm... I'm guessing you are looking at a PDF copy of the ref guide?

Pretty sure that was a mistake that's already been fixed.  At the moment 
the Collections API CREATE command says...

https://cwiki.apache.org/confluence/display/solr/Collections+API

"The 'implicit' router does not automatically route documents to different 
shards.  Whichever shard you indicate on the indexing request (or within 
each document) will be used as the destination for those documents"



And the details on document routing say...

https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud#ShardsandIndexingDatainSolrCloud-DocumentRouting

"If you created the collection and defined the "implicit" router at the 
time of creation, you can additionally define a router.field parameter to 
use a field from each document to identify a shard where the document 
belongs. If the field specified is missing in the document, however, the 
document will be rejected. You could also use the _route_ parameter to 
name a specific shard."


...which I believe is all accurate.



-Hoss
http://www.lucidworks.com/

Re: indexing data to solrcloud with "implicit" is not distributing across cluster.

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/6/2015 10:02 AM, Steve wrote:
> Thanks Shawn, that fixed it!
>
> The documentation in the Collections API says  "The value can be ...
> *implicit*, which uses an internal default hash".

Thank you for pointing out this error in the documentation.  I did not
know it was there.  I have updated the online Reference Guide so it is
correct.  Hopefully this will help clear up any confusion!

https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-CreateaCollection

Thanks,
Shawn


Re: indexing data to solrcloud with "implicit" is not distributing across cluster.

Posted by Steve <ab...@gmail.com>.
Thanks Shawn, that fixed it!

The documentation in the Collections API says  "The value can be ...
*implicit*, which uses an internal default hash".
I think most people would assume the "hash" would be used to route the
data.
Meanwhile the description of compositeId in the "Document Routing" section
only discusses how to modify your document IDs, which I did not want to do.

thanks again,
.strick



On Tue, Oct 6, 2015 at 8:15 AM, Shawn Heisey <ap...@elyograg.org> wrote:

> On 10/6/2015 7:58 AM, Steve wrote:
> > I’ve been unable to get SolrCloud to distribute data across 4 Solr nodes
> > with the “router.name=implicit” feature of the Collections API.
> >
> > The nodes are live, and the graphs are green.  All the data (the “Films”
> > example data) shows up on one node, the node that received the CREATE
> > command.
>
> A better name for the implicit router is "manual."  The implicit router
> doesn't actually route.  It assumes that you know what you are doing and
> have sent the request to the shard where you want it to be indexed.
>
> You want the compositeId router.
>
> Even though the name "implicit" makes sense in the context of Solr
> *code*, it is a confusing name when it comes to user expectations.
> You're not the first one to be confused by this, which is why I opened
> this issue:
>
> https://issues.apache.org/jira/browse/SOLR-6630
>
> Thanks,
> Shawn
>
>

Re: indexing data to solrcloud with "implicit" is not distributing across cluster.

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/6/2015 7:58 AM, Steve wrote:
> I’ve been unable to get SolrCloud to distribute data across 4 Solr nodes
> with the “router.name=implicit” feature of the Collections API.
> 
> The nodes are live, and the graphs are green.  All the data (the “Films”
> example data) shows up on one node, the node that received the CREATE
> command.

A better name for the implicit router is "manual."  The implicit router
doesn't actually route.  It assumes that you know what you are doing and
have sent the request to the shard where you want it to be indexed.

You want the compositeId router.
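
A sketch of the equivalent CREATE call with compositeId, reusing the host,
collection name, and config name from the original command. With this router
the shard count is given as numShards instead of a list of shard names, and
any existing collection with the same name would have to be deleted first.
The script only prints the command:

```shell
#!/bin/sh
# Node address taken from the original CREATE command in this thread.
SOLR_BASE="http://host-192-168-0-60.openstacklocal:8081/solr"

# Print a Collections API CREATE URL for a hash-routed collection.
# $1 = collection name, $2 = number of shards, $3 = config set name
composite_create_url() {
  printf '%s/admin/collections?action=CREATE&name=%s&numShards=%s&replicationFactor=2&maxShardsPerNode=2&router.name=compositeId&collection.configName=%s\n' \
    "$SOLR_BASE" "$1" "$2" "$3"
}

# Dry run: print the command instead of sending it.
echo "curl '$(composite_create_url CollectionFilms 4 configAlpha)'"
```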

Even though the name "implicit" makes sense in the context of Solr
*code*, it is a confusing name when it comes to user expectations.
You're not the first one to be confused by this, which is why I opened
this issue:

https://issues.apache.org/jira/browse/SOLR-6630

Thanks,
Shawn