Posted to issues@solr.apache.org by "Marc Brette (Jira)" <ji...@apache.org> on 2022/03/21 10:51:00 UTC

[jira] [Created] (SOLR-16108) Incorrect distribution of records in shards after a split with splitByKeyprefix, when using the CompositeId router with a router field defined

Marc Brette created SOLR-16108:
----------------------------------

             Summary: Incorrect distribution of records in shards after a split with splitByKeyprefix, when using the CompositeId router with a router field defined
                 Key: SOLR-16108
                 URL: https://issues.apache.org/jira/browse/SOLR-16108
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrCloud
    Affects Versions: 8.4
            Reporter: Marc Brette


When a collection is created using the CompositeId router with a router field defined, one of its shards contains records that all share the same routing key, and that shard is split with the splitByPrefix parameter, we expect the records to be distributed roughly evenly between the two resulting shards.

Instead, one shard contains no records and the other contains all of them.

Steps to reproduce:
{code:java}
docker network create solr-network
# run in one terminal
docker run -it -h solr1 --name solr1 --net solr-network -p 18983:8983 solr:8.4 /opt/solr/bin/solr -c -f
# run in another terminal
docker run -it -h solr2 --name solr2 --net solr-network -p 28983:8983 solr:8.4 /opt/solr/bin/solr -c -f -z solr1:9983
#-----------------------------------------------------------------------------------------------
# Works, documents are split between the 2 shards
# Create collection with default compositeId router, routing key in the id, only one shard
curl --request GET \
  --url 'http://localhost:18983/solr/admin/collections?action=CREATE&name=routing_by_id&numShards=1'
# Create enough documents, they all have the same routing key (france!)
for i in {0..100}
do
  curl --request POST \
  --url 'http://localhost:18983/solr/routing_by_id/update/json/docs?commit=true' \
  --header 'Content-Type: application/json' \
  --data '[{
    "id": "france!'"${i}"'0",
    "title_t": "hi"
}]'
done
# Check it is indexed correctly
curl --request GET \
  --url 'http://localhost:18983/solr/routing_by_id/select?q=*%3A*'
# Split the shard
curl --request GET \
  --url 'http://localhost:18983/solr/admin/collections?action=SPLITSHARD&collection=routing_by_id&shard=shard1&splitByPrefix=true'
# Check records in shard1_0 (~half of the documents there)
curl --request GET \
  --url 'http://localhost:18983/solr/routing_by_id/select?q=*%3A*&shards=shard1_0'
# Check records in shard1_1 (~half of the documents there)
curl --request GET \
  --url 'http://localhost:18983/solr/routing_by_id/select?q=*%3A*&shards=shard1_1'

#-----------------------------------------------------------------------------------------------
# Fails, documents are not split between the 2 shards
# Create collection with default compositeId router, routing key in the field "route_t", only one shard
curl --request GET \
  --url 'http://localhost:18983/solr/admin/collections?action=CREATE&name=routing_by_field&numShards=1&router.field=route_t'
# Create enough documents, they all have the same routing key (france, in the route_t field)
for i in {0..100}
do
  curl --request POST \
  --url 'http://localhost:18983/solr/routing_by_field/update/json/docs?commit=true' \
  --header 'Content-Type: application/json' \
  --data "[{
    \"id\": \"${i}0\",
    \"title_t\": \"hi\",
    \"route_t\": \"france\"
}]"
done
# Check it is indexed correctly
curl --request GET \
  --url 'http://localhost:18983/solr/routing_by_field/select?q=*%3A*'
# Split the shard
curl --request GET \
  --url 'http://localhost:18983/solr/admin/collections?action=SPLITSHARD&collection=routing_by_field&shard=shard1&splitByPrefix=true'
# Check records in shard1_0: no documents!
curl --request GET \
  --url 'http://localhost:18983/solr/routing_by_field/select?q=*%3A*&shards=shard1_0'
# Check records in shard1_1: all documents!
curl --request GET \
  --url 'http://localhost:18983/solr/routing_by_field/select?q=*%3A*&shards=shard1_1'
{code}
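
The per-shard checks above can be reduced to a single document count per sub-shard, which makes the uneven split easier to see. What follows is a minimal sketch and not part of the original report: it assumes the port mapping used in the steps above (localhost:18983), and the names SOLR_URL, num_found and shard_count are hypothetical helpers introduced here for illustration. It requests rows=0 so that only the numFound value of the response is read.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the original report.
# Assumes the solr1 container from the steps above is reachable on port 18983.
SOLR_URL="${SOLR_URL:-http://localhost:18983/solr}"

# Read a Solr JSON response on stdin and print the value of "numFound".
# Prints nothing if the response contains no "numFound" key.
num_found() {
  sed -n 's/.*"numFound":[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Print the number of documents held by one shard of a collection.
# rows=0 avoids fetching the documents themselves.
shard_count() {
  local collection="$1" shard="$2"
  curl --silent --get "${SOLR_URL}/${collection}/select" \
    --data-urlencode 'q=*:*' \
    --data 'rows=0' \
    --data "shards=${shard}" | num_found
}

# Example usage, once the collections exist and the split has completed:
#   shard_count routing_by_id    shard1_0
#   shard_count routing_by_id    shard1_1
#   shard_count routing_by_field shard1_0
#   shard_count routing_by_field shard1_1
```

With the behaviour described in this report, the two routing_by_id counts should each be roughly half of the indexed documents, while for routing_by_field one count should be 0 and the other should equal the total.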


