You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@solr.apache.org by "psalagnac (via GitHub)" <gi...@apache.org> on 2023/05/16 19:34:17 UTC

[GitHub] [solr] psalagnac opened a new pull request, #1646: SOLR-14630: When querying a single collection, send the request to a replica

psalagnac opened a new pull request, #1646:
URL: https://github.com/apache/solr/pull/1646

   [SOLR-14630](https://issues.apache.org/jira/browse/SOLR-14630)
   
   <!--
   _(If you are a project committer then you may remove some/all of the following template.)_
   
   Before creating a pull request, please file an issue in the ASF Jira system for Solr:
   
   * https://issues.apache.org/jira/projects/SOLR
   
   For something minor (i.e. that wouldn't be worth putting in release notes), you can skip JIRA. 
   To create a Jira issue, you will need to create an account there first.
   
   The title of the PR should reference the Jira issue number in the form:
   
   * SOLR-####: <short description of problem or changes>
   
   SOLR must be fully capitalized. A short description helps people scanning pull requests for items they can work on.
   
   Properly referencing the issue in the title ensures that Jira is correctly updated with code review comments and commits. -->
   
   
   # Description
   
   In SolrJ, the `_route_` parameter is taken into account to find a node with a replica for the collection, but the query is send to the collection (without specifying any core). The node will then pick-up a random local core for the collection, that is not necessarily scoped by the query.
   Search works because the request is forwarded to the correct shards, but we may avoid one redirection.
   
   
   # Solution
   
   When querying a single collection and the `_route_` parameter is specified, send the query to a replica of this collection.
   No change if querying more than one collection.
   
   # Tests
   
   Added `CloudSolrClientRoutingTest`, adapted from Jason Baik's patch in the Jira ticket.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request title.
   - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   - [ ] I have added documentation for the [Reference Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] psalagnac commented on a diff in pull request #1646: SOLR-14630: When querying a single collection, send the request to a replica

Posted by "psalagnac (via GitHub)" <gi...@apache.org>.
psalagnac commented on code in PR #1646:
URL: https://github.com/apache/solr/pull/1646#discussion_r1210596079


##########
solr/CHANGES.txt:
##########
@@ -159,6 +159,11 @@ Optimizations
 * SOLR-16764: Clarify that ExportTool exports documents in JSON with Lines format, not standard JSON.  Add explicit -compress option for gzipping output.
   Add ability to specific a directory for output along with a specific file when using -out. (Eric Pugh)
 
+* SOLR-14630: When querying a single collection with `_route_` parameter or `shards.preference`
+  specified, SolrJ targets directly a replica of the collection (and not the
+  collection it self) to save a needless HTTP hop. (Pierre Salagnac)

Review Comment:
   Updated. Thanks for the suggestion.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley commented on a diff in pull request #1646: SOLR-14630: When querying a single collection, send the request to a replica

Posted by "dsmiley (via GitHub)" <gi...@apache.org>.
dsmiley commented on code in PR #1646:
URL: https://github.com/apache/solr/pull/1646#discussion_r1200575001


##########
solr/solrj/src/test/org/apache/solr/client/solrj/impl/CloudSolrClientRoutingTest.java:
##########
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.client.solrj.impl;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.solr.client.solrj.SolrClient;
+import org.apache.solr.client.solrj.SolrQuery;
+import org.apache.solr.client.solrj.SolrServerException;
+import org.apache.solr.client.solrj.request.CollectionAdminRequest;
+import org.apache.solr.cloud.SolrCloudTestCase;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.params.CommonParams;
+import org.apache.solr.common.params.ShardParams;
+import org.apache.solr.common.params.UpdateParams;
+import org.apache.solr.common.util.NamedList;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+public final class CloudSolrClientRoutingTest extends SolrCloudTestCase {

Review Comment:
   Thank you for writing a test.
   
   Debatably, the little change you did in TestDistributedTracing was sufficient -- at least for my tastes.
   
   I personally think distributed tracing via MockTracer (as you found in TestDistributedTracing) should be how tests might test internal routing decisions, as it's a generic/uniform approach.



##########
solr/CHANGES.txt:
##########
@@ -171,6 +171,8 @@ Bug Fixes
 
 * SOLR-16760: bin/solr package tool supports -h and -help command like other tools.  (Eric Pugh)
 
+* SOLR-14630: When querying a single collection, SolrJ targets directly a replica of the collection (and not the collection it self) to avoid useless forwarding. (Pierre Salagnac)

Review Comment:
   This is an optimization but a bug, I'd say.  This is how the user will experience it.  I would also reference "_route_" and/or "shards.preference" being pertinent.
   Also "it self" -> "itself".
   
   Suggestion:
   SolrJ CloudSolrClient: when searching one collection with "_route_" or "shards.preference", form the HTTP request directly to the intended core instead of the collection.  Saves a needless HTTP hop sometimes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] psalagnac commented on a diff in pull request #1646: SOLR-14630: When querying a single collection, send the request to a replica

Posted by "psalagnac (via GitHub)" <gi...@apache.org>.
psalagnac commented on code in PR #1646:
URL: https://github.com/apache/solr/pull/1646#discussion_r1203757648


##########
solr/CHANGES.txt:
##########
@@ -171,6 +171,8 @@ Bug Fixes
 
 * SOLR-16760: bin/solr package tool supports -h and -help command like other tools.  (Eric Pugh)
 
+* SOLR-14630: When querying a single collection, SolrJ targets directly a replica of the collection (and not the collection it self) to avoid useless forwarding. (Pierre Salagnac)

Review Comment:
   That's a good point.
   Moving this to the optimization list and mentioning "shards.preference"



##########
solr/solrj/src/test/org/apache/solr/client/solrj/impl/CloudSolrClientRoutingTest.java:
##########
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.client.solrj.impl;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.solr.client.solrj.SolrClient;
+import org.apache.solr.client.solrj.SolrQuery;
+import org.apache.solr.client.solrj.SolrServerException;
+import org.apache.solr.client.solrj.request.CollectionAdminRequest;
+import org.apache.solr.cloud.SolrCloudTestCase;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.params.CommonParams;
+import org.apache.solr.common.params.ShardParams;
+import org.apache.solr.common.params.UpdateParams;
+import org.apache.solr.common.util.NamedList;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+public final class CloudSolrClientRoutingTest extends SolrCloudTestCase {

Review Comment:
   I think there is value in keeping this test.
   
   While it is covered by distributed tracing test right now, it's more a side effect than what we actually want to test. Assuming the strategy for distributed tracing is dramatically revisited in the future, we may loose non regression coverage.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley commented on a diff in pull request #1646: SOLR-14630: When querying a single collection, send the request to a replica

Posted by "dsmiley (via GitHub)" <gi...@apache.org>.
dsmiley commented on code in PR #1646:
URL: https://github.com/apache/solr/pull/1646#discussion_r1204201305


##########
solr/CHANGES.txt:
##########
@@ -159,6 +159,11 @@ Optimizations
 * SOLR-16764: Clarify that ExportTool exports documents in JSON with Lines format, not standard JSON.  Add explicit -compress option for gzipping output.
   Add ability to specific a directory for output along with a specific file when using -out. (Eric Pugh)
 
+* SOLR-14630: When querying a single collection with `_route_` parameter or `shards.preference`
+  specified, SolrJ targets directly a replica of the collection (and not the
+  collection it self) to save a needless HTTP hop. (Pierre Salagnac)

Review Comment:
   It occurred to me that my wording suggestion is sort of wrong.  We didn't add a check for those parameters to exist before doing this thing; we merely look at the number of collections.  My wording choice was based on when this optimization matters to a user.  Let me try again.  (BTW "itself" is one word not two).
   
   ```suggestion
   * SOLR-14630: When querying with the `_route_` or `shards.preference` parameter and if there are multiple replicas for a collection on the receiving node, there was sometimes an extra HTTP hop or the non-intended replica may receive the request.  SolrJ will now form the URL directly to the intended replica of the collection (and not the collection itself). (Pierre Salagnac, Ivan Djurasevic)
   ```
   (newlines needed) Also included the reporter's name, which is a courtesy for the sometimes non-trivial work filing the JIRA issue and populating it with useful information for us to come along and fix :-). I dropped reference to multiple collections as it's more of an obscure detail that doesn't apply to 95% of people.  We try to keep CHANGES.txt brief.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley merged pull request #1646: SOLR-14630: When querying a single collection, send the request to a replica

Posted by "dsmiley (via GitHub)" <gi...@apache.org>.
dsmiley merged PR #1646:
URL: https://github.com/apache/solr/pull/1646


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] sonatype-lift[bot] commented on a diff in pull request #1646: SOLR-14630: When querying a single collection, send the request to a replica

Posted by "sonatype-lift[bot] (via GitHub)" <gi...@apache.org>.
sonatype-lift[bot] commented on code in PR #1646:
URL: https://github.com/apache/solr/pull/1646#discussion_r1195655517


##########
solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java:
##########
@@ -1117,8 +1117,14 @@ protected NamedList<Object> sendRequest(SolrRequest<?> request, List<String> inp
       sortedReplicas.forEach(
           replica -> {
             if (seenNodes.add(replica.getNodeName())) {
-              theUrlList.add(
-                  ZkCoreNodeProps.getCoreUrl(replica.getBaseUrl(), joinedInputCollections));
+              if (inputCollections.size() == 1 && collectionNames.size() == 1) {
+                // If we have a single collection name (and not a alias to multiple collection),
+                // send the query directly to a replica of this collection.
+                theUrlList.add(replica.getCoreUrl());
+              } else {
+                theUrlList.add(
+                    ZkCoreNodeProps.getCoreUrl(replica.getBaseUrl(), joinedInputCollections));

Review Comment:
   <picture><img alt="16% of developers fix this issue" src="https://lift.sonatype.com/api/commentimage/fixrate/16/display.svg"></picture>
   
   <b>*NULL_DEREFERENCE:</b>*  object returned by `replica.getBaseUrl()` could be null and is dereferenced by call to `getCoreUrl(...)` at line 1126.
   
   ---
   
   <details><summary>ℹ️ Expand to see all <b>@sonatype-lift</b> commands</summary>
   
   You can reply with the following commands. For example, reply with ***@sonatype-lift ignoreall*** to leave out all findings.
   | **Command** | **Usage** |
   | ------------- | ------------- |
   | `@sonatype-lift ignore` | Leave out the above finding from this PR |
   | `@sonatype-lift ignoreall` | Leave out all the existing findings from this PR |
   | `@sonatype-lift exclude <file\|issue\|path\|tool>` | Exclude specified `file\|issue\|path\|tool` from Lift findings by updating your config.toml file |
   
   **Note:** When talking to LiftBot, you need to **refresh** the page to see its response.
   <sub>[Click here](https://github.com/apps/sonatype-lift/installations/new) to add LiftBot to another repo.</sub></details>
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] sonatype-lift[bot] commented on a diff in pull request #1646: SOLR-14630: When querying a single collection, send the request to a replica

Posted by "sonatype-lift[bot] (via GitHub)" <gi...@apache.org>.
sonatype-lift[bot] commented on code in PR #1646:
URL: https://github.com/apache/solr/pull/1646#discussion_r1200545545


##########
solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java:
##########
@@ -1117,8 +1117,14 @@ protected NamedList<Object> sendRequest(SolrRequest<?> request, List<String> inp
       sortedReplicas.forEach(
           replica -> {
             if (seenNodes.add(replica.getNodeName())) {
-              theUrlList.add(
-                  ZkCoreNodeProps.getCoreUrl(replica.getBaseUrl(), joinedInputCollections));
+              if (inputCollections.size() == 1 && collectionNames.size() == 1) {
+                // If we have a single collection name (and not a alias to multiple collection),
+                // send the query directly to a replica of this collection.
+                theUrlList.add(replica.getCoreUrl());
+              } else {
+                theUrlList.add(
+                    ZkCoreNodeProps.getCoreUrl(replica.getBaseUrl(), joinedInputCollections));

Review Comment:
   I've recorded this as ignored for this pull request.
   If you change your mind, just comment `@sonatype-lift unignore`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] psalagnac commented on a diff in pull request #1646: SOLR-14630: When querying a single collection, send the request to a replica

Posted by "psalagnac (via GitHub)" <gi...@apache.org>.
psalagnac commented on code in PR #1646:
URL: https://github.com/apache/solr/pull/1646#discussion_r1200545455


##########
solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java:
##########
@@ -1117,8 +1117,14 @@ protected NamedList<Object> sendRequest(SolrRequest<?> request, List<String> inp
       sortedReplicas.forEach(
           replica -> {
             if (seenNodes.add(replica.getNodeName())) {
-              theUrlList.add(
-                  ZkCoreNodeProps.getCoreUrl(replica.getBaseUrl(), joinedInputCollections));
+              if (inputCollections.size() == 1 && collectionNames.size() == 1) {
+                // If we have a single collection name (and not a alias to multiple collection),
+                // send the query directly to a replica of this collection.
+                theUrlList.add(replica.getCoreUrl());
+              } else {
+                theUrlList.add(
+                    ZkCoreNodeProps.getCoreUrl(replica.getBaseUrl(), joinedInputCollections));

Review Comment:
   @sonatype-lift ignore



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org