You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@solr.apache.org by "bszabo97 (via GitHub)" <gi...@apache.org> on 2023/01/27 12:16:08 UTC

[GitHub] [solr] bszabo97 opened a new pull request, #1315: SOLR-16637: Add syncLevel configuration option to the reference guide

bszabo97 opened a new pull request, #1315:
URL: https://github.com/apache/solr/pull/1315

   https://issues.apache.org/jira/browse/SOLR-16637
   
   <!--
   _(If you are a project committer then you may remove some/all of the following template.)_
   
   Before creating a pull request, please file an issue in the ASF Jira system for Solr:
   
   * https://issues.apache.org/jira/projects/SOLR
   
   For something minor (i.e. that wouldn't be worth putting in release notes), you can skip JIRA. 
   To create a Jira issue, you will need to create an account there first.
   
   The title of the PR should reference the Jira issue number in the form:
   
   * SOLR-####: <short description of problem or changes>
   
   SOLR must be fully capitalized. A short description helps people scanning pull requests for items they can work on.
   
   Properly referencing the issue in the title ensures that Jira is correctly updated with code review comments and commits. -->
   
   
   # Description
   
   [SOLR-3749](https://issues.apache.org/jira/browse/SOLR-3749) introduced an option for solrconfig.xml to be able configure the syncLevel but this does not appear in the ref guide.
   
   # Solution
   
   Added `syncLevel` configuration option to the ref guide.
   
   # Tests
   
   
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request title.
   - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [ ] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [x] I have added documentation for the [Reference Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] epugh commented on pull request #1315: SOLR-16637: Add syncLevel configuration option to the reference guide

Posted by "epugh (via GitHub)" <gi...@apache.org>.
epugh commented on PR #1315:
URL: https://github.com/apache/solr/pull/1315#issuecomment-1408848126

   Do you know who wrote the original code?  Maybe use git blame and figure it out, and we tag them on this PR to ask?   I definilty don't know!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] bszabo97 commented on pull request #1315: SOLR-16637: Add syncLevel configuration option to the reference guide

Posted by "bszabo97 (via GitHub)" <gi...@apache.org>.
bszabo97 commented on PR #1315:
URL: https://github.com/apache/solr/pull/1315#issuecomment-1409050145

   Looking again at the implementation in the HdfsTransactionLog class and diving a bit deeper into it helped me in getting the picture together a bit more. This is how I understand now:
   
   FLUSH: We explicitly flush the contents of the buffer but we rely on the output stream that it is able to write the content out to the tlog file.
   FSYNC: We flush the contents of the buffer the same way we did with FLUSH but we also wait for a confirmation that the content was written out to the tlog file.
   
   At the moment I am not sure how the NONE works but I will do some testing and figure out.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] epugh merged pull request #1315: SOLR-16637: Add syncLevel configuration option to the reference guide

Posted by "epugh (via GitHub)" <gi...@apache.org>.
epugh merged PR #1315:
URL: https://github.com/apache/solr/pull/1315


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] bszabo97 commented on pull request #1315: SOLR-16637: Add syncLevel configuration option to the reference guide

Posted by "bszabo97 (via GitHub)" <gi...@apache.org>.
bszabo97 commented on PR #1315:
URL: https://github.com/apache/solr/pull/1315#issuecomment-1408912626

   This part was committed by Mark Miller in a rather huge patch here https://github.com/apache/solr/commit/9fed484fb108e53cd6026133f660b3fd270e2025 
   But as I can see this was an effort made by multiple committers in multiple sub-tasks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] janhoy commented on a diff in pull request #1315: SOLR-16637: Add syncLevel configuration option to the reference guide

Posted by "janhoy (via GitHub)" <gi...@apache.org>.
janhoy commented on code in PR #1315:
URL: https://github.com/apache/solr/pull/1315#discussion_r1093195793


##########
solr/solr-ref-guide/modules/configuration-guide/pages/commits-transaction-logs.adoc:
##########
@@ -279,6 +279,21 @@ The number of buckets used to keep track of maximum version values when checking
 Increase this value to reduce the cost of synchronizing access to version buckets during high-volume indexing.
 This requires `(8 bytes (long) * numVersionBuckets)` of heap space per Solr core.
 
+`syncLevel`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: `FLUSH`
+|===
++
+The sync level of the transaction log files. Can be NONE, FLUSH or FSYNC, if nothing is set FLUSH is the default.
+
+These configuration options work in the following way:
+
+* FSYNC: Solr internal FastOutputStream buffer is explicitly flushed to the underlying OutputStream and also this OutputStream buffer is flushed to the tlog file. This is a more expensive operation but safer since the content is written to the tlog file.

Review Comment:
   Thanks for the details here. Initially I thought perhaps that the ref-guide should use plain english and find a more high-level way to talk about this than using class names `FastOutputStream` etc.
   
   Also, I wonder if we instead of "the tlog file" should spell out "the transaction log file" here too, since this is user facing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] bszabo97 commented on a diff in pull request #1315: SOLR-16637: Add syncLevel configuration option to the reference guide

Posted by "bszabo97 (via GitHub)" <gi...@apache.org>.
bszabo97 commented on code in PR #1315:
URL: https://github.com/apache/solr/pull/1315#discussion_r1093240238


##########
solr/solr-ref-guide/modules/configuration-guide/pages/commits-transaction-logs.adoc:
##########
@@ -279,6 +279,21 @@ The number of buckets used to keep track of maximum version values when checking
 Increase this value to reduce the cost of synchronizing access to version buckets during high-volume indexing.
 This requires `(8 bytes (long) * numVersionBuckets)` of heap space per Solr core.
 
+`syncLevel`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: `FLUSH`
+|===
++
+The sync level of the transaction log files. Can be NONE, FLUSH or FSYNC, if nothing is set FLUSH is the default.
+
+These configuration options work in the following way:
+
+* FSYNC: Solr internal FastOutputStream buffer is explicitly flushed to the underlying OutputStream and also this OutputStream buffer is flushed to the tlog file. This is a more expensive operation but safer since the content is written to the tlog file.

Review Comment:
   Totally agree. I have uploaded a new commit where I tried to explain it in a more high-level way. Also changed the tlog to transaction log.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] bszabo97 commented on pull request #1315: SOLR-16637: Add syncLevel configuration option to the reference guide

Posted by "bszabo97 (via GitHub)" <gi...@apache.org>.
bszabo97 commented on PR #1315:
URL: https://github.com/apache/solr/pull/1315#issuecomment-1406690390

   Thank you for looking at it. Allow me some time to dig deeper into how these different sync levels work and try to summarise it here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] bszabo97 commented on pull request #1315: SOLR-16637: Add syncLevel configuration option to the reference guide

Posted by "bszabo97 (via GitHub)" <gi...@apache.org>.
bszabo97 commented on PR #1315:
URL: https://github.com/apache/solr/pull/1315#issuecomment-1408840182

   I have done some research and right now I have an idea of what these configuration options differ from each other:
   
   Flush: we flush the buffer of the FastOutputStream, which means we write the data to the underlaying output stream. Although we do not make sure that this data actually will reach the tlog file, this will be the responsibility of the underlaying output stream.
   Fsync: we forcefully make sure that the data we flushed to the output stream is written to the tlog file.
   None: We are not flushing the contents of the output streams at all, in case of a failure we might end up with losing data.
   
   I am not sure if I understand the behaviour correctly and also not 100% sure how the above described difference between fsync and flush would appear in a real world scenario. If you could provide some guidance I would be grateful :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] epugh commented on pull request #1315: SOLR-16637: Add syncLevel configuration option to the reference guide

Posted by "epugh (via GitHub)" <gi...@apache.org>.
epugh commented on PR #1315:
URL: https://github.com/apache/solr/pull/1315#issuecomment-1406680196

   This totally meets the goal of your PR...    Is attempting to expliain why you would pick different sync levels out of scope/too hard to do?   That would be the cherry on this..   On the other hand, maybe it's one of those things that if you care about that, then you are sleuthing through source code.    So, happy to commit as is, but if you wanted to add more, happy to wait ;-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] bszabo97 commented on pull request #1315: SOLR-16637: Add syncLevel configuration option to the reference guide

Posted by "bszabo97 (via GitHub)" <gi...@apache.org>.
bszabo97 commented on PR #1315:
URL: https://github.com/apache/solr/pull/1315#issuecomment-1410097705

   After some testing and more research I have added a description to the configuration options. Please review it if you have the time. I feel like it is a bit more on the technical side but I was not able to highlight the differences without going deeper into the inner workings of the syncLevel.
   
   Thank you for the question and the patience in regards of the answer, it was fascinating to dive deeper into this part of the code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org