Posted to commits@lucene.apache.org by rm...@apache.org on 2021/04/11 15:28:50 UTC

[lucene] branch main updated: LUCENE-9916: add a simple regeneration help doc (#73)

This is an automated email from the ASF dual-hosted git repository.

rmuir pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/lucene.git


The following commit(s) were added to refs/heads/main by this push:
     new 9d15435  LUCENE-9916: add a simple regeneration help doc (#73)
9d15435 is described below

commit 9d15435b15b94bc21bd4a60f03bdfff2e3f76a54
Author: Robert Muir <rm...@apache.org>
AuthorDate: Sun Apr 11 11:28:41 2021 -0400

    LUCENE-9916: add a simple regeneration help doc (#73)
    
    Add a simple regeneration help doc
    
    Improve task help and checksum failure message (include corresponding regeneration task). Sorry for being verbose. Maybe somebody will read it. :)
    
    Co-authored-by: Dawid Weiss <da...@carrotsearch.com>
---
 gradle/generation/regenerate.gradle |   6 +-
 gradle/help.gradle                  |   1 +
 help/regeneration.txt               | 148 ++++++++++++++++++++++++++++++++++++
 3 files changed, 154 insertions(+), 1 deletion(-)

diff --git a/gradle/generation/regenerate.gradle b/gradle/generation/regenerate.gradle
index df97a41..14ebc65 100644
--- a/gradle/generation/regenerate.gradle
+++ b/gradle/generation/regenerate.gradle
@@ -119,7 +119,7 @@ configure([
               expected = expected - same
 
               throw new GradleException("Checksums mismatch for derived resources; you might have" +
-                  " modified a generated source file?:\n" +
+                  " modified a generated resource (regenerate task: ${sourceTask.path}IfChanged):\n" +
                   "Actual:\n  ${actual.entrySet().join('\n  ')}\n\n" +
                   "Expected:\n  ${expected.entrySet().join('\n  ')}"
               )
@@ -175,6 +175,10 @@ configure([
         project.afterEvaluate {
           conditionalTask.group sourceTask.group
           conditionalTask.description sourceTask.description + " (if sources changed)"
+
+          // Hide low-level tasks from help.
+          sourceTask.group = null
+          sourceTask.description sourceTask.description + " (low-level)"
         }
 
         // Set conditional execution only if checksum mismatch occurred.
diff --git a/gradle/help.gradle b/gradle/help.gradle
index eab6a42..fdc7b03 100644
--- a/gradle/help.gradle
+++ b/gradle/help.gradle
@@ -26,6 +26,7 @@ configure(rootProject) {
       ["Deps", "help/dependencies.txt", "Declaring, inspecting and excluding dependencies."],
       ["ForbiddenApis", "help/forbiddenApis.txt", "How to add/apply rules for forbidden APIs."],
       ["LocalSettings", "help/localSettings.txt", "Local settings, overrides and build performance tweaks."],
+      ["Regeneration", "help/regeneration.txt", "How to refresh generated and derived resources."],
       ["Git", "help/git.txt", "Git assistance and guides."],
       ["IDEs", "help/IDEs.txt", "IDE support."]
   ]
diff --git a/help/regeneration.txt b/help/regeneration.txt
new file mode 100644
index 0000000..a9cd170
--- /dev/null
+++ b/help/regeneration.txt
@@ -0,0 +1,148 @@
+Regeneration
+============
+
+Lucene has a number of machine-generated resources - some of these are
+resource (binary) files, others are Java source files that are stored
+(and compiled) with the rest of Lucene source code.
+
+If you're reading this, chances are that:
+
+1) you've hit a precommit check error that said you've modified a generated
+   resource and some checksums are out of sync.
+
+2) you need to regenerate one (or more) of these resources.
+
+In many cases hitting (1) means you'll have to do (2) so let's discuss
+these in order.
+
+
+Checksum validation errors
+--------------------------
+
+LUCENE-9868 introduced a system of storing (and validating) checksums of
+generated files so that they are not accidentally modified. This checksum
+system will fail the build with a message similar to this one:
+
+Execution failed for task ':lucene:core:generateStandardTokenizerChecksumCheck'.
+> Checksums mismatch for derived resources; you might have modified a generated resource (regenerate task: :lucene:core:generateStandardTokenizerIfChanged):
+  Actual:
+    lucene/core/[...]/StandardTokenizerImpl.java=3298326986432483248962398462938649869326
+
+  Expected:
+    lucene/core/[...]/StandardTokenizerImpl.java=8e33c2698446c1c7a9479796a41316d1932ceda8
+
+The message shows which resources have mismatched checksums (in this case
+StandardTokenizerImpl.java), as well as the *module* where the generated
+resource lives and the *task name* that should be used to regenerate it:
+
+:lucene:core:generateStandardTokenizerIfChanged
+
+To resolve the problem, try to:
+
+1) "git diff" the changes that caused the build failure (to see why the checksums
+changed) and then decide whether to update the generated resource's template (or whatever
+it is using to emit the generated resource);
+
+2) regenerate the derived resources, possibly saving new checksums. If you decide to 
+regenerate, just run the task hinted at in the error message, for example:
+
+gradlew :lucene:core:generateStandardTokenizerIfChanged
+
+This regenerates all resources the task "generateStandardTokenizer" produces 
+and updates the corresponding checksums.
+
+
+Resource regeneration
+---------------------
+
+The "convention" task for regenerating all derived resources in a given
+module is called "regenerate" and you can apply it to all Lucene modules
+by running:
+
+gradlew regenerate
+
+It is typically much wiser to limit the scope of regeneration to only 
+the module you're working with though:
+
+gradlew -p lucene/analysis/common regenerate
+
+If you're interested in what specific generation tasks are available, see
+the task list for the generation group:
+
+gradlew tasks --group generation
+
+or limit the output to a particular module:
+
+gradlew -p lucene/analysis/common tasks --group generation
+
+which displays (at the moment of writing):
+
+generateClassicTokenizerIfChanged - Regenerate ClassicTokenizerImpl.java (if sources changed)
+generateHTMLStripCharFilterIfChanged - Regenerate HTMLStripCharFilter.java (if sources changed)
+generateTldsIfChanged - Regenerate top-level domain jflex macros and tests (if sources changed)
+generateUAX29URLEmailTokenizerIfChanged - Regenerate UAX29URLEmailTokenizerImpl.java (if sources changed)
+generateWikipediaTokenizerIfChanged - Regenerate WikipediaTokenizerImpl.java (if sources changed)
+regenerate - Rerun any code or static data generation tasks.
+snowball - Regenerates snowball stemmers.
+
+You may wonder what all those *IfChanged tasks are...
+
+
+Resource checksums, incremental generation and advanced topics
+--------------------------------------------------------------
+
+Many resource generation tasks require specific tools (perl, python, bash shell)
+and resources that may not be available on all platforms. In LUCENE-9868 we tried 
+to make resource generation tasks "incremental" so that they only run if their 
+sources (or outputs) have changed. So if you run the generic "regenerate" task, many of the
+actual regeneration sub-tasks will be "skipped" - you can see this if you run gradle with
+the plain console, for example:
+
+gradlew -p lucene/analysis/common regenerate --console=plain
+
+...
+> Task :lucene:analysis:common:generateUnicodePropsIfChanged
+Checksums consistent with sources, skipping task: :lucene:analysis:common:generateUnicodeProps
+...
+
+This shouldn't worry you at all. The "*IfChanged" tasks wrap the actual generation
+tasks and verify whether the inputs and outputs of a task have changed. If so, the task
+is run (and follow-up tasks such as tidy are scheduled). If the checksums are identical to
+what was previously saved, the regeneration task is skipped.
+
+Of course, sometimes you may want to *force* the regeneration task to run, even if the
+checksums indicate nothing has changed. This may happen for several reasons:
+
+- the generation task has outputs but no inputs or the inputs are volatile. In this case
+only the outputs have checksums and the task will be skipped if the outputs haven't changed.
+
+- you may want to run the regeneration task just to see that it actually runs and produces
+the same checksums (git diff should be clean). This would be a wise periodic sanity check
+to ensure everything works as expected.
+
+If you want to force-run the regeneration, use gradle's "--rerun-tasks" option:
+
+gradlew regenerate --rerun-tasks
+
+Scoping the call to a particular module will also work:
+
+gradlew -p lucene/analysis/common regenerate --rerun-tasks
+
+Scoping the call to a particular task will also work:
+
+gradlew -p lucene/analysis/common generateUnicodePropsIfChanged --rerun-tasks
+
+You *should not* call the underlying generation task directly; this is possible
+but discouraged:
+
+gradlew -p lucene/analysis/common generateUnicodeProps --rerun-tasks
+
+The reason is that some of these generation tasks require follow-up (for example
+source code tidying) and, more importantly, the checksums for these 
+regenerated resources won't be saved (so the next time you run 'check' it'll fail
+with checksum mismatches).
+
+Finally, if you do feel like force-regenerating everything, remember to exclude this 
+monster...
+
+gradlew regenerate -x generateUAX29URLEmailTokenizer --rerun-tasks
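
The skip-or-run decision that the help text attributes to the "*IfChanged"
wrapper tasks can be illustrated with a minimal Java sketch. This is not the
logic in gradle/generation/regenerate.gradle: the class name, method names and
the "Foo.jflex" path are hypothetical, and SHA-1 is only an assumption based on
the 40-hex-digit checksums in the sample error message.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not the code used by the Lucene build.
class ChecksumSketch {

  // Hex-encoded SHA-1 of the given bytes (SHA-1 is an assumption).
  static String sha1Hex(byte[] data) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-1");
      StringBuilder hex = new StringBuilder();
      for (byte b : md.digest(data)) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  // Maps each resource path to the checksum of its current content.
  static Map<String, String> checksums(Map<String, byte[]> resources) {
    Map<String, String> result = new TreeMap<>();
    resources.forEach((path, content) -> result.put(path, sha1Hex(content)));
    return result;
  }

  // The wrapped task needs to run only if the current checksums
  // differ from the ones that were saved after the previous run.
  static boolean needsRegeneration(Map<String, String> saved, Map<String, String> current) {
    return !saved.equals(current);
  }

  public static void main(String[] args) {
    Map<String, byte[]> resources = new TreeMap<>();
    resources.put("Foo.jflex", "abc".getBytes(StandardCharsets.UTF_8));
    Map<String, String> saved = checksums(resources);

    // Nothing changed since the checksums were saved: skip.
    System.out.println(needsRegeneration(saved, checksums(resources))); // false

    // An input was edited: the checksums differ, so regenerate.
    resources.put("Foo.jflex", "abd".getBytes(StandardCharsets.UTF_8));
    System.out.println(needsRegeneration(saved, checksums(resources))); // true
  }
}
```

In this sketch a forced run (what "--rerun-tasks" does in the help text) would
simply bypass the needsRegeneration() check, and saving the new checksums
afterwards corresponds to replacing the "saved" map with the current one.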