Posted to dev@jena.apache.org by "kinow (via GitHub)" <gi...@apache.org> on 2023/02/12 17:09:57 UTC

[GitHub] [jena-site] kinow commented on a diff in pull request #146: Add basic search with Fuse.js (search engine), Mark.js (word highlighter) and Hugo (search index)

kinow commented on code in PR #146:
URL: https://github.com/apache/jena-site/pull/146#discussion_r1103840564


##########
layouts/_default/search.html:
##########
@@ -0,0 +1,200 @@
+{{ define "main" }}
+<!-- Source: https://makewithhugo.com/add-search-to-a-hugo-site/ -->
+<main>
+  <div id="search-results"></div>
+  <div class="search-loading">Loading...</div>
+
+  <script id="search-result-template" type="text/x-js-template">
+    <div id="summary-${key}">
+      <h3><a href="${link}">${title}</a></h3>
+      <p class="pb-0 mb-0">${snippet}</p>
+      <p class="opacity-50 pt-0 mt-0"><small>Score: ${score}</small></p>
+      <p>
+        <small>
+          ${ isset tags }Tags: ${tags}<br>${ end }
+        </small>
+      </p>
+    </div>
+  </script>
+
+  <script src="/js/fuse.min.js" type="text/javascript" crossorigin="anonymous" referrerpolicy="no-referrer"></script>
+  <script src="/js/mark.min.js" type="text/javascript" crossorigin="anonymous" referrerpolicy="no-referrer"></script>
+  <script type="text/javascript">
+    (function() {
+      const summaryInclude = 180;
+      // See: https://fusejs.io/api/options.html
+      const fuseOptions = {
+        // Indicates whether comparisons should be case sensitive.
+        isCaseSensitive: false,
+        // Whether the score should be included in the result set.
+        // A score of 0 indicates a perfect match, while a score of 1 indicates a complete mismatch.
+        includeScore: true,
+        // Whether the matches should be included in the result set.
+        // When true, each record in the result set will include the indices of the matched characters.
+        // These can consequently be used for highlighting purposes.
+        includeMatches: true,
+        // Only the matches whose length exceeds this value will be returned.
+        // (For instance, if you want to ignore single character matches in the result, set it to 2).
+        minMatchCharLength: 2,
+        // Whether to sort the result list, by score.
+        shouldSort: true,
+        // List of keys that will be searched.
+        // This supports nested paths, weighted search, searching in arrays of strings and objects.
+        keys: [
+          {name: "title", weight: 0.8},
+          {name: "contents", weight: 0.7},
+          // {name: "tags", weight: 0.95},
+          // {name: "categories", weight: 0.05}
+        ],
+        // --- Fuzzy Matching Options
+        // Determines approximately where in the text is the pattern expected to be found.
+        location: 0,
+        // At what point does the match algorithm give up.
+        // A threshold of 0.0 requires a perfect match (of both letters and location),
+        // a threshold of 1.0 would match anything.
+        threshold: 0.2,

Review Comment:
   With a `threshold` of `0` I get `7` search results for SHACL, the same number of files I get when grepping for it (case-insensitive):
   
   ```bash
   kinow@ranma:~/Development/java/jena/jena-site/source$ grep -r -H -o -i SHACL | awk -F: '{ print $1 }' | sort -h | uniq
   documentation/fuseki2/fuseki-config-endpoint.md
   documentation/__index.md
   documentation/javadoc.md
   documentation/notes/system-initialization.md
   documentation/shacl/__index.md
   documentation/tools/__index.md
   download/maven.md
   ```
   
   But if I search for "shakl", it returns `0` results.
   
   With `0.2`, both SHACL and SHAKL return 14 search results. The first 7 results have a score lower than `1` (in Fuse.js, a higher score is a worse match), and the other 7 have a score of `1`. I left the score displayed with the results to help users.
   
   So I decided to leave it at `0.2`, so that users still get some results if they misspell their search query.
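   
   A rough sketch of why one wrong letter passes at `0.2` but not at `0`, assuming the per-pattern score behaves like the simplified formula in the Fuse.js scoring-theory docs (errors divided by pattern length once location is ignored). `approxBitapScore` and `passesThreshold` are hypothetical helpers, not part of Fuse.js, and the score Fuse.js actually reports is further adjusted by key weights and field length, so the numbers shown on the search page will differ.
   
   ```javascript
   // Hypothetical helper (NOT the Fuse.js API): simplified per-pattern score.
   // 0 means every character matched; higher means more mismatches.
   function approxBitapScore(errors, patternLength) {
     return errors / patternLength;
   }
   
   // Assumption: a candidate is kept when its score does not exceed the threshold.
   function passesThreshold(score, threshold) {
     return score <= threshold;
   }
   
   const exact = approxBitapScore(0, "shacl".length); // "shacl" found verbatim
   const typo = approxBitapScore(1, "shakl".length);  // "shakl" vs "shacl": one wrong letter
   
   console.log(passesThreshold(exact, 0));   // true
   console.log(passesThreshold(typo, 0));    // false
   console.log(passesThreshold(typo, 0.2));  // true
   ```
   
   Under these assumptions, a 5-letter query tolerates exactly one wrong character at `0.2`; longer queries would tolerate proportionally more.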



##########
layouts/_default/search.html:
##########
@@ -0,0 +1,200 @@
+{{ define "main" }}
+<!-- Source: https://makewithhugo.com/add-search-to-a-hugo-site/ -->
+<main>
+  <div id="search-results"></div>
+  <div class="search-loading">Loading...</div>
+
+  <script id="search-result-template" type="text/x-js-template">
+    <div id="summary-${key}">
+      <h3><a href="${link}">${title}</a></h3>
+      <p class="pb-0 mb-0">${snippet}</p>
+      <p class="opacity-50 pt-0 mt-0"><small>Score: ${score}</small></p>
+      <p>
+        <small>
+          ${ isset tags }Tags: ${tags}<br>${ end }
+        </small>
+      </p>
+    </div>
+  </script>
+
+  <script src="/js/fuse.min.js" type="text/javascript" crossorigin="anonymous" referrerpolicy="no-referrer"></script>
+  <script src="/js/mark.min.js" type="text/javascript" crossorigin="anonymous" referrerpolicy="no-referrer"></script>
+  <script type="text/javascript">
+    (function() {
+      const summaryInclude = 180;
+      // See: https://fusejs.io/api/options.html
+      const fuseOptions = {
+        // Indicates whether comparisons should be case sensitive.
+        isCaseSensitive: false,
+        // Whether the score should be included in the result set.
+        // A score of 0 indicates a perfect match, while a score of 1 indicates a complete mismatch.
+        includeScore: true,
+        // Whether the matches should be included in the result set.
+        // When true, each record in the result set will include the indices of the matched characters.
+        // These can consequently be used for highlighting purposes.
+        includeMatches: true,
+        // Only the matches whose length exceeds this value will be returned.
+        // (For instance, if you want to ignore single character matches in the result, set it to 2).
+        minMatchCharLength: 2,
+        // Whether to sort the result list, by score.
+        shouldSort: true,
+        // List of keys that will be searched.
+        // This supports nested paths, weighted search, searching in arrays of strings and objects.
+        keys: [
+          {name: "title", weight: 0.8},
+          {name: "contents", weight: 0.7},
+          // {name: "tags", weight: 0.95},
+          // {name: "categories", weight: 0.05}
+        ],
+        // --- Fuzzy Matching Options
+        // Determines approximately where in the text is the pattern expected to be found.
+        location: 0,
+        // At what point does the match algorithm give up.
+        // A threshold of 0.0 requires a perfect match (of both letters and location),
+        // a threshold of 1.0 would match anything.
+        threshold: 0.2,
+        // Determines how close the match must be to the fuzzy location (specified by location).
+        // An exact letter match which is distance characters away from the fuzzy location would
+        // score as a complete mismatch. A distance of 0 requires the match be at the exact
+        // location specified. A distance of 1000 would require a perfect match to be within 800
+        // characters of the location to be found using a threshold of 0.8.
+        distance: 100,
+        // When true, search will ignore location and distance, so it won't matter where in
+        // the string the pattern appears.
+        //
+        // NOTE: These settings are used to calculate the Fuzziness Score (Bitap algorithm) in Fuse.js.
+        //       It calculates threshold (default 0.6) * distance (default 100), which gives 60 by
+        //       default, meaning it will search for the query-term within 60 characters from the location
+        //       (default 0). Since Jena docs may have very long text that includes the query term anywhere
+        //       we disable it with ignoreLocation: true.
+        //       For more: https://fusejs.io/concepts/scoring-theory.html#scoring-theory
+        ignoreLocation: true,

Review Comment:
   @afs Fuse.js uses the location of the match in its scoring algorithm, which IMO doesn't make much sense for our use case. For example, with the default settings it excludes documents where the match appears after the first 60 characters.
   
   The `ignoreLocation: true` setting above disables that behaviour.
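   
   To illustrate the 60-character cutoff, here is a minimal sketch assuming the simplified score from the Fuse.js scoring-theory docs: mismatch ratio plus the distance from the expected location divided by `distance`. `approxLocationScore` is a hypothetical helper, not part of Fuse.js, and the match positions are made-up values.
   
   ```javascript
   // Hypothetical helper (NOT the Fuse.js API): simplified location-aware score
   // for a match with `errors` wrong characters found at index `matchLocation`.
   function approxLocationScore(errors, patternLength, matchLocation, opts) {
     const { location = 0, distance = 100, ignoreLocation = false } = opts;
     const accuracy = errors / patternLength;
     if (ignoreLocation) return accuracy; // position no longer matters
     return accuracy + Math.abs(location - matchLocation) / distance;
   }
   
   // Fuse.js defaults as quoted in the NOTE above: threshold 0.6, distance 100.
   const defaults = { threshold: 0.6, distance: 100 };
   
   const early = approxLocationScore(0, 5, 10, defaults); // exact match near the start
   const late = approxLocationScore(0, 5, 61, defaults);  // exact match past 60 characters
   const lateIgnored = approxLocationScore(0, 5, 5000, { ...defaults, ignoreLocation: true });
   
   console.log(early <= defaults.threshold);       // true
   console.log(late <= defaults.threshold);        // false
   console.log(lateIgnored <= defaults.threshold); // true
   ```
   
   With the `threshold: 0.2` and `distance: 100` used in this PR, the same arithmetic would put the cutoff at roughly 20 characters, which is why ignoring location matters for long pages.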


