You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/05/24 07:12:18 UTC

[GitHub] [lucene] gautamworah96 opened a new pull request, #922: Index only the docs for FacetField posting list

gautamworah96 opened a new pull request, #922:
URL: https://github.com/apache/lucene/pull/922

   ### Description (or a Jira issue link if you have one)
   
   Change the index option for FacetField to just index the DOCS and not the frequencies and offsets (we don't use these values). I still need to think a bit more about the long term implications of this change. Opening this PR as a starting point for now.
   
   ### Tests
   
   Existing tests pass. I looked for all instances where the code was traversing through the postings list in the facet module and none of them were relying on the frequency of the term (this is something that confused me at first, do we have no use cases for `FacetField` that rely on the frequency? apparently not.. If a use case does come up we could change the field type again).
   
   
   ### Benchmarks
   
   I ran the `wikimediumall` benchmark with a modified localrun.py script that used a new index for the candidate and it did not show any sizeable QPS changes. Size of the `facet` index also remained the same at 64 MB.
   
   
   ### Backwards compatibility
   
   I've not thought about this in depth (yet). Existing back-compat tests in branch_9x pass.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on a diff in pull request #922: Index only the docs for FacetField posting list

Posted by GitBox <gi...@apache.org>.
gsmiller commented on code in PR #922:
URL: https://github.com/apache/lucene/pull/922#discussion_r899250685


##########
lucene/CHANGES.txt:
##########
@@ -67,6 +67,8 @@ Other
 
 * LUCENE-10493: Factor out Viterbi algorithm in Kuromoji and Nori to analysis-common. (Tomoko Uchida)
 
+* Remove unused and confusing FacetField indexing options (Gautam Worah)

Review Comment:
   Can you change this to:
   
   ```suggestion
   * GITHUB#992: Remove unused and confusing FacetField indexing options (Gautam Worah)
   ```
   
   You probably saw that we now allow changes without corresponding Jira issues, but we use the PR reference in place of the issue ID in this case. Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gautamworah96 commented on a diff in pull request #922: Index only the docs for FacetField posting list

Posted by GitBox <gi...@apache.org>.
gautamworah96 commented on code in PR #922:
URL: https://github.com/apache/lucene/pull/922#discussion_r904112058


##########
lucene/CHANGES.txt:
##########
@@ -67,6 +67,8 @@ Other
 
 * LUCENE-10493: Factor out Viterbi algorithm in Kuromoji and Nori to analysis-common. (Tomoko Uchida)
 
+* GITHUB#922: Remove unused and confusing FacetField indexing options (Gautam Worah)

Review Comment:
   Yes, that makes sense. Apologies for the delay in responding. I've moved it to 9.3 for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gautamworah96 commented on a diff in pull request #922: Index only the docs for FacetField posting list

Posted by GitBox <gi...@apache.org>.
gautamworah96 commented on code in PR #922:
URL: https://github.com/apache/lucene/pull/922#discussion_r899384834


##########
lucene/CHANGES.txt:
##########
@@ -67,6 +67,8 @@ Other
 
 * LUCENE-10493: Factor out Viterbi algorithm in Kuromoji and Nori to analysis-common. (Tomoko Uchida)
 
+* Remove unused and confusing FacetField indexing options (Gautam Worah)

Review Comment:
   Ugh. Sorry about this. I did not see any GITHUB issues in the vicinity and assumed that this should work.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mikemccand commented on pull request #922: Index only the docs for FacetField posting list

Posted by GitBox <gi...@apache.org>.
mikemccand commented on PR #922:
URL: https://github.com/apache/lucene/pull/922#issuecomment-1156516445

   Yeah I think these attributes of `FacetField.TYPE` are not used, but it is trappy looking.  This `TYPE` is really just a marker/singleton so `FacetsConfig.build` can know which fields in the incoming document are `FacetField`.
   
   So this is really a "remove dead code that looks like it might still be twitching" sort of situation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gautamworah96 commented on pull request #922: Index only the docs for FacetField posting list

Posted by GitBox <gi...@apache.org>.
gautamworah96 commented on PR #922:
URL: https://github.com/apache/lucene/pull/922#issuecomment-1156032542

   > I think we might be doing the right thing already? If you look at StringField, we are setting: setIndexOptions(IndexOptions.DOCS)
   
   Yes, that is indeed the case.
   
   Thanks for taking a look at this change @gsmiller. I had committed my changes so as to not lose them with time. I'll explicitly request a review through the UI the next time I commit my partial work :)   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gautamworah96 commented on a diff in pull request #922: Index only the docs for FacetField posting list

Posted by GitBox <gi...@apache.org>.
gautamworah96 commented on code in PR #922:
URL: https://github.com/apache/lucene/pull/922#discussion_r897570270


##########
lucene/facet/src/java/org/apache/lucene/facet/FacetField.java:
##########
@@ -30,14 +30,12 @@
  */
 public class FacetField extends Field {
 
-  /** Field type used for storing facet values: docs, freqs, and positions. */
+  /**
+   * Field type used for storing facet values. Actual field type used for indexing is determined in
+   * {@link FacetsConfig#build(TaxonomyWriter, Document)}
+   */
   public static final FieldType TYPE = new FieldType();
 
-  static {

Review Comment:
   Yeah, tbh, I was debating whether this change was even needed or no. Lets keep it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on a diff in pull request #922: Index only the docs for FacetField posting list

Posted by GitBox <gi...@apache.org>.
gsmiller commented on code in PR #922:
URL: https://github.com/apache/lucene/pull/922#discussion_r893860958


##########
lucene/facet/src/java/org/apache/lucene/facet/FacetField.java:
##########
@@ -30,14 +30,12 @@
  */
 public class FacetField extends Field {
 
-  /** Field type used for storing facet values: docs, freqs, and positions. */
+  /**
+   * Field type used for storing facet values. Actual field type used for indexing is determined in
+   * {@link FacetsConfig#build(TaxonomyWriter, Document)}
+   */
   public static final FieldType TYPE = new FieldType();
 
-  static {

Review Comment:
   I think it's reasonable to remove this to avoid potential future confusion (like what arose in this PR initially). I did some code searching and can't find anywhere that `FacetField` is used outside of `FacetsConfig`, and the change looks like a perfectly safe no-op. Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on pull request #922: Index only the docs for FacetField posting list

Posted by GitBox <gi...@apache.org>.
gsmiller commented on PR #922:
URL: https://github.com/apache/lucene/pull/922#issuecomment-1163504036

   @gautamworah96 merged; thanks again! Would you mind opening a separate PR to backport?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on pull request #922: Index only the docs for FacetField posting list

Posted by GitBox <gi...@apache.org>.
gsmiller commented on PR #922:
URL: https://github.com/apache/lucene/pull/922#issuecomment-1151492288

   @gautamworah96 can you add a CHANGES entry please?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on a diff in pull request #922: Index only the docs for FacetField posting list

Posted by GitBox <gi...@apache.org>.
gsmiller commented on code in PR #922:
URL: https://github.com/apache/lucene/pull/922#discussion_r900382196


##########
lucene/CHANGES.txt:
##########
@@ -67,6 +67,8 @@ Other
 
 * LUCENE-10493: Factor out Viterbi algorithm in Kuromoji and Nori to analysis-common. (Tomoko Uchida)
 
+* GITHUB#922: Remove unused and confusing FacetField indexing options (Gautam Worah)

Review Comment:
   Sorry @gautamworah96, I should have noticed this earlier, but should we put this under 9.3? This should be safe to backport.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller merged pull request #922: Index only the docs for FacetField posting list

Posted by GitBox <gi...@apache.org>.
gsmiller merged PR #922:
URL: https://github.com/apache/lucene/pull/922


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mikemccand commented on a diff in pull request #922: Index only the docs for FacetField posting list

Posted by GitBox <gi...@apache.org>.
mikemccand commented on code in PR #922:
URL: https://github.com/apache/lucene/pull/922#discussion_r898026016


##########
lucene/CHANGES.txt:
##########
@@ -67,6 +67,8 @@ Other
 
 * LUCENE-10493: Factor out Viterbi algorithm in Kuromoji and Nori to analysis-common. (Tomoko Uchida)
 
+* Remove redundant FacetField indexing options (Gautam Worah)

Review Comment:
   Maybe change `redundant` to `unused and confusing`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on pull request #922: Index only the docs for FacetField posting list

Posted by GitBox <gi...@apache.org>.
gsmiller commented on PR #922:
URL: https://github.com/apache/lucene/pull/922#issuecomment-1136295588

   I'm not actually sure these options are referenced/honored anywhere during indexing, which might explain why you don't see a difference. Maybe you've dug into this deeper and know better, but I think all the term indexing for drill-down happens in `FacetsConfig#indexDrillDownTerms`, where it creates new `StringField` instances for the doc for the actual indexing. Like I said though, maybe you've dug into this deeper?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org