You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "stefanvodita (via GitHub)" <gi...@apache.org> on 2023/05/27 12:50:04 UTC

[GitHub] [lucene] stefanvodita opened a new pull request, #12337: Index arbitrary fields in taxonomy docs

stefanvodita opened a new pull request, #12337:
URL: https://github.com/apache/lucene/pull/12337

   This addresses #12336 by letting users pass in an ordinal data appender to the taxonomy writer.
   When the taxonomy writer indexes an ordinal data it will also add the custom fields that the user has requested.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Index arbitrary fields in taxonomy docs [lucene]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #12337:
URL: https://github.com/apache/lucene/pull/12337#issuecomment-1880902960

   This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] stefanvodita commented on a diff in pull request #12337: Index arbitrary fields in taxonomy docs

Posted by "stefanvodita (via GitHub)" <gi...@apache.org>.
stefanvodita commented on code in PR #12337:
URL: https://github.com/apache/lucene/pull/12337#discussion_r1298912817


##########
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyIndexReader.java:
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy.directory;
+
+import java.io.IOException;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.store.Directory;
+
+/**
+ * This is like a {@link DirectoryTaxonomyReader}, except it provides access to the underlying
+ * {@link DirectoryReader} and full path field name.
+ */
+public class DirectoryTaxonomyIndexReader extends DirectoryTaxonomyReader {

Review Comment:
   Great point @epotyom! I’ve thought a bit more about this and I’d like to consider exposing the `IndexReader` of `DirectoryTaxonomyReader` by making `getInternalIndexReader` public instead of protected. I actually like this solution better than what I coded up previously. It’s cleaner, it’s backwards compatible, and a user could have already gotten the `IndexReader` anyway by extending `DirectoryTaxonomyReader`. I’m curious if anyone has other ideas though.



##########
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyIndexReader.java:
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy.directory;
+
+import java.io.IOException;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.store.Directory;
+
+/**
+ * This is like a {@link DirectoryTaxonomyReader}, except it provides access to the underlying
+ * {@link DirectoryReader} and full path field name.
+ */
+public class DirectoryTaxonomyIndexReader extends DirectoryTaxonomyReader {

Review Comment:
   Great point @epotyom! I’ve thought a bit more about this and I’d like to consider exposing the `IndexReader` of `DirectoryTaxonomyReader` by making `getInternalIndexReader` public instead of protected. I actually like this solution better than what I coded up previously. It’s cleaner, it’s backwards compatible, and a user could have already gotten the `IndexReader` anyway by extending `DirectoryTaxonomyReader`. I’m curious if anyone has other ideas though.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Index arbitrary fields in taxonomy docs [lucene]

Posted by "stefanvodita (via GitHub)" <gi...@apache.org>.
stefanvodita commented on PR #12337:
URL: https://github.com/apache/lucene/pull/12337#issuecomment-1930494299

   Thank you for reviving the PR, Mike; it had been sitting around for a good while. I’ll leave it up for a few more days to see if there are other comments and merge if there aren’t.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Index arbitrary fields in taxonomy docs [lucene]

Posted by "stefanvodita (via GitHub)" <gi...@apache.org>.
stefanvodita merged PR #12337:
URL: https://github.com/apache/lucene/pull/12337


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] stefanvodita commented on pull request #12337: Index arbitrary fields in taxonomy docs

Posted by "stefanvodita (via GitHub)" <gi...@apache.org>.
stefanvodita commented on PR #12337:
URL: https://github.com/apache/lucene/pull/12337#issuecomment-1686832362

   The commit I pushed makes `DirectoryTaxonomyReader.getInternalIndexReader` public. We also stop relying on the full path field. I’m not sure why I thought we needed it, we can use `getPath`/`getBulkPath` to get labels if we have the corresponding ordinal.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] shaie commented on a diff in pull request #12337: Index arbitrary fields in taxonomy docs

Posted by "shaie (via GitHub)" <gi...@apache.org>.
shaie commented on code in PR #12337:
URL: https://github.com/apache/lucene/pull/12337#discussion_r1208332838


##########
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/TestOrdinalData.java:
##########
@@ -0,0 +1,166 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.function.BiConsumer;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.Field;
+import org.apache.lucene.document.NumericDocValuesField;
+import org.apache.lucene.document.StringField;
+import org.apache.lucene.facet.FacetField;
+import org.apache.lucene.facet.FacetTestCase;
+import org.apache.lucene.facet.FacetsConfig;
+import org.apache.lucene.facet.taxonomy.directory.Consts;
+import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter;
+import org.apache.lucene.index.BinaryDocValues;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.IndexReader;
+import org.apache.lucene.index.IndexWriter;
+import org.apache.lucene.index.IndexWriterConfig;
+import org.apache.lucene.index.LeafReader;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.NumericDocValues;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.search.TermQuery;
+import org.apache.lucene.search.TopDocs;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.IOUtils;
+import org.junit.After;
+import org.junit.Before;
+
+public class TestOrdinalData extends FacetTestCase {
+  Directory taxoDir;
+  IndexReader taxoIndexReader;
+
+  private static final Map<String, Long> labelToScore;

Review Comment:
   nit: you can use `Map.of()` to shorten the code



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] epotyom commented on a diff in pull request #12337: Index arbitrary fields in taxonomy docs

Posted by "epotyom (via GitHub)" <gi...@apache.org>.
epotyom commented on code in PR #12337:
URL: https://github.com/apache/lucene/pull/12337#discussion_r1293702812


##########
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyIndexReader.java:
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy.directory;
+
+import java.io.IOException;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.store.Directory;
+
+/**
+ * This is like a {@link DirectoryTaxonomyReader}, except it provides access to the underlying
+ * {@link DirectoryReader} and full path field name.
+ */
+public class DirectoryTaxonomyIndexReader extends DirectoryTaxonomyReader {

Review Comment:
   We've been testing similar approach locally and found that this class should override `DirectoryTaxonomyReader .doOpenIfChanged` method, otherwise after refresh `SearcherAndTaxonomy.taxonomyReader` becomes an instance of `DirectoryTaxonomyReader` again.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mikemccand commented on a diff in pull request #12337: Index arbitrary fields in taxonomy docs

Posted by "mikemccand (via GitHub)" <gi...@apache.org>.
mikemccand commented on code in PR #12337:
URL: https://github.com/apache/lucene/pull/12337#discussion_r1321802426


##########
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/ReindexingEnrichedDirectoryTaxonomyWriter.java:
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy.directory;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.function.BiConsumer;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.facet.taxonomy.FacetLabel;
+import org.apache.lucene.index.IndexReader;
+import org.apache.lucene.index.LeafReader;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.IOUtils;
+
+/**
+ * Use this {@link org.apache.lucene.facet.taxonomy.TaxonomyWriter} to append arbitrary fields to
+ * the ordinal documents in the taxonomy. To update the custom data added to the docs, it is
+ * required to {@link #reindexWithNewOrdinalData(BiConsumer)}.
+ */
+public class ReindexingEnrichedDirectoryTaxonomyWriter extends DirectoryTaxonomyWriter {

Review Comment:
   Note that Lucene has updatable doc values, so in theory one could update/add a new doc values field into the taxonomy index without shifting ordinals.  This might be a nice path to update fields associated with ordinals?



##########
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyWriter.java:
##########
@@ -877,6 +886,26 @@ public synchronized void replaceTaxonomy(Directory taxoDir) throws IOException {
     ++indexEpoch;
   }
 
+  /** Delete all taxonomy data and start over. */
+  public synchronized void reset() throws IOException {

Review Comment:
   Maybe rename to `deleteAll` (matching `IndexWriter.deleteAll`).
   
   And could you add some javadoc warnings / when this might be used?  If you fully delete your taxo index, you must also reindex any documents index referencing these same facets?  Or, you must carefully regenerate the ordinals in the right order?
   
   And perhaps it should be package private, if the intention is to only use it via `ReindexingEnrichedDirectoryTaxonomyWriter`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] shaie commented on pull request #12337: Index arbitrary fields in taxonomy docs

Posted by "shaie (via GitHub)" <gi...@apache.org>.
shaie commented on PR #12337:
URL: https://github.com/apache/lucene/pull/12337#issuecomment-1565912761

   To add to the comment I wrote above, I was referring to something like `AssociationFacetField` which today allows indexing an arbitrary `byte[]` with ordinals. A similar field can be used to add other fields as well (numeric, string, stored, ...).
   
   BTW, if all you're interested in at the moment is to associate some _weight_ to a facet, perhaps the existing associations support will already provide what you need?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] stefanvodita commented on pull request #12337: Index arbitrary fields in taxonomy docs

Posted by "stefanvodita (via GitHub)" <gi...@apache.org>.
stefanvodita commented on PR #12337:
URL: https://github.com/apache/lucene/pull/12337#issuecomment-1575179188

   Thank you for the great feedback @shaie!
   
   I’ve pushed a commit where I try to move the logic to new taxonomy reader/writer implementations. I’ve added an option to reindex the taxonomy using the new writer, hopefully it should make it clear that the user can’t change their custom ordinal data without reindexing.
   
   We can also try what you were suggesting with payloads getting passed in for each document indexed in the main index if you still think making the `oridnalDataAppender` an attribute of the taxonomy writer is not the right approach.
   
   Regarding association fields - we can use them to achieve the same behaviour, but we can also do it more efficiently. Association fields are per ordinal and per doc, but with this CR we’re getting fields that are per ordinal and across docs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mikemccand commented on a diff in pull request #12337: Index arbitrary fields in taxonomy docs

Posted by "mikemccand (via GitHub)" <gi...@apache.org>.
mikemccand commented on code in PR #12337:
URL: https://github.com/apache/lucene/pull/12337#discussion_r1321799297


##########
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyIndexReader.java:
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy.directory;
+
+import java.io.IOException;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.store.Directory;
+
+/**
+ * This is like a {@link DirectoryTaxonomyReader}, except it provides access to the underlying
+ * {@link DirectoryReader} and full path field name.
+ */
+public class DirectoryTaxonomyIndexReader extends DirectoryTaxonomyReader {

Review Comment:
   Can we open a separate PR to address this issue?  It seems like a crab (a spinoff issue discovered while creating this PR that's fundamentally not otherwise related to this PR)?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mikemccand commented on pull request #12337: Index arbitrary fields in taxonomy docs

Posted by "mikemccand (via GitHub)" <gi...@apache.org>.
mikemccand commented on PR #12337:
URL: https://github.com/apache/lucene/pull/12337#issuecomment-1714232934

   > But as I think about this feature and how do I see it mature over time, I DO think the payload should be given when ingesting the documents
   
   Hmm -- I don't think that's great because we are forcing denormalization onto the user?  This is fundamentally nicely normalized content (values per `FacetLabel` not `Document`), so we really should enable indexing it in a denormalized manner.  If the user really wants to (inefficiently) denormalize they can use `AssociationFacets` already?
   
   > On the other hand, once a facet was added, e.g. "Author/Bob", its value can't change (yet)
   
   Actually, I remember someone 😉 adding this nice feature long ago to Lucene to be able to update doc values in-place -- it seems like that could be a great mechanism for updating these denormalized `FacetLabel` values.  But we should do that as a followon issue -- let's leave this PR to focus on getting these initial values into the taxo index.
   
   We could also (later, separate PR) consider more radical changes to how the taxo index is stored to make it more efficient to update `FacetLabel` values.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] stefanvodita commented on pull request #12337: Index arbitrary fields in taxonomy docs

Posted by "stefanvodita (via GitHub)" <gi...@apache.org>.
stefanvodita commented on PR #12337:
URL: https://github.com/apache/lucene/pull/12337#issuecomment-1715512722

   Thank you for the review @mikemccand! I’ve integrated your feedback. Updatable doc values are definitely something to consider.
   For comparison, I coded up an [association facet field that is constant for a facet label across document](https://github.com/apache/lucene/pull/12550). I think it helps highlight the advantages of the enriched taxonomy solution.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] stefanvodita commented on a diff in pull request #12337: Index arbitrary fields in taxonomy docs

Posted by "stefanvodita (via GitHub)" <gi...@apache.org>.
stefanvodita commented on code in PR #12337:
URL: https://github.com/apache/lucene/pull/12337#discussion_r1322872602


##########
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyIndexReader.java:
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy.directory;
+
+import java.io.IOException;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.store.Directory;
+
+/**
+ * This is like a {@link DirectoryTaxonomyReader}, except it provides access to the underlying
+ * {@link DirectoryReader} and full path field name.
+ */
+public class DirectoryTaxonomyIndexReader extends DirectoryTaxonomyReader {

Review Comment:
   This was more complicated in the first draft, but right now the only change is to make `getInternalIndexReader` public instead of protected. I think we have to have that in this PR, otherwise we’re offering a way to put data in the taxonomy, but no way to get it back out.



##########
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyIndexReader.java:
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy.directory;
+
+import java.io.IOException;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.store.Directory;
+
+/**
+ * This is like a {@link DirectoryTaxonomyReader}, except it provides access to the underlying
+ * {@link DirectoryReader} and full path field name.
+ */
+public class DirectoryTaxonomyIndexReader extends DirectoryTaxonomyReader {

Review Comment:
   This was more complicated in the first draft, but right now the only change is to make `getInternalIndexReader` public instead of protected. I think we have to have that in this PR, otherwise we’re offering a way to put data in the taxonomy, but no way to get it back out.



##########
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyWriter.java:
##########
@@ -877,6 +886,26 @@ public synchronized void replaceTaxonomy(Directory taxoDir) throws IOException {
     ++indexEpoch;
   }
 
+  /** Delete all taxonomy data and start over. */
+  public synchronized void reset() throws IOException {

Review Comment:
   Good suggestions! I’ve done this in the new commit.



##########
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyWriter.java:
##########
@@ -877,6 +886,26 @@ public synchronized void replaceTaxonomy(Directory taxoDir) throws IOException {
     ++indexEpoch;
   }
 
+  /** Delete all taxonomy data and start over. */
+  public synchronized void reset() throws IOException {

Review Comment:
   Good suggestions! I’ve done this in the new commit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org