Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/22 08:23:52 UTC
[GitHub] [lucene] jpountz commented on issue #11799: Indexing method for learned sparse retrieval
jpountz commented on issue #11799:
URL: https://github.com/apache/lucene/issues/11799#issuecomment-1254691652
> we want a single Field containing a list of key-value pairs or a json formatted
Note that you can add one `FeatureField` field to your Lucene document for every key/value pair in your JSON document. The logic of converting from a high-level representation like a JSON map into a low-level representation that Lucene understands feels like something that could be managed on the application side?
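As a rough illustration of that application-side conversion, here is a minimal sketch that assumes the JSON has already been parsed into a `Map<String, Float>` by whatever JSON library the application uses. The class name `SparseVectorDocuments` and the helper `toDocument` are hypothetical and not part of Lucene; only `Document` and the `FeatureField(fieldName, featureName, featureValue)` constructor come from Lucene.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.FeatureField;

public class SparseVectorDocuments {

  // Hypothetical helper (not a Lucene API): emits one FeatureField per
  // key/value pair, all under the same Lucene field name.
  static Document toDocument(String fieldName, Map<String, Float> weights) {
    Document doc = new Document();
    for (Map.Entry<String, Float> entry : weights.entrySet()) {
      // FeatureField rejects values that are not positive finite floats,
      // so the application should filter or clamp such entries beforehand.
      doc.add(new FeatureField(fieldName, entry.getKey(), entry.getValue()));
    }
    return doc;
  }

  public static void main(String[] args) {
    // Stand-in for a parsed JSON object such as
    // {"scientific": 200, "intellect": 202, "communication": 235}.
    Map<String, Float> weights = new LinkedHashMap<>();
    weights.put("scientific", 200f);
    weights.put("intellect", 202f);
    weights.put("communication", 235f);

    Document doc = toDocument("my_feature", weights);
    System.out.println(doc.getFields().size()); // prints 3
  }
}
```

The resulting `Document` can then be passed to `IndexWriter#addDocument` like any other document.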
Here's a code example that I think does something similar to what you are looking for:
```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.FeatureField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class LearnedSparseRetrieval {

  public static void main(String[] args) throws Exception {
    try (Directory dir = new ByteBuffersDirectory()) {
      // Index two documents; each key/value pair becomes one FeatureField.
      try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig())) {
        {
          Document doc = new Document();
          doc.add(new FeatureField("my_feature", "scientific", 200));
          doc.add(new FeatureField("my_feature", "intellect", 202));
          doc.add(new FeatureField("my_feature", "communication", 235));
          w.addDocument(doc);
        }
        {
          Document doc = new Document();
          doc.add(new FeatureField("my_feature", "scientific", 100));
          doc.add(new FeatureField("my_feature", "communication", 350));
          doc.add(new FeatureField("my_feature", "project", 80));
          w.addDocument(doc);
        }
      }

      try (IndexReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        // Each SHOULD clause contributes weight * feature value to the score.
        Query query = new BooleanQuery.Builder()
            .add(FeatureField.newLinearQuery("my_feature", "scientific", 24), Occur.SHOULD)
            .add(FeatureField.newLinearQuery("my_feature", "communication", 50), Occur.SHOULD)
            .build();
        // Explain the score of the first document (doc ID 0) ...
        System.out.println(searcher.explain(query, 0));
        System.out.println();
        // ... and of the second document (doc ID 1).
        System.out.println(searcher.explain(query, 1));
      }
    }
  }
}
```
which outputs
```
16550.0 = sum of:
  4800.0 = Linear function on the my_feature field for the scientific feature, computed as w * S from:
    24.0 = w, weight of this function
    200.0 = S, feature value
  11750.0 = Linear function on the my_feature field for the communication feature, computed as w * S from:
    50.0 = w, weight of this function
    235.0 = S, feature value

19900.0 = sum of:
  2400.0 = Linear function on the my_feature field for the scientific feature, computed as w * S from:
    24.0 = w, weight of this function
    100.0 = S, feature value
  17500.0 = Linear function on the my_feature field for the communication feature, computed as w * S from:
    50.0 = w, weight of this function
    350.0 = S, feature value
```
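To make the arithmetic in these explanations concrete: each score is a sparse dot product between the query weights and the document's feature values, where features missing on either side contribute nothing. The following plain-Java sketch reproduces the two totals without Lucene. The class `LinearScoreCheck` and its `score` method are hypothetical names used only for illustration, and the sketch ignores the reduced precision with which Lucene stores feature values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LinearScoreCheck {

  // Sums weight * value over the features present in both the query and the document.
  static float score(Map<String, Float> queryWeights, Map<String, Float> docFeatures) {
    float sum = 0f;
    for (Map.Entry<String, Float> q : queryWeights.entrySet()) {
      Float value = docFeatures.get(q.getKey());
      if (value != null) {
        sum += q.getValue() * value;
      }
    }
    return sum;
  }

  public static void main(String[] args) {
    Map<String, Float> query = new LinkedHashMap<>();
    query.put("scientific", 24f);
    query.put("communication", 50f);

    Map<String, Float> doc0 = new LinkedHashMap<>();
    doc0.put("scientific", 200f);
    doc0.put("intellect", 202f);
    doc0.put("communication", 235f);

    Map<String, Float> doc1 = new LinkedHashMap<>();
    doc1.put("scientific", 100f);
    doc1.put("communication", 350f);
    doc1.put("project", 80f);

    System.out.println(score(query, doc0)); // prints 16550.0
    System.out.println(score(query, doc1)); // prints 19900.0
  }
}
```

The `intellect` and `project` features do not appear in the query, which is why they are absent from the explanations above.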