Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/22 08:23:52 UTC

[GitHub] [lucene] jpountz commented on issue #11799: Indexing method for learned sparse retrieval

jpountz commented on issue #11799:
URL: https://github.com/apache/lucene/issues/11799#issuecomment-1254691652

   > we want a single Field containing a list of key-value pairs or a json formatted
   
   Note that you can add one `FeatureField` to your Lucene document for every key/value pair in your JSON document. Converting a high-level representation such as a JSON map into the low-level representation that Lucene understands feels like something the application could handle?
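
As a rough sketch of what that application-side conversion could look like, the snippet below turns an already-parsed map of feature weights into a Lucene `Document` with one `FeatureField` per entry. The class name `SparseDocuments` and the helper `toDocument` are hypothetical and not part of Lucene. JSON parsing is assumed to have happened elsewhere, and every weight is assumed to be a positive, finite float, which as far as I know `FeatureField` requires.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.FeatureField;

// Hypothetical helper for illustration; not part of Lucene.
public class SparseDocuments {

  // Builds a Document holding one FeatureField per (feature name, weight) entry.
  static Document toDocument(String fieldName, Map<String, Float> weights) {
    Document doc = new Document();
    for (Map.Entry<String, Float> e : weights.entrySet()) {
      doc.add(new FeatureField(fieldName, e.getKey(), e.getValue()));
    }
    return doc;
  }

  public static void main(String[] args) {
    // Stand-in for a parsed JSON object such as {"scientific": 200, ...}.
    Map<String, Float> weights = new LinkedHashMap<>();
    weights.put("scientific", 200f);
    weights.put("intellect", 202f);
    weights.put("communication", 235f);

    Document doc = toDocument("my_feature", weights);
    System.out.println(doc.getFields().size() + " feature fields");
  }
}
```

The resulting `Document` can be passed to `IndexWriter#addDocument` exactly like a hand-built one.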
   
   Here's a code example that I think does something similar to what you are looking for:
   
   ```java
   import org.apache.lucene.document.Document;
   import org.apache.lucene.document.FeatureField;
   import org.apache.lucene.index.DirectoryReader;
   import org.apache.lucene.index.IndexReader;
   import org.apache.lucene.index.IndexWriter;
   import org.apache.lucene.index.IndexWriterConfig;
   import org.apache.lucene.search.BooleanClause.Occur;
   import org.apache.lucene.search.BooleanQuery;
   import org.apache.lucene.search.IndexSearcher;
   import org.apache.lucene.search.Query;
   import org.apache.lucene.store.ByteBuffersDirectory;
   import org.apache.lucene.store.Directory;
   
   public class LearnedSparseRetrieval {
   
     public static void main(String[] args) throws Exception {
       try (Directory dir = new ByteBuffersDirectory()) {
         try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig())) {
           {
             Document doc = new Document();
             doc.add(new FeatureField("my_feature", "scientific", 200));
             doc.add(new FeatureField("my_feature", "intellect", 202));
             doc.add(new FeatureField("my_feature", "communication", 235));
             w.addDocument(doc);
           }
           {
             Document doc = new Document();
             doc.add(new FeatureField("my_feature", "scientific", 100));
             doc.add(new FeatureField("my_feature", "communication", 350));
             doc.add(new FeatureField("my_feature", "project", 80));
             w.addDocument(doc);
           }
         }
   
         try (IndexReader reader = DirectoryReader.open(dir)) {
           IndexSearcher searcher = new IndexSearcher(reader);
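           // Each SHOULD clause contributes weight * feature value to the score
           // of documents that contain the feature.
           Query query = new BooleanQuery.Builder()
               .add(FeatureField.newLinearQuery("my_feature", "scientific", 24), Occur.SHOULD)
               .add(FeatureField.newLinearQuery("my_feature", "communication", 50), Occur.SHOULD)
               .build();
           // Explain the score of each of the two indexed documents (doc IDs 0 and 1).
           System.out.println(searcher.explain(query, 0));
           System.out.println();
           System.out.println(searcher.explain(query, 1));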
         }
       }
     }
   
   }
   ```
   
   which outputs
   
   ```
   16550.0 = sum of:
     4800.0 = Linear function on the my_feature field for the scientific feature, computed as w * S from:
       24.0 = w, weight of this function
       200.0 = S, feature value
     11750.0 = Linear function on the my_feature field for the communication feature, computed as w * S from:
       50.0 = w, weight of this function
       235.0 = S, feature value
   
   
   19900.0 = sum of:
     2400.0 = Linear function on the my_feature field for the scientific feature, computed as w * S from:
       24.0 = w, weight of this function
       100.0 = S, feature value
     17500.0 = Linear function on the my_feature field for the communication feature, computed as w * S from:
       50.0 = w, weight of this function
       350.0 = S, feature value
   ```
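
To make the arithmetic in the explanations explicit: each document's score is the sum of `w * S` over the query features it contains, which is a sparse dot product between the query weights and the document's feature values. The plain-Java sketch below reproduces the two totals without Lucene. The class and method names are made up for illustration, and it ignores the reduced precision with which `FeatureField` stores values, which does not affect the values used here.

```java
import java.util.Map;

// Illustrative only: mimics the "sum of w * S" shown in the explain output.
public class LinearScoreSketch {

  // Sums weight * featureValue over the query features present in the document.
  static float score(Map<String, Float> queryWeights, Map<String, Float> docFeatures) {
    float sum = 0f;
    for (Map.Entry<String, Float> q : queryWeights.entrySet()) {
      Float s = docFeatures.get(q.getKey());
      if (s != null) { // features missing from the document contribute nothing
        sum += q.getValue() * s;
      }
    }
    return sum;
  }

  public static void main(String[] args) {
    Map<String, Float> query = Map.of("scientific", 24f, "communication", 50f);
    Map<String, Float> doc0 =
        Map.of("scientific", 200f, "intellect", 202f, "communication", 235f);
    Map<String, Float> doc1 =
        Map.of("scientific", 100f, "communication", 350f, "project", 80f);

    System.out.println(score(query, doc0)); // 24*200 + 50*235 = 16550.0
    System.out.println(score(query, doc1)); // 24*100 + 50*350 = 19900.0
  }
}
```

Features that appear only in the document (`intellect`, `project`) do not affect the score because the query has no clause for them.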


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
