Posted to dev@opennlp.apache.org by "kinow (via GitHub)" <gi...@apache.org> on 2023/03/27 13:56:01 UTC

[GitHub] [opennlp] kinow commented on a diff in pull request #523: OPENNLP-1442: Sentence transformers

kinow commented on code in PR #523:
URL: https://github.com/apache/opennlp/pull/523#discussion_r1149281568


##########
opennlp-dl/src/main/java/opennlp/dl/doccat/DocumentCategorizerDL.java:
##########
@@ -223,41 +214,14 @@ private int getKey(String value) {
 
   }
 
-  /**
-   * Loads a vocabulary file from disk.
-   * @param vocab The vocabulary file.
-   * @return A map of vocabulary words to integer IDs.
-   * @throws IOException Thrown if the vocabulary file cannot be opened and read.
-   */
-  private Map<String, Integer> loadVocab(File vocab) throws IOException {
-
-    final Map<String, Integer> v = new HashMap<>();
-
-    BufferedReader br = new BufferedReader(new FileReader(vocab.getPath()));
-    String line = br.readLine();
-    int x = 0;
-
-    while (line != null) {
-
-      line = br.readLine();
-      x++;
-
-      v.put(line, x);
-
-    }
-
-    return v;
-
-  }
-

Review Comment:
   Nice simplification :+1: !
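
For reference, the removed `loadVocab` never stored the first line of the file, put a `null` key into the map after the last line, and left the reader open. The following is only a minimal sketch of a loader without those issues, not the code this PR actually uses. The class name `VocabLoader`, the UTF-8 encoding, and the zero-based IDs are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the OpenNLP API.
public final class VocabLoader {

  private VocabLoader() {
  }

  /**
   * Loads a vocabulary file, mapping each line to its zero-based line index.
   * The zero-based numbering and UTF-8 encoding are assumptions.
   */
  public static Map<String, Integer> loadVocab(Path vocab) throws IOException {
    final Map<String, Integer> v = new HashMap<>();
    // try-with-resources closes the reader even if reading fails.
    try (BufferedReader br = Files.newBufferedReader(vocab, StandardCharsets.UTF_8)) {
      String line;
      int id = 0;
      // Test for end of file before storing, so no null key is added
      // and the first line is not skipped.
      while ((line = br.readLine()) != null) {
        v.put(line, id++);
      }
    }
    return v;
  }
}
```

Reading and testing in the loop condition keeps the read and the `put` in step, which is where the removed version went wrong.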



##########
opennlp-dl/README.md:
##########
@@ -4,44 +4,50 @@ This module provides OpenNLP interface implementations for ONNX models using the
 
 **Important**: This does not provide the ability to train models. Model training is done outside of OpenNLP. This code provides the ability to use ONNX models from OpenNLP.
 
-To build with example models, download the models to the `/src/test/resources` directory. (These are the exported models described below.)
+Models used in the tests are available in the opennlp evaluation test data.
 
-```
-
-export OPENNLP_DATA=/tmp/
-mkdir /tmp/dl-doccat /tmp/dl-namefinder
+## NameFinderDL
 
-# Document categorizer model
-wget https://www.dropbox.com/s/n9uzs8r4xm9rhxb/model.onnx?dl=0 -O $OPENNLP_DATA/dl-doccat/model.onnx
-wget https://www.dropbox.com/s/aw6yjc68jw0jts6/vocab.txt?dl=0 -O $OPENNLP_DATA/dl-doccat/vocab.txt
+* Export a Huggingface NER model to ONNX, e.g.:
 
-# Namefinder model
-wget https://www.dropbox.com/s/zgogq65gs9tyfm1/model.onnx?dl=0 -O $OPENNLP_DATA/dl-namefinder/model.onnx
-wget https://www.dropbox.com/s/3byt1jggly1dg98/vocab.txt?dl=0 -O $OPENNLP_DATA/dl-/namefinder/vocab.txt
+```
+python -m transformers.onnx --model=dslim/bert-base-NER --feature token-classification exported
 ```
 
-## TokenNameFinder
+## DocumentCategorizerDL
 
-* Export a Huggingface NER model to ONNX, e.g.:
+* Export a Huggingface classification (e.g. sentiment) model to ONNX, e.g.:
 
 ```
-python -m transformers.onnx --model=dslim/bert-base-NER --feature token-classification exported
+python -m transformers.onnx --model=nlptown/bert-base-multilingual-uncased-sentiment --feature sequence-classification exported
 ```
 
-* Copy the exported model to `src/test/resources/namefinder/model.onnx`.
-* Copy the model's [vocab.txt](https://huggingface.co/dslim/bert-base-NER/tree/main) to `src/test/resources/namefinder/vocab.txt`.
+## SentenceVectors
 
-Now you can run the tests in `NameFinderDLTest`.
+* Convert a sentence vectors model to ONNX, e.g.:

Review Comment:
   Maybe the GitHub UI is confusing me, but was the `* ` intentional here? I see an H2, then this list item, and after that plain paragraphs ("Install dependencies:", "Convert the model", ...). Were those supposed to be list items too?


