Posted to commits@nlpcraft.apache.org by ar...@apache.org on 2020/08/23 23:29:17 UTC

[incubator-nlpcraft] branch master updated: WIP on windows scripts & support.

This is an automated email from the ASF dual-hosted git repository.

aradzinski pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-nlpcraft.git


The following commit(s) were added to refs/heads/master by this push:
     new 3d2f332  WIP on windows scripts & support.
3d2f332 is described below

commit 3d2f332049afd33614a92828cca814b20d34d736
Author: Aaron Radzinski <ar...@datalingvo.com>
AuthorDate: Sun Aug 23 16:29:08 2020 -0700

    WIP on windows scripts & support.
---
 nlpcraft/src/main/python/ctxword/README.md         | 14 +++----
 nlpcraft/src/main/python/ctxword/WINDOWS_SETUP.md  | 18 +++++----
 .../src/main/python/ctxword/bin/py_requirements    |  2 +-
 .../src/main/python/ctxword/bin/start_server.cmd   | 26 +++++++++++++
 nlpcraft/src/main/python/ctxword/bin/suggest.sh    |  2 +-
 .../src/main/python/ctxword/bin/suggestion.cmd     | 45 ++++++++++++++++++++++
 6 files changed, 91 insertions(+), 16 deletions(-)

diff --git a/nlpcraft/src/main/python/ctxword/README.md b/nlpcraft/src/main/python/ctxword/README.md
index 4ff0f30..6749e1c 100644
--- a/nlpcraft/src/main/python/ctxword/README.md
+++ b/nlpcraft/src/main/python/ctxword/README.md
@@ -26,21 +26,21 @@
 ### Overview
 `ctxword` module provides Python-based internal tool for finding a contextually related words for a given word from the
 input sentence. This utility provides a single REST endpoint and is based on Google's [BERT](https://github.com/google-research/bert) 
-models and Facebook's [fasttext](https://fasttext.cc/) library.
+models and Facebook's [FastText](https://fasttext.cc/) library.
 
 ### Dependencies
 To install necessary dependency:
  * **Linux/MacOS**: run `bin/install_dependencies.sh` script.  
- * **Windows**: read `WINDOWS_SETUP.md` in the same folder.
+ * **Windows**: read `WINDOWS_SETUP.md` file for manual installation.
 
 ### Start REST Server
 To start REST server:
  * **Linux/MacOS**: run `bin/start_server.sh` script.  
- * **Windows**: read `WINDOWS_SETUP.md` in the same folder.
+ * **Windows**: run `bin\start_server.cmd` script.
  
- NOTE: on the 1st start the server will try to load compressed BERT model which is not yet available. It will
- then download this library and compress it which will take a several minutes and may require 10GB+ of 
- available memory. Subsequent starts will skip this step, and the server will start much faster.
+NOTE: on the 1st start the server will try to load compressed BERT model which is not yet available. It will
+then download this library and compress it which will take several minutes and may require 10 GB+ of 
+available memory. Subsequent starts will skip this step, and the server will start much faster.
 
 ### REST API
 Once the REST server is started you can issue REST calls to get suggestions for the contextual related words.
@@ -76,7 +76,7 @@ Here's the sample request and response JSON objects:
  * Response JSON:
    - `[["word1", "word2", "word3"]]`
  
-### `bin/suggest.sh`
+### `bin/suggest.{sh|cmd|ps1}`
 You can use Curl-based `bin/suggest.sh` script for the suggestion processing of single sentences from the command line.
 Following call returns list of contextual suggestions for the 5th word (counting from zero) in the given sentence: 
 
diff --git a/nlpcraft/src/main/python/ctxword/WINDOWS_SETUP.md b/nlpcraft/src/main/python/ctxword/WINDOWS_SETUP.md
index 484efb4..b5fe102 100644
--- a/nlpcraft/src/main/python/ctxword/WINDOWS_SETUP.md
+++ b/nlpcraft/src/main/python/ctxword/WINDOWS_SETUP.md
@@ -27,14 +27,18 @@
 To set up `ctxword` module under Windows, you would need to repeat steps from `bin/install_dependencies.sh` script:
  1. Before starting, make sure you have the following installed:
     - `python3`
-    - `pip3` (included in the latest versions of python3)
+    - `pip3` (included with the latest versions of python3)
     - `git`
- 2. Download pre-trained [FastText model](https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz)
- 3. Extract archive into `data` folder (i.e. `/nlpcraft/src/main/python/ctxword/data`)
- 4. Clone [FastText repository](https://github.com/facebookresearch/fastText.git)
- 5. Install it with `pip3 install fastText` (where `fastText` is root of the cloned git repository)
- 6. Install the rest of required python packages from `bin/py_requirements` by running `pip3 install -r bin/py_requirements`  
- 7. A local clone of FastTest git repository may be removed after setup is finished.
+ 2. Download pre-trained [FastText](https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz) model.
+ 3. Extract GZIP model into `data` folder (i.e. `/nlpcraft/src/main/python/ctxword/data`).
+ 4. Git clone [FastText](https://github.com/facebookresearch/fastText.git) into some temporary folder.
+ 5. Ensure that you have [Microsoft Windows 10 SDK](https://developer.microsoft.com/en-US/windows/downloads/windows-10-sdk/) installed. Step 6 will fail unless this SDK is installed.
+ 6. Run `pip3 install fastText` (where `fastText` is root of the cloned git repository from the previous step).
+ 7. Install PyTorch depending on whether you have NVIDIA CUDA support:
+    - Without CUDA support: `pip3 install torch==1.6.0+cpu -f https://download.pytorch.org/whl/torch_stable.html`
+    - With CUDA support: `pip3 install torch==1.6.0 -f https://download.pytorch.org/whl/torch_stable.html`
+ 8. Install the rest of required python packages from `bin/py_requirements` by running `pip3 install -r bin/py_requirements`
+ 9. You can remove the local clone of FastText git repository after its setup is finished.
  
  ### Copyright
  Copyright (C) 2020 Apache Software Foundation
diff --git a/nlpcraft/src/main/python/ctxword/bin/py_requirements b/nlpcraft/src/main/python/ctxword/bin/py_requirements
index d2eb820..bb0c20d 100644
--- a/nlpcraft/src/main/python/ctxword/bin/py_requirements
+++ b/nlpcraft/src/main/python/ctxword/bin/py_requirements
@@ -18,4 +18,4 @@
 # Dependency list for 'ctxword' Python module.
 flask==1.1.2
 transformers==2.7.0
-torch==1.5.0
+torch==1.6.0
diff --git a/nlpcraft/src/main/python/ctxword/bin/start_server.cmd b/nlpcraft/src/main/python/ctxword/bin/start_server.cmd
new file mode 100644
index 0000000..8aa6942
--- /dev/null
+++ b/nlpcraft/src/main/python/ctxword/bin/start_server.cmd
@@ -0,0 +1,26 @@
+@echo OFF
+
+rem
+rem Licensed to the Apache Software Foundation (ASF) under one or more
+rem contributor license agreements.  See the NOTICE file distributed with
+rem this work for additional information regarding copyright ownership.
+rem The ASF licenses this file to You under the Apache License, Version 2.0
+rem (the "License"); you may not use this file except in compliance with
+rem the License.  You may obtain a copy of the License at
+rem
+rem      http://www.apache.org/licenses/LICENSE-2.0
+rem
+rem Unless required by applicable law or agreed to in writing, software
+rem distributed under the License is distributed on an "AS IS" BASIS,
+rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+rem See the License for the specific language governing permissions and
+rem limitations under the License.
+rem
+rem
+rem NOTE:
+rem ----
+rem This script may not be suitable for production usage. Please see official Flask documentation for
+rem more info on how to deploy Flask applications.
+
+set FLASK_APP=server.py
+python3 -m flask run
\ No newline at end of file
diff --git a/nlpcraft/src/main/python/ctxword/bin/suggest.sh b/nlpcraft/src/main/python/ctxword/bin/suggest.sh
index a45bb24..115a3cd 100755
--- a/nlpcraft/src/main/python/ctxword/bin/suggest.sh
+++ b/nlpcraft/src/main/python/ctxword/bin/suggest.sh
@@ -25,4 +25,4 @@
 # NOTE: You need to have REST server running (see 'start_server.sh' script in the same folder).
 #
 
-curl -d "{\"sentences\": [{\"text\": \"$1\", \"indexes\": [$2]}], \"simple\": true, \"limit\": 10}" -H 'Content-Type: application/json' http://localhost:5000/suggestions
+curl -d "{\"sentences\": [{\"text\": \"$1\", \"indexes\": [$2]}], \"simple\": true, \"limit\": 10}" -H 'Content-Type: application/json' http://localhost:5000/suggestions | python -m json.tool
diff --git a/nlpcraft/src/main/python/ctxword/bin/suggestion.cmd b/nlpcraft/src/main/python/ctxword/bin/suggestion.cmd
new file mode 100644
index 0000000..b26fe6d
--- /dev/null
+++ b/nlpcraft/src/main/python/ctxword/bin/suggestion.cmd
@@ -0,0 +1,45 @@
+@echo off
+
+rem
+rem Licensed to the Apache Software Foundation (ASF) under one or more
+rem contributor license agreements.  See the NOTICE file distributed with
+rem this work for additional information regarding copyright ownership.
+rem The ASF licenses this file to You under the Apache License, Version 2.0
+rem (the "License"); you may not use this file except in compliance with
+rem the License.  You may obtain a copy of the License at
+rem
+rem      http://www.apache.org/licenses/LICENSE-2.0
+rem
+rem Unless required by applicable law or agreed to in writing, software
+rem distributed under the License is distributed on an "AS IS" BASIS,
+rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+rem See the License for the specific language governing permissions and
+rem limitations under the License.
+rem
+rem
+rem
+rem Simple Curl-based script for getting contextual related words suggestions for a single input sentence.
+rem Example:
+rem     > bin\win\suggest.cmd "what is the chance of rain tomorrow?" 5
+rem       % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
+rem                                      Dload  Upload   Total   Spent    Left  Speed
+rem     100   214  100   104  100   110    104    110  0:00:01 --:--:--  0:00:01   804
+rem     [
+rem         [
+rem             "rain",
+rem             "snow",
+rem             "rainfall",
+rem             "precipitation",
+rem             "rains",
+rem             "flooding",
+rem             "storms",
+rem             "raining",
+rem             "sunshine",
+rem             "showers"
+rem         ]
+rem     ]
+rem
+rem NOTE: You need to have REST server running (see 'start_server.{cmd|ps1}' scripts in the same folder).
+rem
+
+curl http://localhost:5000/suggestions -d "{\"sentences\": [{\"text\": \"%~1\", \"indexes\": [%~2]}], \"simple\": true, \"limit\": 10}" -H "Content-Type: application/json" | python -m json.tool
\ No newline at end of file
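
The request sent by the two curl-based suggest scripts in this commit can also be sketched in Python. This is a minimal illustration and not part of the commit: the endpoint and JSON fields are taken from the curl commands above, while the function names `build_payload` and `suggest` are hypothetical and the server is assumed to be listening on Flask's default `localhost:5000`.

```python
import json
import urllib.request

# Endpoint used by the curl commands in suggest.sh and suggestion.cmd.
SUGGESTIONS_URL = "http://localhost:5000/suggestions"


def build_payload(text, index, limit=10):
    """Build the same JSON body that the curl-based scripts send.

    Hypothetical helper: 'text' is the input sentence and 'index' is the
    zero-based position of the word to get suggestions for.
    """
    return {
        "sentences": [{"text": text, "indexes": [index]}],
        "simple": True,
        "limit": limit,
    }


def suggest(text, index, url=SUGGESTIONS_URL):
    """POST the payload and return the decoded JSON response.

    Requires the ctxword REST server to be running locally.
    """
    body = json.dumps(build_payload(text, index)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server started, `suggest("what is the chance of rain tomorrow?", 5)` would issue the same call as the example in the `suggestion.cmd` header comment. Unlike the shell scripts, which splice the sentence directly into the JSON string, `json.dumps` escapes any double quotes in the sentence.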