Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/01/30 06:25:43 UTC

[GitHub] piiswrong closed pull request #9626: Fix skipping error in docstr and API docs

URL: https://github.com/apache/incubator-mxnet/pull/9626
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/docs/api/python/contrib/text.md b/docs/api/python/contrib/text.md
index f203a117ba..8bd67d2b50 100644
--- a/docs/api/python/contrib/text.md
+++ b/docs/api/python/contrib/text.md
@@ -138,11 +138,11 @@ data set.
 
 The obtained `counter` has key-value pairs whose keys are words and values are word frequencies.
 Suppose that we want to build indices for the 2 most frequent keys in `counter` with the unknown
-token representation '<UnK>' and a reserved token '<pad>'.
+token representation '&lt;unk&gt;' and a reserved token '&lt;pad&gt;'.
 
 ```python
->>> my_vocab = text.vocab.Vocabulary(counter, most_freq_count=2, unknown_token='<UnK>', 
-...     reserved_tokens=['<pad>'])
+>>> my_vocab = text.vocab.Vocabulary(counter, most_freq_count=2, unknown_token='&lt;unk&gt;', 
+...     reserved_tokens=['&lt;pad&gt;'])
 
 ```
 
@@ -153,18 +153,18 @@ of any unknown token) and `reserved_tokens`.
 
 ```python
 >>> my_vocab.token_to_idx
-{'<UnK>': 0, '<pad>': 1, 'world': 2, 'hello': 3}
+{'&lt;unk&gt;': 0, '&lt;pad&gt;': 1, 'world': 2, 'hello': 3}
 >>> my_vocab.idx_to_token
-['<UnK>', '<pad>', 'world', 'hello']
+['&lt;unk&gt;', '&lt;pad&gt;', 'world', 'hello']
 >>> my_vocab.unknown_token
-'<UnK>'
+'&lt;unk&gt;'
 >>> my_vocab.reserved_tokens
-['<pad>']
+['&lt;pad&gt;']
 >>> len(my_vocab)
 4
 ```
 
-Besides the specified unknown token '<UnK>' and reserved_token '<pad>' are indexed, the 2 most
+Besides the specified unknown token '&lt;unk&gt;' and reserved_token '&lt;pad&gt;' are indexed, the 2 most
 frequent words 'world' and 'hello' are also indexed.
 
 
@@ -259,9 +259,9 @@ We can also access properties such as `token_to_idx` (mapping tokens to indices)
 
 ```python
 >>> my_embedding.token_to_idx
-{'<unk>': 0, 'world': 1, 'hello': 2}
+{'&lt;unk&gt;': 0, 'world': 1, 'hello': 2}
 >>> my_embedding.idx_to_token
-['<unk>', 'world', 'hello']
+['&lt;unk&gt;', 'world', 'hello']
 >>> len(my_embedding)
 3
 >>> my_embedding.vec_len
@@ -302,7 +302,7 @@ word embedding file, we do not need to specify any vocabulary.
 
 We can access properties such as `token_to_idx` (mapping tokens to indices), `idx_to_token` (mapping
 indices to tokens), `vec_len` (length of each embedding vector), and `unknown_token` (representation
-of any unknown token, default value is '<unk>').
+of any unknown token, default value is '&lt;unk&gt;').
 
 ```python
 >>> my_embedding.token_to_idx['nice']
@@ -312,15 +312,15 @@ of any unknown token, default value is '<unk>').
 >>> my_embedding.vec_len
 300
 >>> my_embedding.unknown_token
-'<unk>'
+'&lt;unk&gt;'
 
 ```
 
-For every unknown token, if its representation '<unk>' is encountered in the pre-trained token
+For every unknown token, if its representation '&lt;unk&gt;' is encountered in the pre-trained token
 embedding file, index 0 of property `idx_to_vec` maps to the pre-trained token embedding vector
 loaded from the file; otherwise, index 0 of property `idx_to_vec` maps to the default token
 embedding vector specified via `init_unknown_vec` (set to nd.zeros here). Since the pre-trained file
-does not have a vector for the token '<unk>', index 0 has to map to an additional token '<unk>' and
+does not have a vector for the token '&lt;unk&gt;', index 0 has to map to an additional token '&lt;unk&gt;' and
 the number of tokens in the embedding is 111,052.
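
For readers skimming this archive, here is a minimal, self-contained sketch of the `Vocabulary` workflow that the documentation above describes. It assumes the MXNet 1.x `mxnet.contrib.text` package and that a plain `collections.Counter` is accepted as the `counter` argument; note that the `&lt;unk&gt;` escaping in the diff is only for HTML rendering, so the actual token strings passed in code are still `'<unk>'` and `'<pad>'`.

```python
>>> import collections
>>> from mxnet.contrib import text
>>>
>>> # Word frequencies for a toy corpus: 'world' x3, 'hello' x2, 'nice' x1.
>>> counter = collections.Counter(
...     ['hello', 'world', 'world', 'hello', 'world', 'nice'])
>>>
>>> # Index only the 2 most frequent words, plus the unknown and reserved tokens.
>>> my_vocab = text.vocab.Vocabulary(counter, most_freq_count=2,
...                                  unknown_token='<unk>',
...                                  reserved_tokens=['<pad>'])
>>> my_vocab.token_to_idx
{'<unk>': 0, '<pad>': 1, 'world': 2, 'hello': 3}
>>> my_vocab.idx_to_token
['<unk>', '<pad>', 'world', 'hello']
>>> len(my_vocab)
4
```

The index layout mirrors the documented example: the unknown token takes index 0, reserved tokens follow, and the remaining slots go to the most frequent words in descending frequency order.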
 
 
diff --git a/python/mxnet/contrib/text/embedding.py b/python/mxnet/contrib/text/embedding.py
index 4fc6aacf67..961fbb02a8 100644
--- a/python/mxnet/contrib/text/embedding.py
+++ b/python/mxnet/contrib/text/embedding.py
@@ -646,12 +646,12 @@ class CustomEmbedding(_TokenEmbedding):
 
     This is to load embedding vectors from a user-defined pre-trained text embedding file.
 
-    Denote by '<ed>' the argument `elem_delim`. Denote by <v_ij> the j-th element of the token
-    embedding vector for <token_i>, the expected format of a custom pre-trained token embedding file
+    Denote by '[ed]' the argument `elem_delim`. Denote by [v_ij] the j-th element of the token
+    embedding vector for [token_i], the expected format of a custom pre-trained token embedding file
     is:
 
-    '<token_1><ed><v_11><ed><v_12><ed>...<ed><v_1k>\\\\n<token_2><ed><v_21><ed><v_22><ed>...<ed>
-    <v_2k>\\\\n...'
+    '[token_1][ed][v_11][ed][v_12][ed]...[ed][v_1k]\\\\n[token_2][ed][v_21][ed][v_22][ed]...[ed]
+    [v_2k]\\\\n...'
 
     where k is the length of the embedding vector `vec_len`.
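
As a concrete illustration of this file format, the hedged sketch below writes a tiny embedding file with two tokens and three-dimensional vectors (space as `elem_delim`), then loads it. The file name is arbitrary, and passing the file path as the first positional argument together with `init_unknown_vec=nd.zeros` follows the usage shown in the `text.md` examples earlier in this diff; treat the exact constructor signature as an assumption.

```python
>>> from mxnet import nd
>>> from mxnet.contrib import text
>>>
>>> # Each line: a token followed by vec_len numbers, all separated by elem_delim.
>>> with open('my_embedding.txt', 'w') as f:
...     _ = f.write('hello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n')
...
>>> my_embedding = text.embedding.CustomEmbedding('my_embedding.txt', elem_delim=' ',
...                                               init_unknown_vec=nd.zeros)
>>> my_embedding.vec_len
3
>>> my_embedding.unknown_token
'<unk>'
>>> len(my_embedding)           # the two file tokens plus the extra '<unk>' at index 0
3
>>> my_embedding.idx_to_vec[0]  # '<unk>' is absent from the file, so this is all zeros
```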
 
diff --git a/python/mxnet/contrib/text/vocab.py b/python/mxnet/contrib/text/vocab.py
index 04c3326841..9e44acb101 100644
--- a/python/mxnet/contrib/text/vocab.py
+++ b/python/mxnet/contrib/text/vocab.py
@@ -52,7 +52,7 @@ class Vocabulary(object):
         argument has no effect.
     min_freq : int, default 1
         The minimum frequency required for a token in the keys of `counter` to be indexed.
-    unknown_token : hashable object, default '<unk>'
+    unknown_token : hashable object, default '&lt;unk&gt;'
         The representation for any unknown token. In other words, any unknown token will be indexed
         as the same representation. Keys of `counter`, `unknown_token`, and values of
         `reserved_tokens` must be of the same hashable type. Examples: str, int, and tuple.
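
To make the `min_freq` and `unknown_token` semantics above concrete, here is a small sketch, again assuming the MXNet 1.x contrib API and that a plain `collections.Counter` is accepted as `counter`:

```python
>>> import collections
>>> from mxnet.contrib import text
>>>
>>> counter = collections.Counter({'world': 3, 'hello': 2, 'typo': 1})
>>>
>>> # Tokens whose frequency is below min_freq are not indexed.
>>> vocab = text.vocab.Vocabulary(counter, min_freq=2)
>>> 'typo' in vocab.token_to_idx
False
>>>
>>> # Every out-of-vocabulary token shares the single unknown representation,
>>> # which sits at index 0 (as in the token_to_idx examples above).
>>> vocab.token_to_idx[vocab.unknown_token]
0
```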


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services