Posted to commits@opennlp.apache.org by jo...@apache.org on 2018/12/12 15:55:44 UTC
[opennlp-sandbox] branch word_dropout created (now 2105a05)
This is an automated email from the ASF dual-hosted git repository.
joern pushed a change to branch word_dropout
in repository https://gitbox.apache.org/repos/asf/opennlp-sandbox.git.
at 2105a05 Add word dropout, tokens are replaced with __UNK__ token
This branch includes the following new commits:
new 2105a05 Add word dropout, tokens are replaced with __UNK__ token
The revision listed above as "new" is entirely new to this
repository and will be described in a separate email. Revisions
listed as "add" were already present in the repository and have only
been added to this reference.
[opennlp-sandbox] 01/01: Add word dropout, tokens are replaced with __UNK__ token
Posted by jo...@apache.org.
commit 2105a0509eaf0e17069a551ab48e14b62f92b095
Author: Jörn Kottmann <jo...@apache.org>
AuthorDate: Wed Dec 12 16:55:21 2018 +0100
Add word dropout, tokens are replaced with __UNK__ token
---
tf-ner-poc/src/main/python/namefinder/namefinder.py | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/tf-ner-poc/src/main/python/namefinder/namefinder.py b/tf-ner-poc/src/main/python/namefinder/namefinder.py
index 9150bd1..f41fd7d 100644
--- a/tf-ner-poc/src/main/python/namefinder/namefinder.py
+++ b/tf-ner-poc/src/main/python/namefinder/namefinder.py
@@ -19,7 +19,7 @@
# This poc is based on source code taken from:
# https://github.com/guillaumegenthial/sequence_tagging
-
+import random
import sys
from math import floor
import tensorflow as tf
@@ -396,6 +396,12 @@ def main():
sentences_batch, chars_batch, word_length_batch, labels_batch, lengths = \
name_finder.mini_batch(rev_word_dict, char_dict, sentences, labels, batch_size, batch_index)
+ # TODO: Add a parameter to disable/enable this ?!?!
+ for batch_row in range(batch_size):
+ for token_index in range(lengths[batch_row]):
+ if random.uniform(0, 1) <= 0.05:
+ sentences_batch[batch_row][token_index] = word_dict['__UNK__']
+
feed_dict = {token_ids_ph: sentences_batch, char_ids_ph: chars_batch, word_lengths_ph: word_length_batch, sequence_lengths_ph: lengths,
labels_ph: labels_batch, dropout_keep_prob: 0.5}
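The patch above applies word dropout inline in the training loop: before each mini-batch is fed to the network, every real token (positions up to the sentence length; padding is skipped) is replaced by the `__UNK__` id with probability 0.05, which regularizes the model and teaches it to handle unknown words. A minimal standalone sketch of the same idea is below; the function name, the `unk_id` value, and the configurable `dropout_rate` parameter (addressing the TODO in the patch) are illustrative assumptions, not part of the committed code.

```python
import random

def apply_word_dropout(sentences_batch, lengths, unk_id,
                       dropout_rate=0.05, rng=random):
    # Replace each real token id with unk_id with probability dropout_rate.
    # Only the first lengths[row] positions of each row are actual tokens;
    # padding positions beyond that are left untouched.
    for row, length in enumerate(lengths):
        for col in range(length):
            if rng.uniform(0, 1) <= dropout_rate:
                sentences_batch[row][col] = unk_id
    return sentences_batch
```

Passing an explicit `rng` (e.g. `random.Random(seed)`) keeps training runs reproducible, and setting `dropout_rate=0` disables the behavior entirely, which is one way to resolve the enable/disable question raised in the commit's TODO comment.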