Posted to dev@joshua.apache.org by KellenSunderland <gi...@git.apache.org> on 2016/09/28 09:58:33 UTC

[GitHub] incubator-joshua pull request #68: KenLM JNI optimization

GitHub user KellenSunderland opened a pull request:

    https://github.com/apache/incubator-joshua/pull/68

    KenLM JNI optimization

    I've cleaned up the code from the previous PR (#65) that contained @kpu's performance improvement. I've removed the code that was specific to the estimate call, since we didn't see a performance increase from it.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/KellenSunderland/incubator-joshua DirectBuffersRemoveEst

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-joshua/pull/68.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #68
    
----
commit 9ea7eebf0164d1676f633b441bd952eaa20b0760
Author: Kellen Sunderland <ke...@amazon.com>
Date:   2016-09-15T17:06:04Z

    Convert to a DirectBuffer to transfer ngrams during probRule

commit c8d8a65b9352e51e777965994dae7f9337b08def
Author: Kellen Sunderland <ke...@amazon.com>
Date:   2016-09-15T17:31:21Z

    Converted estimateRule to also make use of DirectBuffer.
    Reduced number of array copies in probRule.
    Removed sentence from estimate method signature (as it was unused).
    Created an abstraction in the KenLMPool class to hide details of underlying ByteBuffer Indexing.
    Fixed Test givenKenLm_whenQueryingWithState_thenStateAndProbReturned

commit d9c3d7ecf069a6a0339b911b9defb8ce31ebb1f1
Author: Kellen Sunderland <ke...@amazon.com>
Date:   2016-09-27T15:31:37Z

    Remove unneeded modifications for estimate in KenLM

commit e9f4f5b1468364a658f90c168e2b8ec69c3fa48e
Author: Kellen Sunderland <ke...@amazon.com>
Date:   2016-09-27T16:29:50Z

    Explicitly bind KenLMs to LmPool objects

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-joshua pull request #68: KenLM JNI optimization

Posted by kpu <gi...@git.apache.org>.
GitHub user kpu commented on a diff in the pull request:

    https://github.com/apache/incubator-joshua/pull/68#discussion_r80882244
  
    --- Diff: jni/kenlm_wrap.cc ---
    @@ -76,15 +76,16 @@ class EqualIndex : public std::binary_function<StateIndex, StateIndex, bool> {
     typedef std::unordered_set<StateIndex, HashIndex, EqualIndex> Lookup;
     
     /**
    - * A Chart bundles together a unordered_multimap that maps ChartState signatures to a single
    - * object instantiated using a pool. This allows duplicate states to avoid allocating separate
    - * state objects at multiple places throughout a sentence, and also allows state to be shared
    - * across KenLMs for the same sentence.  Multimap is used to avoid hash collisions which can
    - * return incorrect results, and cause out-of-bounds lookups when multiple KenLMs are in use.
    + * A Chart bundles together a vector holding CharStates and an unordered_set of StateIndexes
    + * which provides a mapping between StateIndexes and the positions of ChartStates in the vector.
    + * This allows for duplicate states to avoid allocating separate state objects at multiple places
    + * throughout a sentence.
      */
     class Chart {
       public:
    -    Chart() : lookup_(1000, HashIndex(vec_), EqualIndex(vec_)) {}
    +    Chart(long* ngramBuffer) : 
    --- End diff --
    
    Good practice to use explicit with single-argument constructors.  
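
To illustrate the point in the comment above, here is a minimal sketch of what `explicit` changes for a single-argument constructor. The classes `ImplicitChart` and `ExplicitChart` and the function `demo` are hypothetical stand-ins, not the actual `Chart` class from jni/kenlm_wrap.cc; only the `long* ngramBuffer` parameter is taken from the diff.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in: a single-argument constructor WITHOUT explicit.
// The compiler may silently convert a long* into an ImplicitChart.
class ImplicitChart {
 public:
  ImplicitChart(long* ngramBuffer) : buffer_(ngramBuffer) {}
  long* buffer() const { return buffer_; }

 private:
  long* buffer_;
};

// Hypothetical stand-in: the same constructor marked explicit.
// A long* is no longer implicitly convertible; callers must construct it.
class ExplicitChart {
 public:
  explicit ExplicitChart(long* ngramBuffer) : buffer_(ngramBuffer) {}
  long* buffer() const { return buffer_; }

 private:
  long* buffer_;
};

// Compile-time checks of the difference between the two declarations.
static_assert(std::is_convertible<long*, ImplicitChart>::value,
              "non-explicit constructor permits implicit conversion");
static_assert(!std::is_convertible<long*, ExplicitChart>::value,
              "explicit constructor blocks implicit conversion");
static_assert(std::is_constructible<ExplicitChart, long*>::value,
              "direct construction is still allowed");

// Direct construction works the same way in both cases.
inline void demo() {
  long ngrams[4] = {0, 0, 0, 0};
  ExplicitChart chart(ngrams);
  assert(chart.buffer() == ngrams);
}
```

With the explicit form, passing a raw `long*` where a chart object is expected becomes a compile error rather than creating a temporary object by accident.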



[GitHub] incubator-joshua pull request #68: KenLM JNI optimization

Posted by asfgit <gi...@git.apache.org>.
GitHub user asfgit closed the pull request at:

    https://github.com/apache/incubator-joshua/pull/68

