Posted to dev@joshua.apache.org by Matt Post <po...@cs.jhu.edu> on 2017/03/03 16:24:45 UTC

Re: Dockerhub hosted images

Folks,

I've updated the code with a few changes that will support Dockerized language packs. The nice thing is that this makes it easy to include KenLM.

Here are some changes that were made:

- Joshua now notes which directory the config file was found in and automatically resolves relative paths in the config file against that directory. This means you don't have to "cd" to the LP (language pack) directory before running Joshua.

- I fixed the HTTP server to take multiple "q=" lines, just like the Google translate API. Before, it only took one "q=" line. This should mean (I'll test later today) that the HTTP server can handle throughput essentially at the rates of the TCP server.

- I added (but haven't pushed yet) the KenLM model files to the language packs. In addition, I added a file "joshua.config.kenlm". These are not used except by Docker.

- I fixed the docker setup. See the new file:

	https://github.com/apache/incubator-joshua/blob/master/distribution/docker/kenlm/Dockerfile

This Docker container builds KenLM. It expects to be run with Docker mounting an existing language pack at /model. It then loads the joshua.config.kenlm file and runs Joshua as a server in HTTP mode. See the README file for information:

	https://github.com/apache/incubator-joshua/tree/master/distribution/docker/kenlm

If anyone wants to test this out, please do. You can grab an updated language pack (version 3) here:

	http://cs.jhu.edu/~post/language-packs/apache-joshua-es-en-2017-03-03.tgz

(Warning: 9 GB)
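
To illustrate the first change in the list above (paths in the config file being resolved against the config file's own directory), here is a minimal Python sketch. It is not Joshua's actual code (Joshua is written in Java); the function name and the file name "lm.kenlm" are hypothetical and only show the intended behavior.

```python
import os


def resolve_against_config(config_file, path):
    """Resolve `path` relative to the directory holding `config_file`.

    Hypothetical helper: absolute paths are returned unchanged, relative
    paths are joined onto the config file's directory.
    """
    if os.path.isabs(path):
        return path
    config_dir = os.path.dirname(os.path.abspath(config_file))
    return os.path.normpath(os.path.join(config_dir, path))


if __name__ == "__main__":
    # "lm.kenlm" is an illustrative file name, not one taken from the language pack.
    print(resolve_against_config("/model/joshua.config.kenlm", "lm.kenlm"))
```

With this behavior, the current working directory no longer matters: the same config file yields the same resolved paths wherever the decoder is started from.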

matt


> On Nov 23, 2016, at 10:14 AM, kellen sunderland <ke...@gmail.com> wrote:
> 
> Yeah it should just be 'docker pull kellens/apache-joshua-es-en-2016-10-05'
> then 'docker run -it kellens/apache-joshua-es-en-2016-10-05 /bin/bash' or
> something similar.  I think the default command should eventually be to run
> the http server, so ideally we'd just do 'docker run -p 5674
> kellens/apache-joshua-es-en-2016-10-05' and that would start up the http
> server on port 5674.
> 
> Good point on Perl + Python, I can add them.
> 
> -Kellen
> 
> On Wed, Nov 23, 2016 at 3:22 PM, Matt Post <po...@cs.jhu.edu> wrote:
> 
>> Okay, I have this with
>> 
>>        docker run -it kellens/apache-joshua-es-en-2016-10-05 bash
>> 
>> It seems we are missing Perl (./prepare.sh fails), and we should replace
>> the LanguageModel line with a KenLM instance and build that. I bet we'll
>> need Python, too.
>> 
>> 
>> 
>> 
>>> On Nov 23, 2016, at 8:15 AM, Matt Post <po...@cs.jhu.edu> wrote:
>>> 
>>> Kellen, can I bother you to post a few first steps? I've successfully
>>> pulled this down to my Mac but now do not know how to find it, edit it, or
>>> run it. I'm poring through the documentation and will find it eventually
>>> but this would save me a bit of time.
>>> 
>>> 
>>>> On Nov 23, 2016, at 8:07 AM, kellen sunderland <kellen.sunderland@gmail.com> wrote:
>>>> 
>>>> Yes my next step was going to be getting it hosted officially.
>>>> 
>>>> I'll go ahead and open a ticket.  I think I'll hold off on pushing to the
>>>> Apache account until I've done a little more testing though.
>>>> 
>>>> On Nov 23, 2016 5:22 AM, "lewis john mcgibbney" <le...@apache.org> wrote:
>>>> 
>>>>> Hi Kellen,
>>>>> Nice :)
>>>>> Another option is for us to host these via the Apache account.
>>>>> https://hub.docker.com/r/apache/
>>>>> We could then add a badge to our README which points to the Dockerfile(s).
>>>>> Do you want to open a ticket over on the INFRA Jira for this?
>>>>> 
>>>>> On Tue, Nov 22, 2016 at 1:57 PM, <
>>>>> dev-digest-help@joshua.incubator.apache.org> wrote:
>>>>> 
>>>>>> From: kellen sunderland <ke...@gmail.com>
>>>>>> To: "dev@joshua.incubator.apache.org" <dev@joshua.incubator.apache.org>
>>>>>> Cc:
>>>>>> Date: Tue, 22 Nov 2016 22:56:56 +0100
>>>>>> Subject: Re: Dockerhub hosted images
>>>>>> Ok, the first image should be properly uploaded now.
>>>>>> 
>>>>>> https://hub.docker.com/r/kellens/apache-joshua-es-en-2016-10-05/
>>>>>> 
>>>>>> -Kellen
>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> 
>>

Re: Dockerhub hosted images

Posted by Matt Post <po...@cs.jhu.edu>.

FYI, I stress-tested the Joshua server with the following protocol: for each of the TCP and HTTP servers, I started a six-thread server and then sent it five simultaneous 16k documents. The translation times were as follows:

TCP: (times: 8:07 8:06 8:06)

	for x in 1 2 3 4; do for num in $(seq 1 5); do cat corpus.es | nc localhost 5674 > t.tcp.$num & done; time wait; done

HTTP: (times: 7:25 7:34 7:20)

	for x in 1 2 3 4; do for num in $(seq 1 5); do /home/hltcoe/mpost/code/joshua/scripts/support/query_http.py -s localhost -p 5674 corpus.es > t.out.$num & done; time wait; done

The HTTP query script takes 100 lines of the test set at a time, constructs the RESTful query string (with 100 URL-encoded "q=..." parameters), and sends it to the server.
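
As a rough sketch of the batching described above, the following Python shows how lines could be grouped 100 at a time and turned into a query string with repeated, URL-encoded "q=" parameters. This is an illustration only, not the contents of query_http.py; the helper names are hypothetical, and how the string is actually sent to the server (and under which path) is left out because that is not stated here.

```python
from urllib.parse import urlencode


def batches(lines, size=100):
    """Yield consecutive slices of `lines`, each with at most `size` items."""
    for start in range(0, len(lines), size):
        yield lines[start:start + size]


def build_query(sentences):
    """Build a query string with one URL-encoded "q=" parameter per sentence."""
    return urlencode([("q", sentence) for sentence in sentences])


if __name__ == "__main__":
    # Two toy sentences stand in for a 100-line batch of corpus.es.
    print(build_query(["hola mundo", "a&b"]))
```

Sending many sentences per request amortizes the per-request HTTP overhead, which is presumably why the HTTP numbers above are competitive with the raw TCP server.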

So the bottom line is that the HTTP server both has an extended Google-translate API (which also supports other things like adding rules) and is a bit faster.

I'm documenting the RESTful API here: https://cwiki.apache.org/confluence/display/JOSHUA/RESTful+API

matt

