Posted to dev@joshua.apache.org by Matt Post <po...@cs.jhu.edu> on 2016/11/01 16:52:06 UTC

Re: Community Review of New Language Pack

Checking this out now


> On Oct 28, 2016, at 4:25 PM, lewis john mcgibbney <le...@apache.org> wrote:
> 
> Hi Folks,
> I managed to generate my first language pack today based on a Hiero model.
> It's 4.8GB in size so I have made it available via my home.apache.org
> public space at [0]. Right now it is uploading and will take a wee while.
> I would like some community review so we can assess the quality of what has
> been generated. In addition there are a number of immediate things I am
> struggling with.
> 
> Firstly, the following files were not present after running bundler.py.
> 
>   -  prepare.sh, this is a baseline requirement for running the tests as
>   detailed within the auto-generated README.
>   - the entire 'scripts' directory!!! This means that no utility
>   processing can be undertaken at all.
> 
> I know that both of the above are essential requirements, so I added
> them from a different language pack, increased default maximum memory usage
> and also augmented the README with some details regarding the dataset used
> to generate the language pack.
> 
> In comparison to the es --> en language pack posted by Matt, due to the fact
> that no scripts directory was generated, this language pack does not have
> the scripts/release directory either. I am not sure how this was generated.
> 
> Over and above what I've detailed so far, there is one blocking issue for
> me... when I submit Russian text to the Joshua server, it just spits back
> out the same Russian text! I can see the decoder logging to stdout; however,
> I can only assume that no decoding is actually taking place.
> 
> Can you guys please review the language pack, provide feedback on the
> configuration, some of the scores which have been generated and even the
> BLEU score? I have absolutely everything local and also backed up so I can
> provide absolutely everything as well as the exact commands I invoked to
> generate the entire thing from start to finish.
> Cheers troops.
> 
> [0] http://home.apache.org/~lewismc/language-pack-ru-en-2016-10-28.tar.gz
> 
> -- 
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney


Re: Community Review of New Language Pack

Posted by Matt Post <po...@cs.jhu.edu>.
Lewis, can I get an MD5 or SHA1 checksum? I'm getting errors unpacking.
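
For reference, one way to produce both checksums for a large archive is sketched below. It is only an illustration using Python's standard hashlib module; the function name file_checksums and the chunk size are assumptions made for this example, not part of Joshua or its scripts.

```python
import hashlib


def file_checksums(path, chunk_size=1 << 20):
    """Return (md5_hex, sha1_hex) for the file at `path`.

    The file is read in chunks so that a multi-gigabyte archive
    does not have to fit in memory. `file_checksums` is a
    hypothetical helper name used only for this sketch.
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```

Calling file_checksums on the local copy of the tarball and on the downloaded copy, then comparing the two results, would show whether the download was corrupted or truncated in transit.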

I do see that you built the LP with the old scripts. I'll write up instructions on how to do it with the new set.

matt

