Posted to dev@opennlp.apache.org by Jeff Zemerick <jz...@apache.org> on 2021/03/15 22:26:14 UTC

Preparing for UD 1.0 Models Vote

Before starting a second release vote thread for the OpenNLP models, and
since this is the first release of pretrained OpenNLP models, I would like
to pause and solicit feedback from the community regarding the release
configuration.

- The files are staged on the ASF dev SVN at
https://dist.apache.org/repos/dist/dev/opennlp/ud-models-1.0/.
- The model files are signed and hashed.
- The release includes README, CHANGES, NOTICE, and LICENSE files.
- The training and evaluation outputs are in the training-eval-logs.zip
file (also signed and hashed).
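
As a rough illustration of the checklist above, the staged directory could be audited with a small script. This is only a sketch under assumptions: the sidecar suffixes `.asc` and `.sha512` follow the usual ASF convention but are not stated in this thread, the documentation files are assumed to have no extension, and `audit_staging` is a hypothetical helper name.

```python
from typing import Iterable, List

# Documentation files named in the checklist (assumed to have no extension).
REQUIRED_DOCS = ("README", "CHANGES", "NOTICE", "LICENSE")
# Assumed sidecar suffixes for signatures and hashes (common ASF convention).
SIDECAR_SUFFIXES = (".asc", ".sha512")


def audit_staging(names: Iterable[str]) -> List[str]:
    """Return a list of problems found in a staged release file listing."""
    present = set(names)
    problems = [f"missing {doc}" for doc in REQUIRED_DOCS if doc not in present]
    # Every file that is neither a doc nor a sidecar is treated as an artifact.
    artifacts = sorted(
        name for name in present
        if name not in REQUIRED_DOCS and not name.endswith(SIDECAR_SUFFIXES)
    )
    for artifact in artifacts:
        for suffix in SIDECAR_SUFFIXES:
            if artifact + suffix not in present:
                problems.append(f"{artifact} lacks {suffix}")
    return problems
```

An empty result would mean the listing satisfies these assumed checks; it says nothing about whether the signatures or hashes are themselves valid.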

Please let me know if anything is missing or should be changed. Once things
are in a good state I will make a PR to document the steps on the website
(OPENNLP-1328) and start a vote thread.

ASF Release Creation Process:
https://infra.apache.org/release-publishing.html

Thanks,
Jeff

Re: Preparing for UD 1.0 Models Vote

Posted by Joey Frazee <jo...@icloud.com.INVALID>.

That all makes sense to me. Voting on the models does seem prudent for the reasons you mention. These are fairly new problems to have, and I don’t think the right answer is well established, so validating consensus makes sense even if we are not obligated to.

-joey

> On Mar 16, 2021, at 4:08 AM, Jeff Zemerick <jz...@apache.org> wrote:
> 
> Joey, thanks for the comments/questions.
> 
> - The models can be named with the 1.9.3 version to show which version
> trained them, but we should be careful not to give the impression that the
> models *only* work with that version. I think that can be made sufficiently
> clear in the documentation.
> - My opinion is that it would be best if the models were not tied to the
> OpenNLP lifecycle. I would like the project to be able to release new
> models independently of OpenNLP releases. So I hope we can do the latter:
> train the models from an official OpenNLP release, vote, and publish.
> - I feel that the models fall somewhere between a direct binary artifact
> and something more derivative like a Docker container, because a model
> needs to be evaluated before being made available. Contrast that with a
> Docker container, which either works or doesn't. More things (how it was
> trained, performance, etc.) should be considered when voting on a model
> release than just whether it works.
> 
> From that page https://incubator.apache.org/guides/distribution.html:
> 
> - Convenience binaries must be made from IPMC approved ASF releases.
> - Convenience binaries need to follow licensing policy and not include any
> category X licensed software.
> - Convenience binaries should be signed and have hashes to verify their
> contents.
> 
> I think we are ok with those 3 things. I will update the naming of the
> models as Joey suggested (to include the OpenNLP version that created them)
> and update the README to explain that 1.9.3 is the version that created
> them, and that they should work with all OpenNLP versions (but were only
> tested with 1.9.3).
> 
> Are there any concerns about the model release process given my responses
> to your questions?
> 
> Thanks,
> Jeff

Re: Preparing for UD 1.0 Models Vote

Posted by Jeff Zemerick <jz...@apache.org>.

Joey, thanks for the comments/questions.

- The models can be named with the 1.9.3 version to show which version
trained them, but we should be careful not to give the impression that the
models *only* work with that version. I think that can be made sufficiently
clear in the documentation.
- My opinion is that it would be best if the models were not tied to the
OpenNLP lifecycle. I would like the project to be able to release new
models independently of OpenNLP releases. So I hope we can do the latter:
train the models from an official OpenNLP release, vote, and publish.
- I feel that the models fall somewhere between a direct binary artifact
and something more derivative like a Docker container, because a model
needs to be evaluated before being made available. Contrast that with a
Docker container, which either works or doesn't. More things (how it was
trained, performance, etc.) should be considered when voting on a model
release than just whether it works.

From that page https://incubator.apache.org/guides/distribution.html:

- Convenience binaries must be made from IPMC approved ASF releases.
- Convenience binaries need to follow licensing policy and not include any
category X licensed software.
- Convenience binaries should be signed and have hashes to verify their
contents.
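
For the third point, checking a downloaded file against its published hash might look like the sketch below. The thread does not say which hash algorithm was used, so SHA-512 is an assumption here, as is the `<hex>  <filename>` layout of the checksum text; `file_digest`, `published_digest`, and `matches_published` are hypothetical helper names. Signature verification (e.g. with GnuPG) is a separate step not covered here.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha512") -> str:
    """Return the hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def published_digest(checksum_text: str) -> str:
    """Extract the hex digest from text such as '<hex>  <filename>' (assumed layout)."""
    return checksum_text.split()[0].lower()


def matches_published(path: Path, checksum_text: str, algorithm: str = "sha512") -> bool:
    """Compare the computed digest of a file with the published one."""
    return file_digest(path, algorithm) == published_digest(checksum_text)
```

A call such as `matches_published(Path("model.bin"), Path("model.bin.sha512").read_text())` would then report whether the two agree, where both file names are placeholders.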

I think we are ok with those 3 things. I will update the naming of the
models as Joey suggested (to include the OpenNLP version that created them)
and update the README to explain that 1.9.3 is the version that created
them, and that they should work with all OpenNLP versions (but were only
tested with 1.9.3).

Are there any concerns about the model release process given my responses
to your questions?

Thanks,
Jeff




On Mon, Mar 15, 2021 at 7:49 PM Joey Frazee <jo...@icloud.com.invalid>
wrote:

> Jeff, in the other thread you mentioned “I personally have been thinking
> of the models as convenience binaries”.
>
> I think that’s the most obvious answer and is what I’d think too.
>
> If that’s the case, then the policies suggest that the version needs to
> match the version they’re created from. So should this be something like
> opennlp-ud-models-1.0-1.9.3 or similar?
>
> The other question, which is murky in practice, is whether the models need
> to be voted on concurrently with a release or just created by the PMC from
> an official release and published on Apache-supported infrastructure.
>
> Direct binary artifacts are almost always evaluated at the time of a
> release vote but more derivative ones often aren’t. E.g., a lot of projects
> publish Docker images from approved releases but not with an independent
> vote. Which are these?
>
> Incubator recently published some helpful guidelines which clarify related
> stuff for the podlings:
>
> https://incubator.apache.org/guides/distribution.html
>
> -joey

Re: Preparing for UD 1.0 Models Vote

Posted by Joey Frazee <jo...@icloud.com.INVALID>.

Jeff, in the other thread you mentioned “I personally have been thinking of the models as convenience binaries”.

I think that’s the most obvious answer and is what I’d think too.

If that’s the case, then the policies suggest that the version needs to match the version they’re created from. So should this be something like opennlp-ud-models-1.0-1.9.3 or similar?
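
To make the naming idea concrete, here is a minimal sketch of how such a name could be split into the models version and the OpenNLP version that trained them. The pattern `opennlp-ud-models-<models version>-<OpenNLP version>` is only the example floated above, not a settled convention, and `parse_release_name` is a hypothetical helper.

```python
import re
from typing import NamedTuple


class ModelRelease(NamedTuple):
    models_version: str   # version of the model bundle, e.g. "1.0"
    opennlp_version: str  # OpenNLP release used for training, e.g. "1.9.3"


# Assumed pattern, taken from the example name suggested in this message.
_NAME = re.compile(r"^opennlp-ud-models-(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)$")


def parse_release_name(name: str) -> ModelRelease:
    """Split a release name into its two version components."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized release name: {name!r}")
    return ModelRelease(*match.groups())
```

With this sketch, `parse_release_name("opennlp-ud-models-1.0-1.9.3")` yields a models version of `1.0` and an OpenNLP version of `1.9.3`.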

The other question, which is murky in practice, is whether the models need to be voted on concurrently with a release or just created by the PMC from an official release and published on Apache-supported infrastructure.

Direct binary artifacts are almost always evaluated at the time of a release vote but more derivative ones often aren’t. E.g., a lot of projects publish Docker images from approved releases but not with an independent vote. Which are these?

Incubator recently published some helpful guidelines which clarify related stuff for the podlings:

https://incubator.apache.org/guides/distribution.html

-joey
