Posted to dev@opennlp.apache.org by Jeff Zemerick <jz...@apache.org> on 2021/06/08 14:18:07 UTC

Planning for OpenNLP 2.0

Hi everyone,

OpenNLP became a top-level Apache project around 10 years ago and since
then there have been lots of 1.x releases. In the past few years the NLP
community has seen transformative changes primarily centered around the
Python ecosystem. While OpenNLP still performs well and is used by many
projects, I think it would be good for the community to plan for
OpenNLP 2.0 and for how the project can adapt and continue its goal of
being a capable Java NLP library.

To minimize dependencies, support newer NLP architectures, and preserve
backward compatibility, I believe some refactoring is needed. Moving the
opennlp-tools interfaces to an "opennlp-common" project would provide the
decoupling needed for OpenNLP to incorporate deep learning capabilities in
2.x, without breaking backward compatibility and without introducing
burdensome dependencies for users of opennlp-tools. I have diagrammed these
changes here: https://cwiki.apache.org/confluence/display/OPENNLP/OpenNLP+2.x

I am proposing that this refactoring take place in a 1.9.x release to set
the stage for 2.0. I volunteer to take on the refactoring work.

Please share your thoughts on this proposal and anything else you would
like to see in an OpenNLP 2.0 release. (This proposal doesn't specify any
deep learning capabilities - we'll figure that out later. This is just to
get ready for it.)

Thanks for everyone's support of OpenNLP over the years!

Jeff

Re: Planning for OpenNLP 2.0

Posted by Jeff Zemerick <jz...@apache.org>.
Here is a branch comparison of the proposed refactor changes:
https://github.com/apache/opennlp/compare/master...jzonthemtn:OPENNLP-1331?expand=1

There are two significant changes in that branch:

1. Interfaces such as Chunker.java, POSTagger.java, etc., and common files
such as Span.java were moved to an opennlp-commons project. This will allow
new implementations of these interfaces, based on some deep learning
technology, to be created outside of opennlp-tools. This touched a lot of
files because their import statements had to be updated.

2. A lot of source files violated the project's checkstyle rules. Most
violations were import statement order and line length, and these were
fixed. The checkstyle dependency was also updated to the newest version.
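
As a rough illustration of what item 1 is meant to enable, here is a
minimal sketch. Everything in it is hypothetical and simplified: SimpleTagger,
RuleTagger, and StubNeuralTagger are made-up names, not the actual OpenNLP
interfaces or signatures, and the "neural" class is only a placeholder. The
point is that calling code depends only on the shared interface, so an
implementation with heavier dependencies could live in a separate module.

```java
import java.util.ArrayList;
import java.util.List;

public class DecouplingSketch {

  // Hypothetical stand-in for an interface that would live in the
  // shared interfaces project (not the real OpenNLP POSTagger).
  interface SimpleTagger {
    List<String> tag(List<String> tokens);
  }

  // Stand-in for an existing implementation in opennlp-tools.
  // Toy rule: capitalized tokens get "NNP", everything else "NN".
  static final class RuleTagger implements SimpleTagger {
    @Override
    public List<String> tag(List<String> tokens) {
      List<String> tags = new ArrayList<>();
      for (String token : tokens) {
        boolean capitalized =
            !token.isEmpty() && Character.isUpperCase(token.charAt(0));
        tags.add(capitalized ? "NNP" : "NN");
      }
      return tags;
    }
  }

  // Placeholder for an implementation that would live in a separate
  // module and could depend on a deep learning runtime. It runs no
  // model; it just returns "X" for every token.
  static final class StubNeuralTagger implements SimpleTagger {
    @Override
    public List<String> tag(List<String> tokens) {
      List<String> tags = new ArrayList<>();
      for (int i = 0; i < tokens.size(); i++) {
        tags.add("X");
      }
      return tags;
    }
  }

  // Calling code sees only the interface, so either implementation
  // can be passed in without changing this method.
  static String describe(SimpleTagger tagger, List<String> tokens) {
    return String.join(" ", tagger.tag(tokens));
  }

  public static void main(String[] args) {
    List<String> tokens = List.of("Apache", "library");
    System.out.println(describe(new RuleTagger(), tokens));       // NNP NN
    System.out.println(describe(new StubNeuralTagger(), tokens)); // X X
  }
}
```

In this sketch describe() never names a concrete class, which is the kind of
decoupling the move is intended to provide for users of opennlp-tools.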

There are no functional changes and no new code was written, but a full
evaluation test will be run before any pull request is created.

Please take a look at the branch comparison and share any
thoughts/suggestions/concerns.

Thanks,
Jeff

