Posted to users@opennlp.apache.org by Tobias Ednersson <t_...@hotmail.com> on 2013/09/06 11:53:31 UTC
Ways to contribute?
Hello!
I have a master's degree in computational linguistics and basic knowledge of Java.
I am looking for a way to learn Java "properly" while contributing to an NLP project.
I am more of a linguist than a mathematician.
What I would like to do to begin with is simple QA, fixing simple bugs and such.
I have browsed through the Apache documentation on how to contribute, but any further pointers would be greatly appreciated.
Where do I start to get to know the project? The most straightforward approach would of course be to check out the source code
and have a look at it. Is this a good strategy?
I hope my help will be welcome
Best regards
Tobias Ednersson
Re: Ways to contribute?
Posted by Jörn Kottmann <ko...@gmail.com>.
On 09/06/2013 11:53 AM, Tobias Ednersson wrote:
> I have a master's degree in computational linguistics and basic knowledge of Java.
> I am looking for a way to learn Java "properly" while contributing to an NLP project.
> I am more of a linguist than a mathematician.
> What I would like to do to begin with is simple QA, fixing simple bugs and such.
> I have browsed through the Apache documentation on how to contribute, but any further pointers would be greatly appreciated.
> Where do I start to get to know the project? The most straightforward approach would of course be to check out the source code
> and have a look at it. Is this a good strategy?
A good way to get started is to train a component on your own and then
use it to tag some sample text. That should teach you the
very basics of how to use OpenNLP.
Reading through our source code is probably the best way to get a deeper
understanding of how things work.
There are a few patterns which are repeated over and over again throughout
our components. The easiest way to understand
them is to read the code of some of the simpler components, such
as the Document Categorizer, Tokenizer or Sentence Detector.
Have a look at our issue tracker to find open bugs or features which
might be of interest to you to work on.
Here is a list of all open issues:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20OPENNLP%20AND%20status%20%3D%20Open%20ORDER%20BY%20priority%20DESC
It would be very valuable to find a new contributor/committer for our
machine learning code; maybe that could be something for you.
We have a serious bug in our L-BFGS training code; have a look at
OPENNLP-338 and the follow-up issue to fix the bug, OPENNLP-569.
That said, it might be a difficult issue to fix.
Another interesting issue could be OPENNLP-31, which is about writing
evaluation code for the parser component. The parser is another area which
is currently lacking maintenance.
HTH,
Jörn
RE: Ways to contribute?
Posted by Tobias Ednersson <t_...@hotmail.com>.
Greetings
OK, so I downloaded the binaries and started looking at the SentenceDetector.
I'm curious to learn more about the theory behind statistical sentence detection.
Is there a paper on this? I understand it's about finding the most
probable sequence of sentences, but I would like to know more about
the exact algorithm behind it. Is there an easily accessible paper which
describes this?
Best regards
Tobias
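
For orientation on the question above: a statistical sentence detector of this
kind is usually described as a classifier over candidate boundary characters
(".", "!", "?") rather than a search over whole sentence sequences. Each
candidate is scored from its surrounding context, and the text is cut wherever
the score says "boundary". The sketch below only illustrates that shape. It is
plain Java with no OpenNLP dependency, every name in it is hypothetical, and
the hand-set weights and the small abbreviation list are assumptions standing
in for what a trained model would learn from annotated data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; not OpenNLP's API or its actual feature set.
final class SentenceSplitSketch {

    // Hypothetical abbreviation list; a trained model would learn such cues.
    private static final Set<String> ABBREVIATIONS = Set.of("dr", "mr", "mrs", "prof", "etc");

    // Characters that may end a sentence.
    private static boolean isCandidate(char c) {
        return c == '.' || c == '!' || c == '?';
    }

    // Scores the candidate at index i from its context.
    // Hand-set weights stand in for the parameters of a trained classifier.
    public static double score(String text, int i) {
        double s = 0.0;

        // Feature: the token immediately before the candidate.
        int start = i;
        while (start > 0 && !Character.isWhitespace(text.charAt(start - 1))) {
            start--;
        }
        String prefix = text.substring(start, i).toLowerCase();
        if (ABBREVIATIONS.contains(prefix)) {
            s -= 2.0; // known abbreviation such as "Dr."
        }
        if (prefix.length() == 1 && Character.isLetter(prefix.charAt(0))) {
            s -= 1.0; // single letter, e.g. an initial
        }

        // Feature: what follows the candidate.
        if (i + 1 >= text.length()) {
            s += 2.0; // end of input
        } else if (Character.isWhitespace(text.charAt(i + 1))) {
            s += 1.0;
            int j = i + 1;
            while (j < text.length() && Character.isWhitespace(text.charAt(j))) {
                j++;
            }
            if (j >= text.length() || Character.isUpperCase(text.charAt(j))) {
                s += 1.0; // next token starts with a capital letter
            }
        } else {
            s -= 2.0; // no whitespace after it, e.g. inside "3.14"
        }
        return s;
    }

    // Splits text at every candidate whose score is positive.
    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        int begin = 0;
        for (int i = 0; i < text.length(); i++) {
            if (isCandidate(text.charAt(i)) && score(text, i) > 0.0) {
                String sentence = text.substring(begin, i + 1).trim();
                if (!sentence.isEmpty()) {
                    sentences.add(sentence);
                }
                begin = i + 1;
            }
        }
        String rest = text.substring(begin).trim();
        if (!rest.isEmpty()) {
            sentences.add(rest);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Dr. Smith met Mr. Jones. They talked about pi, roughly 3.14. Was it useful?";
        for (String sentence : split(text)) {
            System.out.println(sentence);
        }
    }
}
```

With the sample text in main, the periods after "Dr" and "Mr" and the one
inside "3.14" are rejected, so three sentences are printed. In a real
statistical detector the score would come from a model trained on annotated
sentences, which is what training a component on your own data produces.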