Posted to users@opennlp.apache.org by Tobias Ednersson <t_...@hotmail.com> on 2013/09/06 11:53:31 UTC

Ways to contribute?

Hello!

I have a master's degree in computational linguistics and basic knowledge of Java.
I am looking for a way to learn Java "properly" while contributing to an NLP project.
I am more of a linguist than a mathematician.
To begin with, I would like to do simple QA, fix simple bugs and the like.
I have browsed through the Apache documentation on how to contribute, but any further pointers would be greatly appreciated.
Where do I start to get to know the project? The most straightforward approach would of course be to check out the source code
and have a look at it. Is this a good strategy?

I hope my help will be welcome.

Best regards

Tobias Ednersson

Re: Ways to contribute?

Posted by Jörn Kottmann <ko...@gmail.com>.
On 09/06/2013 11:53 AM, Tobias Ednersson wrote:
> I have a master's degree in computational linguistics and basic knowledge of Java.
> I am looking for a way to learn Java "properly" while contributing to an NLP project.
> I am more of a linguist than a mathematician.
> To begin with, I would like to do simple QA, fix simple bugs and the like.
> I have browsed through the Apache documentation on how to contribute, but any further pointers would be greatly appreciated.
> Where do I start to get to know the project? The most straightforward approach would of course be to check out the source code
> and have a look at it. Is this a good strategy?

A good way to get started is to train a component on your own and then
use it to tag some sample text. That should teach you the
very basics of how to use OpenNLP.

Reading through our source code is probably the best way to get a deeper
understanding of how things work;
there are a few patterns which are repeated over and over again throughout
our components. The easiest way to understand
those is by reading the code of some of the simpler components, such
as the Document Categorizer, Tokenizer or Sentence Detector.

Have a look at our issue tracker to find open bugs or features which
might be of interest to you to work on.

Here is a list of all open issues:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20OPENNLP%20AND%20status%20%3D%20Open%20ORDER%20BY%20priority%20DESC

It would be very valuable to find a new contributor/committer for our
machine learning code; maybe that could be something for you.
We have a serious bug in our L-BFGS training code; have a look at
OPENNLP-338 and the follow-up issue to fix the bug, OPENNLP-569.
That said, it might be a difficult issue to fix.

Another interesting issue could be OPENNLP-31, which is about writing
evaluation code for the parser component. The parser is another area which
is currently lacking maintenance.

HTH,
Jörn

RE: Ways to contribute?

Posted by Tobias Ednersson <t_...@hotmail.com>.
Greetings
OK, so I downloaded the binaries and started looking at the SentenceDetector.
I'm curious to learn more about the theory behind statistical sentence detection.
I understand it's about finding the most probable sequence of sentences,
but I would like to know more about the exact algorithm behind it.
Is there an easily accessible paper which describes this?
Best regards
Tobias