Posted to dev@opennlp.apache.org by Boris Galitsky <bg...@hotmail.com> on 2011/08/18 01:17:04 UTC

current version of source for syntactic match / relevance component


Hello


Attached are three packages that make up the current version of our proposed contribution to openNLP: a syntactic match / text relevance component.
To start, please look at SyntMatcherTest.java to see how the commonality between sentences is computed. Then you can go to ParseTreeChunkTest.java to see how the syntactic generalization operation is applied to particular chunks.
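For readers new to the idea, the following is a simplified, hypothetical Java sketch of what generalizing two chunks can look like. It is not the code in ParseTreeChunk or SyntMatcher: the class name `ChunkGeneralizer`, the `Token` record and the scoring weights are illustrative assumptions. Two POS-tagged chunks are aligned with a longest common subsequence over their tags. Each aligned position keeps its tag, and keeps its word only when both chunks share it; otherwise the word becomes the wildcard `*`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical, simplified sketch of chunk-level syntactic generalization.
 * It is NOT the ParseTreeChunk implementation from the attached packages.
 */
public class ChunkGeneralizer {

    /** A token with its part-of-speech tag, printed as word/POS (e.g. camera/NN). */
    public record Token(String word, String pos) {
        @Override
        public String toString() {
            return word + "/" + pos;
        }
    }

    /**
     * Aligns two chunks via a longest common subsequence over POS tags.
     * Each aligned pair keeps its POS tag; the word is kept if both chunks
     * agree on it (ignoring case) and replaced by the wildcard "*" otherwise.
     */
    public static List<Token> generalize(List<Token> a, List<Token> b) {
        int n = a.size();
        int m = b.size();
        // lcs[i][j] = length of the longest POS-tag match of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                if (a.get(i).pos().equals(b.get(j).pos())) {
                    lcs[i][j] = lcs[i + 1][j + 1] + 1;
                } else {
                    lcs[i][j] = Math.max(lcs[i + 1][j], lcs[i][j + 1]);
                }
            }
        }
        // Walk the table forwards to emit the generalized chunk.
        List<Token> result = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            Token x = a.get(i);
            Token y = b.get(j);
            if (x.pos().equals(y.pos())) {
                String word = x.word().equalsIgnoreCase(y.word())
                        ? x.word().toLowerCase(Locale.ROOT) : "*";
                result.add(new Token(word, x.pos()));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                i++;
            } else {
                j++;
            }
        }
        return result;
    }

    /** Toy score (weights are assumptions): 1.0 per shared word, 0.5 per POS-only match. */
    public static double score(List<Token> generalized) {
        double total = 0.0;
        for (Token t : generalized) {
            total += t.word().equals("*") ? 0.5 : 1.0;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Token> first = List.of(new Token("digital", "JJ"), new Token("camera", "NN"),
                new Token("with", "IN"), new Token("zoom", "NN"));
        List<Token> second = List.of(new Token("compact", "JJ"), new Token("camera", "NN"),
                new Token("for", "IN"), new Token("travel", "NN"));
        List<Token> g = generalize(first, second);
        System.out.println(g + " score=" + score(g));
    }
}
```

Running `main` prints `[*/JJ, camera/NN, */IN, */NN] score=2.5`: both chunks have the shape adjective + noun + preposition + noun, but only "camera" is shared. The `record` syntax needs Java 16 or later.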
As an application, we selected the problem of content generation, where relevance is critical. Please go to RelatedSentenceFinder to see which sentences might serve as seeds for content generation. The system goes to the web, finds sentences that are somewhat relevant to the seed ones and tries to "write an article".
For examples of articles auto-generated with this technology, please see:
http://www.allvoices.com/contributed-news/9423860-best-things-to-do-in-san-francisco-jazz-and-blues-festival
http://www.allvoices.com/contributed-news/9415063-britney-spears-femme-fatale-in-north-sf-bay-area
http://www.allvoices.com/contributed-news/9381803-cirque-du-soleil-quidam
These articles were generated using the RelatedSentenceFinder.java class.
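The filtering step of such a pipeline can be sketched as follows, under explicitly stated assumptions: candidate sentences are taken as already fetched (the web search step is omitted), and a plain word-overlap (Jaccard) score is swapped in as a stand-in for the syntactic generalization score the actual component uses. The class `RelevanceFilterSketch`, its methods and the `minScore` threshold are hypothetical and do not reflect the RelatedSentenceFinder API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch of the sentence-filtering step of content generation.
 * Candidates are assumed to be already fetched; Jaccard word overlap stands
 * in for the syntactic generalization score of the real component.
 */
public class RelevanceFilterSketch {

    /** Lowercased alphanumeric words of a sentence. */
    public static Set<String> words(String sentence) {
        Set<String> result = new HashSet<>();
        for (String w : sentence.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!w.isEmpty()) {
                result.add(w);
            }
        }
        return result;
    }

    /** Jaccard similarity |A ∩ B| / |A ∪ B| of the two word sets. */
    public static double similarity(String seed, String candidate) {
        Set<String> a = words(seed);
        Set<String> b = words(candidate);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    /** Keeps candidates scoring at least minScore, most relevant first. */
    public static List<String> selectRelated(String seed, List<String> candidates, double minScore) {
        List<String> kept = new ArrayList<>();
        for (String c : candidates) {
            if (similarity(seed, c) >= minScore) {
                kept.add(c);
            }
        }
        // Scores are recomputed during sorting; acceptable for a small sketch.
        kept.sort(Comparator.comparingDouble((String c) -> similarity(seed, c)).reversed());
        return kept;
    }

    public static void main(String[] args) {
        String seed = "The jazz festival takes place in San Francisco";
        List<String> candidates = List.of(
                "San Francisco hosts a jazz festival every summer",
                "Stock prices fell sharply today",
                "The festival in San Francisco features blues");
        for (String s : selectRelated(seed, candidates, 0.2)) {
            System.out.printf("%.2f  %s%n", similarity(seed, s), s);
        }
    }
}
```

With this seed, the sample in `main` keeps the two festival sentences (scores 0.5 and about 0.33) and drops the unrelated one. In the proposed component, `similarity` would be replaced by the syntactic match score.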
Hence, the proposed structure of our contribution:
package opennlp.tools.similarity (main and test): implementation of syntactic match
package opennlp.tools.similarity.apps: the content generation app leveraging syntactic match for sentence-level similarity
package opennlp.tools.similarity.apps.utils: utilities for the above
What needs to be done before the contribution can be fully considered:
1) make it use the latest openNLP (it currently uses a modified version of openNLP from 2008, which is nevertheless quite stable and has been working in industrial settings for 2 years)
2) fix all tests and add more tests
3) clean up the implementation and application code
4) add more applications showing more working scenarios of syntactic match
5) in addition to the academic papers, provide better documentation for developers
Regards,
Boris


RE: current version of source for syntactic match / relevance component

Posted by Boris Galitsky <bg...@hotmail.com>.
Thanks, Jörn, for the prompt response.
I created https://issues.apache.org/jira/browse/OPENNLP-253 and attached the code to it.


> 
> We have a sandbox where it could live for a while until it is ready
> to be released together with the current head code. I would suggest
> moving it there; then we may have a good chance of releasing it with
> one of the upcoming 1.5 series releases or with 1.6.
> 
> Would that work for you?
Yes, it will.
Regards,
Boris
> 
> I will have a look at the code tomorrow.
> 
> Jörn

Re: current version of source for syntactic match / relevance component

Posted by Jörn Kottmann <ko...@gmail.com>.
On 8/18/11 1:17 AM, Boris Galitsky wrote:
>
> Hello
>
>
> Attached are three packages that make up the current version of our
> proposed contribution to openNLP: a syntactic match / text relevance
> component.

Did everyone get the attachments? We usually use Jira for this, because
mail attachments used to be stripped when posted here. I'm not sure why
I received them anyway.

I suggest that you also open a Jira issue for this contribution and
attach the zip files to it.

Here is the link to it:
https://issues.apache.org/jira/browse/OPENNLP

We have a sandbox where it could live for a while until it is ready to
be released together with the current head code. I would suggest moving
it there; then we may have a good chance of releasing it with one of the
upcoming 1.5 series releases or with 1.6.

Would that work for you?

I will have a look at the code tomorrow.

Jörn