Posted to issues@opennlp.apache.org by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org> on 2011/10/11 17:41:11 UTC

[jira] [Commented] (OPENNLP-253) Add text similarity / relevance / syntactic match component based on parse trees

    [ https://issues.apache.org/jira/browse/OPENNLP-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125127#comment-13125127 ] 

Joern Kottmann commented on OPENNLP-253:
----------------------------------------

The component was added to the sandbox. It does not currently compile, so please create new issues to provide fixes.
                
> Add text similarity / relevance / syntactic match component based on parse trees
> --------------------------------------------------------------------------------
>
>                 Key: OPENNLP-253
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-253
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Parser
>         Environment: Java
>            Reporter: Boris Galitsky
>            Assignee: Joern Kottmann
>         Attachments: text_similarity_proposal_for_opennlp.test.zip, text_similarity_proposal_for_opennlp.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The proposed component relies on the OpenNLP parser and gives search engineers a simple relevance verification tool based on machine learning over syntactic parse trees.
> The value for the search engineering community is that they don't have to be familiar with NLP to use the syntactic generalization component, which performs parsing/chunking with OpenNLP and then applies graph-based learning for relevance assessment (the proposed component).
> One expected usage scenario is that a search library such as Lucene is used, and this component would accept relevant search results and reject irrelevant ones according to the proposed syntactic generalization measure.
> This code has been deployed commercially over the last 2 years at datran.com and zvents.com and serves more than 20 million users monthly.
> There are a number of publications on this project, including:
> http://portal.acm.org/citation.cfm?id=1881190
> http://www.aaai.org/ocs/index.php/FLAIRS/FLAIRS11/paper/view/2573
