Posted to dev@opennlp.apache.org by Boris Galitsky <bg...@hotmail.com> on 2020/09/29 17:56:00 UTC

session on using OpenNLP / discourse analysis for docs indexing

Hello

The presentation starts in 15 minutes:

ApacheCon @Home - Lucene/Solr/Search Track: https://www.apachecon.com/acah2020/tracks/search.html
An Anatomy of an Answer: Open NLP & Discourse Analysis - based Indexing


Indexers usually index all the text in a document. However, once we learn to "understand" the logic of a plain text, we see how badly indexing the whole thing serves search. Discourse analysis helps to select the text fragments that should be matched with a potential query and to throw away the rest.

In this talk we will apply discourse linguistics to practical text search and discover that most indexers, which index all text, perform very poorly on complex queries. Relying on standard relevance measures such as TF*IDF does not alleviate this problem. We will explore how discourse analysis helps search by identifying the text fragments that should be indexed and matched with potential queries, and those that would mislead the search and lower its precision. We will demonstrate how a discourse analysis-based indexer can be built on the Apache OpenNLP project.

The audience will learn how discourse analysis formalizes the logic of a text to be searched and represents it as a discourse tree, a structure that captures the domain-independent logical organization of text and is essential for finding relevant fragments. We will also discuss how to proceed from search engines like Solr to chatbots, where discourse analysis helps with dialogue management.
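
To make the idea of indexing only selected fragments concrete, here is a minimal sketch in plain Java. It is not the indexer from the talk and it uses no OpenNLP or Solr API: the Node class, the helper methods, and the hand-built tree are all hypothetical. It also assumes a nucleus/satellite labelling in the style of Rhetorical Structure Theory and a simple rule (keep nucleus text, drop every satellite subtree); the abstract does not say which selection rule the talk actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only: every name here is hypothetical and none of it
// is part of Apache OpenNLP or Solr.
final class DiscourseIndexSketch {

    /** One node of a toy discourse tree. */
    static final class Node {
        final boolean nucleus;     // assumed RST-style label: true = central, false = satellite
        final String relation;     // e.g. "Elaboration"; null for a leaf
        final String text;         // fragment text; null for an inner node
        final List<Node> children; // empty for a leaf

        Node(boolean nucleus, String relation, String text, List<Node> children) {
            this.nucleus = nucleus;
            this.relation = relation;
            this.text = text;
            this.children = children;
        }
    }

    static Node leaf(boolean nucleus, String text) {
        return new Node(nucleus, null, text, Collections.<Node>emptyList());
    }

    static Node span(boolean nucleus, String relation, Node... children) {
        return new Node(nucleus, relation, null, Arrays.asList(children));
    }

    /** Collects, in document order, the fragments the assumed rule would index. */
    static List<String> fragmentsToIndex(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node node, List<String> out) {
        if (!node.nucleus) {
            return; // assumed rule: a satellite and everything under it is not indexed
        }
        if (node.children.isEmpty()) {
            out.add(node.text);
            return;
        }
        for (Node child : node.children) {
            collect(child, out);
        }
    }

    /** Hand-built tree; a real system would obtain it from a discourse parser. */
    static Node exampleTree() {
        return span(true, "Elaboration",
                leaf(true, "Solr supports faceted search."),
                span(false, "Contrast",
                        leaf(true, "Older engines lacked facets,"),
                        leaf(false, "which users often complained about.")));
    }

    public static void main(String[] args) {
        for (String fragment : fragmentsToIndex(exampleTree())) {
            System.out.println("index: " + fragment);
        }
    }
}
```

In this toy tree only the first sentence would reach the index. The "Contrast" span is a satellite, so the nucleus leaf inside it is skipped as well. A query about older engines would then not match this document, which is the kind of misleading match the abstract says full-text indexing produces.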


Regards
Boris