Posted to commits@stanbol.apache.org by "Olivier Grisel (Updated) (JIRA)" <ji...@apache.org> on 2012/02/09 14:07:59 UTC

[jira] [Updated] (STANBOL-197) Enhancement Engine for Wikipedia/DBpedia-based topic classification of text content

     [ https://issues.apache.org/jira/browse/STANBOL-197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olivier Grisel updated STANBOL-197:
-----------------------------------

    Description: 
Implementation plan:

Use MoreLikeThis queries on a SolrYard instance whose topics are indexed by aggregating the abstract text of all DBpedia entities categorized under a given SKOS topic.

Such an index can be constructed using the Pig scripts available at:
https://github.com/ogrisel/pignlproc/tree/master/examples/topic-corpus
or 
https://github.com/ogrisel/dbpediakit
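
The aggregation step described above can be illustrated with a minimal in-memory sketch. This is not how the Pig scripts work internally. The class name, method name, and input maps are hypothetical and chosen only for illustration. It assumes one map from entity to its SKOS topics and one from entity to its abstract, and produces one text document per topic, ready to be indexed:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these names are illustrative and not part of Stanbol,
// pignlproc or dbpediakit.
class TopicCorpusSketch {

    /**
     * Builds one text document per topic by concatenating the abstracts of
     * all entities categorized under that topic. Entities without an
     * abstract are skipped.
     */
    static Map<String, String> aggregate(Map<String, List<String>> topicsByEntity,
                                         Map<String, String> abstractByEntity) {
        Map<String, List<String>> abstractsByTopic = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : topicsByEntity.entrySet()) {
            String text = abstractByEntity.get(entry.getKey());
            if (text == null) {
                continue; // no abstract available for this entity
            }
            for (String topic : entry.getValue()) {
                abstractsByTopic.computeIfAbsent(topic, k -> new ArrayList<>()).add(text);
            }
        }
        // Join the collected abstracts into a single document per topic.
        Map<String, String> documentByTopic = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : abstractsByTopic.entrySet()) {
            documentByTopic.put(entry.getKey(), String.join("\n", entry.getValue()));
        }
        return documentByTopic;
    }

    public static void main(String[] args) {
        Map<String, List<String>> topics = new LinkedHashMap<>();
        topics.put("Paris", Arrays.asList("Category:Capitals", "Category:France"));
        topics.put("Berlin", Arrays.asList("Category:Capitals"));

        Map<String, String> abstracts = new LinkedHashMap<>();
        abstracts.put("Paris", "Paris is the capital of France.");
        abstracts.put("Berlin", "Berlin is the capital of Germany.");

        aggregate(topics, abstracts).forEach(
                (topic, text) -> System.out.println(topic + " => " + text));
    }
}
```

Each resulting topic document would then be indexed as one Solr document, so that a MoreLikeThis query can rank topics by textual similarity to the input text.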

To perform MoreLikeThis queries using the SolrJ API, proceed as follows:

#1 - Define the MLT handler in solrconfig.xml (it is not defined in the example
solrconfig.xml I was using):

<requestHandler name="/mlt" class="solr.MoreLikeThisHandler" />

#2 - With SolrJ, access the MLT handler with something similar to the following:

query.setQueryType("/" + MoreLikeThisParams.MLT);
query.set(MoreLikeThisParams.MATCH_INCLUDE, false);
query.set(MoreLikeThisParams.MIN_DOC_FREQ, 1);
query.set(MoreLikeThisParams.MIN_TERM_FREQ, 1);
query.set(MoreLikeThisParams.SIMILARITY_FIELDS, "subject,body");
query.setQuery("Your query here or in my case the unique key field:value");
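
For reference, the SolrJ calls above amount to an HTTP request against the /mlt handler. The sketch below builds that request URL using only the JDK, which can help when debugging the handler from a browser or curl. The parameter names (mlt.fl, mlt.mindf, mlt.mintf, mlt.match.include) are the string values behind the MoreLikeThisParams constants. The class name, the base URL and the "id:42" query are assumptions made for illustration:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of SolrJ or Stanbol.
class MltRequestSketch {

    /** Builds the URL of a MoreLikeThis request mirroring the SolrJ parameters. */
    static String buildMltUrl(String solrBaseUrl, String query, String similarityFields) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);                   // query.setQuery(...)
        params.put("mlt.fl", similarityFields);   // MoreLikeThisParams.SIMILARITY_FIELDS
        params.put("mlt.mindf", "1");             // MoreLikeThisParams.MIN_DOC_FREQ
        params.put("mlt.mintf", "1");             // MoreLikeThisParams.MIN_TERM_FREQ
        params.put("mlt.match.include", "false"); // MoreLikeThisParams.MATCH_INCLUDE

        // "/mlt" is the handler name registered in solrconfig.xml.
        StringBuilder url = new StringBuilder(solrBaseUrl).append("/mlt");
        char separator = '?';
        for (Map.Entry<String, String> param : params.entrySet()) {
            url.append(separator)
               .append(encode(param.getKey()))
               .append('=')
               .append(encode(param.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Assumed local Solr URL and unique key query, for illustration only.
        System.out.println(buildMltUrl("http://localhost:8983/solr", "id:42", "subject,body"));
    }
}
```

With these example arguments the query part reads q=id%3A42&mlt.fl=subject%2Cbody&mlt.mindf=1&mlt.mintf=1&mlt.match.include=false.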

  was:
Implementation plan:

Use MoreLikeThis queries on a SolrYard instance whose topics are indexed by aggregating the abstract text of all DBpedia entities categorized under a given SKOS topic.

Such an index can be constructed using the Pig scripts available at:
https://github.com/ogrisel/pignlproc/tree/master/examples/topic-corpus

To perform MoreLikeThis queries using the SolrJ API, proceed as follows:

#1 - Define the MLT handler in solrconfig.xml (it is not defined in the example
solrconfig.xml I was using):

<requestHandler name="/mlt" class="solr.MoreLikeThisHandler" />

#2 - With SolrJ, access the MLT handler with something similar to the following:

query.setQueryType("/" + MoreLikeThisParams.MLT);
query.set(MoreLikeThisParams.MATCH_INCLUDE, false);
query.set(MoreLikeThisParams.MIN_DOC_FREQ, 1);
query.set(MoreLikeThisParams.MIN_TERM_FREQ, 1);
query.set(MoreLikeThisParams.SIMILARITY_FIELDS, "subject,body");
query.setQuery("Your query here or in my case the unique key field:value");

    
> Enhancement Engine for Wikipedia/DBpedia-based topic classification of text content
> -----------------------------------------------------------------------------------
>
>                 Key: STANBOL-197
>                 URL: https://issues.apache.org/jira/browse/STANBOL-197
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer, Entity Hub
>            Reporter: Olivier Grisel
>            Assignee: Olivier Grisel
>              Labels: text-categorization
>
> Implementation plan:
> Use MoreLikeThis queries on a SolrYard instance whose topics are indexed by aggregating the abstract text of all DBpedia entities categorized under a given SKOS topic.
> Such an index can be constructed using the Pig scripts available at:
> https://github.com/ogrisel/pignlproc/tree/master/examples/topic-corpus
> or 
> https://github.com/ogrisel/dbpediakit
> To perform MoreLikeThis queries using the SolrJ API, proceed as follows:
> #1 - Define the MLT handler in solrconfig.xml (it is not defined in the example
> solrconfig.xml I was using):
> <requestHandler name="/mlt" class="solr.MoreLikeThisHandler" />
> #2 - With SolrJ, access the MLT handler with something similar to the following:
> query.setQueryType("/" + MoreLikeThisParams.MLT);
> query.set(MoreLikeThisParams.MATCH_INCLUDE, false);
> query.set(MoreLikeThisParams.MIN_DOC_FREQ, 1);
> query.set(MoreLikeThisParams.MIN_TERM_FREQ, 1);
> query.set(MoreLikeThisParams.SIMILARITY_FIELDS, "subject,body");
> query.setQuery("Your query here or in my case the unique key field:value");
