Posted to commits@stanbol.apache.org by "Olivier Grisel (JIRA)" <ji...@apache.org> on 2011/07/03 22:09:21 UTC

[jira] [Reopened] (STANBOL-201) Integrate pignlproc output (TSV or other format) with the Stanbol indexing tools

     [ https://issues.apache.org/jira/browse/STANBOL-201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olivier Grisel reopened STANBOL-201:
------------------------------------


I am reopening this issue because the quality of the generated topic index is not good enough for accurate text classification (according to tests performed on a direct Solr instance).

Work is also under way in the pignlproc project to improve this by following a hierarchy of "interesting topics" so as to get rid of most of the noisy output. However, the ntriples serialization needs to be extended so that it can export the materialized category paths (from the root topics) for each indexed topic, which would make the classifier more efficient. This part is not implemented yet.
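
To illustrate what "materialized category paths" could look like once exported, here is a minimal Python sketch. It is not the pignlproc implementation (which is not written yet): the predicate URI, the topic base URI, the path separator and the function names are all assumptions made up for this example.

```python
from typing import Dict, List, Set

# Hypothetical URIs, used for illustration only (not part of pignlproc or Stanbol).
TOPIC_BASE = "http://example.org/topic/"
PATH_PREDICATE = "http://example.org/ontology/categoryPath"


def materialize_paths(topic: str,
                      broader: Dict[str, List[str]],
                      roots: Set[str]) -> List[List[str]]:
    """Return every path leading from a root topic down to `topic`.

    `broader` maps a topic name to the names of its parent topics.
    """
    paths: List[List[str]] = []

    def walk(node: str, trail: List[str]) -> None:
        if node in trail:  # guard against cycles in the category graph
            return
        trail = [node] + trail
        if node in roots:
            paths.append(trail)
            return
        for parent in broader.get(node, []):
            walk(parent, trail)

    walk(topic, [])
    return paths


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(topic: str, paths: List[List[str]]) -> List[str]:
    """Serialize each materialized path as one N-Triples line."""
    return [
        '<%s%s> <%s> "%s" .' % (TOPIC_BASE, topic, PATH_PREDICATE,
                                escape_literal("/".join(path)))
        for path in paths
    ]


if __name__ == "__main__":
    broader = {"Physics": ["Science"], "Optics": ["Physics"]}
    for line in to_ntriples("Optics",
                            materialize_paths("Optics", broader, {"Science"})):
        print(line)
```

Storing the full path as a single literal is only one possible encoding; the point is that each indexed topic carries its ancestry up to a root topic so that the classifier does not have to walk the hierarchy at query time.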

> Integrate pignlproc output (TSV or other format) with the Stanbol indexing tools
> --------------------------------------------------------------------------------
>
>                 Key: STANBOL-201
>                 URL: https://issues.apache.org/jira/browse/STANBOL-201
>             Project: Stanbol
>          Issue Type: Sub-task
>          Components: Enhancer, Entity Hub
>            Reporter: Olivier Grisel
>            Assignee: Olivier Grisel
>
> Either make pignlproc able to output ntriples or extend the Stanbol indexing tools to be able to index data expressed in a TSV format (e.g. using the Solr UpdateCSV handler, which is probably well optimized and does not require loading the data into a temporary TDB store).
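
As a rough sketch of the second option, a TSV file can be sent to Solr's CSV update handler by setting its `separator` parameter to a tab. The Solr base URL, the field names and the helper names below are assumptions chosen for illustration; the actual pignlproc column layout may differ.

```python
import urllib.parse
import urllib.request
from typing import List


def build_csv_update_url(solr_base: str, fieldnames: List[str],
                         commit: bool = True) -> str:
    """Build a URL for Solr's CSV update handler configured for TSV input."""
    params = {
        "separator": "\t",                    # tab-separated instead of comma
        "header": "false",                    # assume the file has no header row
        "fieldnames": ",".join(fieldnames),   # column-to-field mapping
        "commit": "true" if commit else "false",
    }
    return solr_base.rstrip("/") + "/update/csv?" + urllib.parse.urlencode(params)


def post_tsv(url: str, tsv_path: str) -> int:
    """POST the content of the TSV file to Solr and return the HTTP status."""
    with open(tsv_path, "rb") as handle:
        body = handle.read()
    request = urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "text/plain; charset=utf-8"})
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical Solr location and field names.
    print(build_csv_update_url("http://localhost:8983/solr", ["id", "topic"]))
```

Reading the whole file into memory keeps the sketch short; a large pignlproc dump would more likely be streamed or split into batches.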

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira