Posted to dev@stanbol.apache.org by Suman Saurabh <ss...@gmail.com> on 2014/05/28 00:05:11 UTC

Help regarding usage of DataFileProvider

Hi Rupert, All

I am building a Speech To Text Engine (see [1] for an introduction).
The engine requires the DataFileProvider infrastructure for handling the
configuration files of the acoustic and language models. Basically, the
client provides the *Acoustic Model folder*, the *Dictionary file* and the
*Language model file* in a JAR file in the following format.
E.g. sphinx4-data-1.0-SNAPSHOT.jar, the default model file, contains:
/edu/cmu/sphinx/models/language/en-us.lm.dmp  *File* for the language model
/edu/cmu/sphinx/models/acoustic/wsj/dict/cmudict.0.6d  *File* for the dictionary
/edu/cmu/sphinx/models/acoustic/wsj/  *Folder* for the acoustic model

This jar can be added to project using the following dependency:
<dependency>
        <groupId>edu.cmu.sphinx</groupId>
        <artifactId>sphinx4-data</artifactId>
        <version>1.0-SNAPSHOT</version>
</dependency>

But when a client wants to use their own model files, Stanbol has the
DataFileProvider infrastructure for handling such big binary
configuration files.

I went through the documentation of DataFileProvider [2] and the source
code of some enhancement engines that use the DataFileProvider service,
such as the Sentiment Word Classifier, to see how it is used, but I am
not yet clear on how to use it myself.

Maybe you can provide some *insights* or *links* that describe it in
more detail. It would save a lot of time.

Regards,
Suman Saurabh

[1] https://sites.google.com/site/gsoc2014stanbol/home/abstract
[2] http://stanbol.apache.org/docs/trunk/utils/datafileprovider

Re: Help regarding usage of DataFileProvider

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Suman,

[2] describes it very well.

As far as I understand, Sphinx4 uses JAR files as containers for bundling
the multiple required configuration files. If this is correct, you should
not try to add those files to the classpath (e.g. by adding a new
dependency) but rather allow users to just copy those files to the
`stanbol/datafiles` folder.

The DataFileProvider allows you to look up resources by their name.
There is also a DataFileTracker that can be used to track files. The
tracker will give you a callback as soon as a resource becomes
available.
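
To illustrate the lookup-by-name and callback idea, here is a minimal,
self-contained sketch. It does not use the Stanbol API: the class
DataFolderTrackerSketch, its ResourceListener interface and the poll()
method are hypothetical stand-ins, and the real DataFileProvider and
DataFileTracker services described in [2] have their own signatures.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for illustration; NOT the Stanbol DataFileTracker API. */
class DataFolderTrackerSketch {

    /** Callback invoked once the named resource can be read. */
    interface ResourceListener {
        void available(String resourceName, InputStream in) throws IOException;
    }

    private final Path dataFolder;
    private final Map<String, ResourceListener> pending = new LinkedHashMap<>();

    DataFolderTrackerSketch(Path dataFolder) {
        this.dataFolder = dataFolder;
    }

    /** Looks up a resource by name; returns null if the file is not there (yet). */
    InputStream lookup(String resourceName) throws IOException {
        Path file = dataFolder.resolve(resourceName);
        return Files.isRegularFile(file) ? Files.newInputStream(file) : null;
    }

    /** Registers interest in a resource that may be copied to the folder later. */
    void track(String resourceName, ResourceListener listener) {
        pending.put(resourceName, listener);
    }

    /**
     * Checks the folder once. Listeners whose file exists are notified and
     * then dropped, so each one fires at most once. Returns the notified names.
     */
    List<String> poll() throws IOException {
        List<String> notified = new ArrayList<>();
        Iterator<Map.Entry<String, ResourceListener>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, ResourceListener> entry = it.next();
            // try-with-resources skips close() when the resource is null
            try (InputStream in = lookup(entry.getKey())) {
                if (in != null) {
                    entry.getValue().available(entry.getKey(), in);
                    notified.add(entry.getKey());
                    it.remove();
                }
            }
        }
        return notified;
    }

    public static void main(String[] args) throws IOException {
        Path folder = Files.createTempDirectory("datafiles");
        DataFolderTrackerSketch tracker = new DataFolderTrackerSketch(folder);
        tracker.track("en-us.lm.dmp", (name, in) ->
                System.out.println(name + " is available, "
                        + in.readAllBytes().length + " bytes"));
        System.out.println("first poll: " + tracker.poll());
        // Simulate a user copying the model file into the data folder.
        Files.write(folder.resolve("en-us.lm.dmp"), new byte[] {1, 2, 3});
        System.out.println("second poll: " + tracker.poll());
    }
}
```

In Stanbol itself the tracker does the watching for you, so a component
would only register a listener and react in the callback.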

So all you need to do is to implement a service that allows callers to
request an "Acoustic Model", "Dictionary file" or "Language model file"
by its name and provides the loaded models as Java objects. The names
need to be provided by the requesting component (the Enhancement
Engine). You should define default naming templates (convention over
configuration).
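
A rough sketch of such a service, under stated assumptions: the class
SpeechModelServiceSketch, the ModelType enum and the three naming
templates are invented for illustration (only en-us.lm.dmp matches a
file name from the original mail), and the loaded "model" is just the
raw bytes. A real implementation would obtain the stream from the
DataFileProvider and build the Sphinx4 model objects from it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only; the names and templates here are assumptions. */
class SpeechModelServiceSketch {

    /** Hypothetical default naming templates; {lang} is the language code. */
    enum ModelType {
        ACOUSTIC("{lang}-acoustic.zip"),
        DICTIONARY("{lang}.dict"),
        LANGUAGE("{lang}.lm.dmp");

        final String template;

        ModelType(String template) {
            this.template = template;
        }

        /** Convention over configuration: derive the file name from the language. */
        String defaultName(String lang) {
            return template.replace("{lang}", lang);
        }
    }

    private final Path dataFolder;
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    SpeechModelServiceSketch(Path dataFolder) {
        this.dataFolder = dataFolder;
    }

    /** Requests a model by type and language, using the default name. */
    Optional<byte[]> getModel(ModelType type, String lang) throws IOException {
        return getModel(type.defaultName(lang));
    }

    /** Requests a model by an explicit name supplied by the calling engine. */
    Optional<byte[]> getModel(String name) throws IOException {
        byte[] cached = cache.get(name);
        if (cached != null) {
            return Optional.of(cached);
        }
        Path file = dataFolder.resolve(name);
        if (!Files.isRegularFile(file)) {
            // Misses are not cached, so a file copied in later is still found.
            return Optional.empty();
        }
        byte[] loaded = Files.readAllBytes(file);
        cache.put(name, loaded);
        return Optional.of(loaded);
    }

    public static void main(String[] args) throws IOException {
        Path folder = Files.createTempDirectory("datafiles");
        SpeechModelServiceSketch service = new SpeechModelServiceSketch(folder);
        String name = ModelType.LANGUAGE.defaultName("en-us");
        System.out.println("default name: " + name);
        System.out.println("before copy: " + service.getModel(name).isPresent());
        Files.write(folder.resolve(name), new byte[] {1, 2, 3, 4});
        System.out.println("after copy: " + service.getModel(name).isPresent());
    }
}
```

An engine that is happy with the defaults calls
getModel(ModelType.LANGUAGE, "en-us"); one configured with a custom file
passes that file name to getModel(String) instead.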

The OpenNLP service [3] does exactly this for all the OpenNLP engines,
so I guess it is the best place to look.

best
Rupert

> [1] https://sites.google.com/site/gsoc2014stanbol/home/abstract
> [2] http://stanbol.apache.org/docs/trunk/utils/datafileprovider
[3] http://svn.apache.org/repos/asf/stanbol/trunk/commons/opennlp/src/main/java/org/apache/stanbol/commons/opennlp/OpenNLP.java




-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                              ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO ..........................................................................
| http://redlink.co/