You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@stanbol.apache.org by seralf <se...@gmail.com> on 2012/03/22 12:24:00 UTC

stambol with italian language

Hi, I'm new to Stanbol. I'm reading the documentation and examples, and I'd
like to start testing it on the Italian language, if that's possible.

Could someone give me some hints on the steps needed to construct my
model (Italian) and configure it inside the platform? I suppose it's
possible, and that it shouldn't be very far from the steps taken to
construct, let's say, the Spanish integration.
What do I need to do? I know it may sound like a very generic question, but
it's not so clear from the documentation, so I need help.
For my test I would like to be able to use a text corpus from the database
of a client, and a SKOS thesaurus from the same domain.

Thanks in advance for any help (suggestions, code examples, ideas, etc.)

cheers,
Alfredo Serafini

Re: stambol with italian language

Posted by seralf <se...@gmail.com>.
ok, thanks very much


Re: stambol with italian language

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi,

and sorry for the late response ...

>> ~/sw/stanbol/launchers$ java -Xmx1024m -jar
>> full/target/org.apache.stanbol.launchers.full-0.9.0-incubating-SNAPSHOT.jar
>> start -c ../sling *doesn't work for me*
>>
>> ~/sw/stanbol/launchers$ java -Xmx1024m -jar
>> full/target/org.apache.stanbol.launchers.full-0.9.0-incubating-SNAPSHOT.jar
>> start *works*
>>

The reason is that in a lot of places the default sling folder
"/sling" is hardcoded in Stanbol.
One such example is the MainDataFileProvider, which serves the files
located in /sling/datafiles.

I consider this behavior a bug and created
https://issues.apache.org/jira/browse/STANBOL-561

While doing that I will also change the default from
"{working-dir}/sling" to "{working-dir}/stanbol" (see
https://issues.apache.org/jira/browse/STANBOL-562)

In the meantime you will need to use the default.

best
Rupert




-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
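
For anyone following along, the workaround described in this message (start the launcher from its default working directory and keep data under ./sling/datafiles) might be scripted roughly as below. This is only a sketch under assumptions: `stage_datafile` is a made-up helper, not a Stanbol command, and the archive name in the usage comment is a placeholder.

```shell
# Sketch only: stage_datafile is a hypothetical helper, not a Stanbol command.
# It copies one file into <working-dir>/sling/datafiles, the default folder
# that the message above says is effectively hardcoded.
stage_datafile() {
  workdir="$1"   # directory the launcher is started from
  src="$2"       # file to stage, e.g. a Solr index archive or an OpenNLP model
  mkdir -p "$workdir/sling/datafiles" || return 1
  cp "$src" "$workdir/sling/datafiles/"
}

# Usage (not executed here; the archive name is a placeholder):
#   cd ~/sw/stanbol/launchers
#   stage_datafile . /path/to/my-index.zip
#   java -Xmx1024m -jar \
#     full/target/org.apache.stanbol.launchers.full-0.9.0-incubating-SNAPSHOT.jar start
```

The point of the helper is simply to avoid the -c option until STANBOL-561 is resolved: the files end up where the default configuration looks for them.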

Re: stambol with italian language

Posted by seralf <se...@gmail.com>.
First of all, thanks for the suggestions on the Felix interaction; I'll
check/study this as soon as possible.

For my 'not active' index issue:
OK, the problem seems to be related to the 'sling' directory I was using.
I started the 'stanbol-full' version using the option -c ../sling (so
putting the index in <root>/sling/datafiles) and for some reason it doesn't
work.

If I start again with the same configs but using the directory
<root>/launcher/sling/datasets without the -c option, it all works...

At the moment that's OK for my test, but does anybody have an idea of what
I'm missing to get the system started with a custom data directory? :-)

Just to be clear:

> ~/sw/stanbol/launchers$ java -Xmx1024m -jar
> full/target/org.apache.stanbol.launchers.full-0.9.0-incubating-SNAPSHOT.jar
> start -c ../sling *doesn't work for me*
>
> ~/sw/stanbol/launchers$ java -Xmx1024m -jar
> full/target/org.apache.stanbol.launchers.full-0.9.0-incubating-SNAPSHOT.jar
> start *works*
>


thanks,
Alfredo

2012/3/28 ajs6f@virginia.edu <aj...@virginia.edu>

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I've seen this happen loading custom vocabularies built by the Generic RDF
> Indexer and I'm honestly still not sure of why. In my case, restarting the
> custom bundle and the Solr Yard bundle seemed to make it work. I imagine
> that restarting Stanbol would do the same. Perhaps there is some subtle
> error in the building of the custom bundle that makes it possible for a
> Solr index service to be created but not started?
>
> As to managing configuration, you may want to follow:
>
> https://issues.apache.org/jira/browse/STANBOL-529
>
> which offers a future way to provide configuration at startup. I'm not
> familiar enough with the Sling Launcher system to know how difficult it
> would be to directly expose deployment via REST, but it might be more
> feasible using the Apache Felix Web Console which is normally included in
> Stanbol builds:
>
> http://felix.apache.org/site/web-console-restful-api.html
>
> http://felix.apache.org/site/apache-felix-web-console.html#ApacheFelixWebConsole-RESTfulAPI
>
> - ---
> A. Soroka
> Software & Systems Engineering :: Online Library Environment
> the University of Virginia Library
>
> On Mar 28, 2012, at 11:24 AM, seralf wrote:
>
> > Yes, I have already started the bundle, but if I search from the web
> > interface or via a command line like:
> > curl -X POST -d "name=roma*&limit=10&offset=0" http://localhost:8080/entityhub/site/<SITE-NAME>/find
> >
> > I get the error I pasted.
> >
> > Any suggestions? Maybe I'm missing some configuration step?
> >
> > 2012/3/28 Michel Benevento <mb...@kanker.nl>
> >
> >> Have you started your installed bundle in the admin console? Click the
> >> little triangle next to it so it becomes a square and the status message
> >> updates.
> >>
> >> Michel
> >>
> >>
> >> On 28 mrt. 2012, at 17:02, seralf wrote:
> >>
> >>> Hi, I'm trying to use the KeywordLinking as Rupert suggested earlier.
> >>> I've built the Solr indexes as in the tutorial and they seem to be OK (I
> >>> looked inside them with Luke). I've copied them to ROOT/sling/dataset
> >>> and then installed the generated bundle via the console.
> >>>
> >>> Now I have a strange error: it seems that Stanbol is not actually loading
> >>> my indexes, or for some reason it has not activated the yard
> >>>
> >>>> java.lang.IllegalStateException: Unable to initialize the Cache with Yard
> >>>> <SITE-NAME> Index! This is usually caused by Errors while reading the Cache
> >>>> Configuration from the Yard.
> >>>>   at org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
> >>>>   at org.apache.stanbol.entityhub.core.site.CacheImpl.findRepresentation(CacheImpl.java:331)
> >>>>   ...
> >>>> Caused by: org.apache.stanbol.entityhub.servicesapi.yard.YardException:
> >>>> The SolrIndex '<SITE-NAME>' for SolrYard '<SITE-NAME> Index' is currently
> >>>> not active!
> >>>>   ...
> >>>>
> >>>
> >>> Does anyone have a suggestion on this?
> >>>
> >>> I have two other related questions:
> >>> 1) How can I start Stanbol with a specific config activated?
> >>> 2) Is there any way to manage the deploy/activation via some kind of REST
> >>> interface? (For example curl? It could be helpful for some automation...)
> >>>
> >>> thanks in advance,
> >>> Alfredo
> >>>
> >>>
> >>>
> >>>
> >>> 2012/3/22 seralf <se...@gmail.com>
> >>>
> >>>> Thanks very much Rupert, you helped me a lot in clarifying my ideas :-)
> >>>>
> >>>> I think I'll follow your suggestion and try to use my thesaurus
> >>>> with workflow option 2).
> >>>> I already use Solr too, so it's probably the best choice for my needs
> >>>> indeed.
> >>>>
> >>>> On the other hand I'm still interested in trying to build an OpenNLP
> >>>> Italian model, but I can do my experiments externally, if I understand
> >>>> correctly.
> >>>>
> >>>> thanks very much, i'll try to make some progress
> >>>> Alfredo
> >>>>
> >>>>
> >>>>
> >>>> 2012/3/22 Rupert Westenthaler <ru...@gmail.com>
> >>>>
> >>>>> Hi Alfredo
> >>>>>
> >>>>> On 22.03.2012, at 12:24, seralf wrote:
> >>>>>
> >>>>>> Hi i'm new to stambol, i'm reading the documentation and examples,
> and
> >>>>> i'd
> >>>>>> like to start some testing with it on italian language, if it's
> >>>>> possible.
> >>>>>>
> >>>>>> Could someone give me some hint regarding the steps to try to
> costruct
> >>>>> my
> >>>>>> model (Italian) and configure it inside the platform? I suppose it's
> >>>>>> possible and it should be not very far to the steps taken for
> >> construct
> >>>>>> -let's say- the Spanish integration.
> >>>>>> What i need to do? I know it could sound a very generic question,
> but
> >>>>> it's
> >>>>>> not so clear from the documentation, so i need help.
> >>>>>> For my test i would like to be able to use a text corpora from the
> >>>>> database
> >>>>>> of a client, and a skos thesaurus from the same domain.
> >>>>>>
> >>>>>> thanks in advance for every help (suggestions, code examples, ideas,
> >>>>> etc)
> >>>>>>
> >>>>>
> >>>>> In principle there are two different workflows for extracting Entities
> >>>>> from text:
> >>>>>
> >>>>> (1) NamedEntityExtraction (NER) [3] => NamedEntityLinking [4]
> >>>>> (2) KeywordLinking [5]
> >>>>>
> >>>>>
> >>>>> (1) requires an OpenNLP [1] NER model for the language of your documents.
> >>>>> However, currently there are no models for the Italian language
> >>>>> distributed by OpenNLP, so you would need to build your own models. For
> >>>>> more information on how to do that please see the documentation of
> >>>>> OpenNLP [1]. As soon as you have such models you only need to copy them
> >>>>> into the {stanbol-workingdir}/sling/datafiles folder. If they follow the
> >>>>> naming scheme used by OpenNLP ("{lang}-ner-{type}.bin", e.g.
> >>>>> "it-ner-location.bin" for the model that detects locations for Italian),
> >>>>> Stanbol will pick them up automatically.
> >>>>>
> >>>>> (2) directly matches words of the text with labels of entities within the
> >>>>> controlled vocabulary. This process can be improved by Natural Language
> >>>>> Processing (e.g. Part-of-Speech tagging) but this is not a requirement.
> >>>>> Typically this works fine for datasets that contain named entities such
> >>>>> as concepts of a thesaurus, contacts of a company, projects, products …
> >>>>> It does not work well with datasets that contain entities with labels
> >>>>> that are also used as common words in the given language, as this will
> >>>>> result in a lot of false positives.
> >>>>>
> >>>>> Based on the information you provided on your use case I suggest that (2)
> >>>>> should work just fine for you. This user scenario [2] should provide you
> >>>>> with all the needed information on how to configure Stanbol for your use
> >>>>> case.
> >>>>>
> >>>>> I hope this helps. If you have any further questions feel free to ask
> >>>>>
> >>>>> best
> >>>>> Rupert Westenthaler
> >>>>>
> >>>>> [1] http://opennlp.apache.org/
> >>>>> [2] http://incubator.apache.org/stanbol/docs/trunk/customvocabulary.html
> >>>>> [3] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/namedentityextractionengine.html
> >>>>> [4] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/namedentitytaggingengine.html
> >>>>> [5] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/keywordlinkingengine.html
> >>>>>
> >>>>>> cheers,
> >>>>>> Alfredo Serafini
> >>>>>
> >>>>>
> >>>>
> >>
> >>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
> Comment: GPGTools - http://gpgtools.org
>
> iQEcBAEBAgAGBQJPcy+pAAoJEATpPYSyaoIkouwH/imt4ERphKHGc6tXrkLQIFWJ
> TclWGjyCjoT1GgOr2OGjwfTS9xmcbsn3mYwfv+tuxNj2FfXfi4OfoVza6z7tZeUZ
> WdH4+cmq+4Lg+7lt+Pbt2narYWhvUCg2Dths8tdj8nPtJSEEd2KfW5DQqnwq/CfA
> uqOAN5zEb9rsy5gTGzSNxX66fpnM1t7XWHs2gmoD17rfmnJEQBc3l+a6rnLJdnFX
> vABg2gEiYt5YGaZRG4V1oVC5SqEoZlysix/tkZyWcFMvXN+nvePbMDhaqBwjWc5k
> 719uf4gW66Xf7V8zeWgwQcXlNICAebyXnsGiPqkeUaZa4nhm6v+G+FT4Ho/R4lk=
> =10yV
> -----END PGP SIGNATURE-----
>
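
As a small illustration of the "{lang}-ner-{type}.bin" naming scheme mentioned in the quoted reply, the file name for a custom model could be derived as below. This is a sketch, not Stanbol or OpenNLP tooling: `opennlp_ner_model_name` is a made-up helper. The models distributed by OpenNLP use a hyphen before the type (for example en-ner-location.bin), which is what the helper produces.

```shell
# Sketch only: opennlp_ner_model_name is a hypothetical helper for illustration.
# It builds a model file name following the "{lang}-ner-{type}.bin" pattern.
opennlp_ner_model_name() {
  lang="$1"   # language code, e.g. "it"
  type="$2"   # entity type, e.g. "location", "person", "organization"
  printf '%s-ner-%s.bin\n' "$lang" "$type"
}

# Usage (not executed here): name a self-trained Italian location model and
# copy it where Stanbol looks for data files by default.
#   cp my-model.bin "sling/datafiles/$(opennlp_ner_model_name it location)"
```

A model saved under such a name in {stanbol-workingdir}/sling/datafiles should then be picked up as the quoted reply describes.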

Re: stambol with italian language

Posted by "ajs6f@virginia.edu" <aj...@virginia.edu>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I've seen this happen loading custom vocabularies built by the Generic RDF Indexer and I'm honestly still not sure of why. In my case, restarting the custom bundle and the Solr Yard bundle seemed to make it work. I imagine that restarting Stanbol would do the same. Perhaps there is some subtle error in the building of the custom bundle that makes it possible for a Solr index service to be created but not started?

As to managing configuration, you may want to follow:

https://issues.apache.org/jira/browse/STANBOL-529

which offers a future way to provide configuration at startup. I'm not familiar enough with the Sling Launcher system to know how difficult it would be to directly expose deployment via REST, but it might be more feasible using the Apache Felix Web Console which is normally included in Stanbol builds:

http://felix.apache.org/site/web-console-restful-api.html
http://felix.apache.org/site/apache-felix-web-console.html#ApacheFelixWebConsole-RESTfulAPI

- ---
A. Soroka
Software & Systems Engineering :: Online Library Environment
the University of Virginia Library
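
A rough sketch of what driving the Felix Web Console from the command line might look like follows. It rests on assumptions that should be checked against the documentation linked above: that the console is mounted at /system/console, that it accepts basic authentication with admin:admin, and that bundles are addressed by numeric id. `felix_bundle_action_cmd` is a made-up helper; it only prints the curl command so it can be reviewed before being run.

```shell
# Sketch only: felix_bundle_action_cmd is a hypothetical helper.
# Assumptions (verify for your build): console path /system/console,
# credentials admin:admin, bundle addressed by its numeric id.
felix_bundle_action_cmd() {
  base_url="$1"   # e.g. http://localhost:8080
  bundle_id="$2"  # numeric bundle id as shown in the console
  action="$3"     # "start" or "stop"
  case "$action" in
    start|stop) ;;
    *) echo "unsupported action: $action" >&2; return 1 ;;
  esac
  printf 'curl -u admin:admin -F action=%s %s/system/console/bundles/%s\n' \
    "$action" "$base_url" "$bundle_id"
}

# Usage (not executed here): print the command, then pipe it to sh to run it.
#   felix_bundle_action_cmd http://localhost:8080 42 start
#   felix_bundle_action_cmd http://localhost:8080 42 start | sh
```

Printing rather than executing keeps the helper free of side effects, which seems safer for something that changes bundle state on a running instance.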

On Mar 28, 2012, at 11:24 AM, seralf wrote:

> Yes, I have already started the bundle, but if I search from the web
> interface or via a command line like:
> curl -X POST -d "name=roma*&limit=10&offset=0" http://localhost:8080/entityhub/site/<SITE-NAME>/find
> 
> I get the error I pasted.
> 
> Any suggestions? Maybe I'm missing some configuration step?

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJPcy+pAAoJEATpPYSyaoIkouwH/imt4ERphKHGc6tXrkLQIFWJ
TclWGjyCjoT1GgOr2OGjwfTS9xmcbsn3mYwfv+tuxNj2FfXfi4OfoVza6z7tZeUZ
WdH4+cmq+4Lg+7lt+Pbt2narYWhvUCg2Dths8tdj8nPtJSEEd2KfW5DQqnwq/CfA
uqOAN5zEb9rsy5gTGzSNxX66fpnM1t7XWHs2gmoD17rfmnJEQBc3l+a6rnLJdnFX
vABg2gEiYt5YGaZRG4V1oVC5SqEoZlysix/tkZyWcFMvXN+nvePbMDhaqBwjWc5k
719uf4gW66Xf7V8zeWgwQcXlNICAebyXnsGiPqkeUaZa4nhm6v+G+FT4Ho/R4lk=
=10yV
-----END PGP SIGNATURE-----

Re: stambol with italian language

Posted by seralf <se...@gmail.com>.
Yes, I have already started the bundle, but if I search from the web
interface or via a command line like:

curl -X POST -d "name=roma*&limit=10&offset=0" \
  "http://localhost:8080/entityhub/site/<SITE-NAME>/find"

I get the error I pasted.

Any suggestions? Maybe I missed some configuration step?
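
For scripting the same check, here is a minimal Python sketch that builds the request the curl command above sends. The helper name find_request and the site name used in the usage note are hypothetical; the path and the name/limit/offset parameters are taken from the curl command. The request is only built, not sent.

```python
import urllib.parse
import urllib.request


def find_request(base_url, site, name, limit=10, offset=0):
    """Build (but do not send) the Entityhub 'find' POST for one referenced site."""
    site_path = urllib.parse.quote(site, safe="")
    url = f"{base_url.rstrip('/')}/entityhub/site/{site_path}/find"
    # urlencode percent-encodes the wildcard ('*' becomes '%2A');
    # the server decodes it back when it parses the form body.
    body = urllib.parse.urlencode({"name": name, "limit": limit, "offset": offset})
    return urllib.request.Request(
        url,
        data=body.encode("ascii"),
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Passing find_request("http://localhost:8080", "my-site", "roma*") to urllib.request.urlopen raises urllib.error.HTTPError on an error status, which gives a script something to retry on while the index is still inactive.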

2012/3/28 Michel Benevento <mb...@kanker.nl>

> Have you started your installed bundle in the admin console? Click the
> little triangle next to it so it becomes a square and the status message
> updates.
>
> Michel

Re: stambol with italian language

Posted by Michel Benevento <mb...@kanker.nl>.
Have you started your installed bundle in the admin console? Click the little triangle next to it so it becomes a square and the status message updates. 

Michel


On 28 mrt. 2012, at 17:02, seralf wrote:

> Hi i'm trying to use the KeywordLinking as Rupert suggested me earlier.
> I've done the solr indexes as in the tutorial and they seems to be ok (i
> looked inside them with Luke), i've copied them in ROOT/sling/dataset, and
> then installed the generated bundle via the console.
> 
> Now i have a strange error: seems like stanbol is not actually load my
> indexes, or for some reason it has not activated the yard
> 
> java.lang.IllegalStateException: Unable to initialize the Cache with Yard
>> <SITE-NAME> Index! This is usually caused by Errors while reading the Cache
>> Configuration from the Yard.
>>    at
>> org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
>>    at
>> org.apache.stanbol.entityhub.core.site.CacheImpl.findRepresentation(CacheImpl.java:331)
>>    ...
>> Caused by: org.apache.stanbol.entityhub.servicesapi.yard.YardException:
>> The SolrIndex '<SITE-NAME>' for SolrYard '<SITE-NAME> Index' is currently
>> not active!
>>    ...
>> 
> 
> does anyone has suggestion on this?
> 
> i have two other related questions:
> 1) how can i start stanbol with specific config activated?
> 2) is there any way to manage the deploy/activation via some kind of rest
> interface? (for example curl? it could be helpful for doing some
> automatization... )
> 
> thanks in advance,
> Alfredo


Re: stambol with italian language

Posted by seralf <se...@gmail.com>.
Hi, I'm trying to use the KeywordLinking engine as Rupert suggested earlier.
I've built the Solr indexes as in the tutorial and they seem to be OK (I
looked inside them with Luke). I've copied them to ROOT/sling/dataset and
then installed the generated bundle via the console.

Now I get a strange error: it seems that Stanbol is not actually loading my
indexes, or for some reason it has not activated the yard:

java.lang.IllegalStateException: Unable to initialize the Cache with Yard <SITE-NAME> Index! This is usually caused by Errors while reading the Cache Configuration from the Yard.
    at org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
    at org.apache.stanbol.entityhub.core.site.CacheImpl.findRepresentation(CacheImpl.java:331)
    ...
Caused by: org.apache.stanbol.entityhub.servicesapi.yard.YardException: The SolrIndex '<SITE-NAME>' for SolrYard '<SITE-NAME> Index' is currently not active!
    ...

Does anyone have a suggestion on this?

I have two other related questions:
1) How can I start Stanbol with a specific configuration activated?
2) Is there any way to manage deployment/activation via some kind of REST
interface (for example with curl)? It would be helpful for automation.

thanks in advance,
Alfredo




2012/3/22 seralf <se...@gmail.com>

> Thanks very much Rupert, you help me a lot in clarify my ideas :-)
>
> i think i'll try to follow your suggestion, and try to use my thesaurus
> with the workflow option 2)
> i already use solr either, so it's probably the best choice for my needs,
> indeed
>
> on the other hand i'm still interested on give a try on opennlp italian
> model construction, but i can to my experiments externally, as i correct
> understand.
>
> thanks very much, i'll try to make some progress
> Alfredo

Re: stambol with italian language

Posted by seralf <se...@gmail.com>.
Thanks very much Rupert, you helped me a lot in clarifying my ideas :-)

I think I'll follow your suggestion and try to use my thesaurus
with workflow option (2).
I already use Solr as well, so it's probably the best choice for my needs.

On the other hand, I'm still interested in trying to build an Italian
OpenNLP model, but I can do those experiments externally, if I understand
correctly.

Thanks very much, I'll try to make some progress.
Alfredo


2012/3/22 Rupert Westenthaler <ru...@gmail.com>

> Hi Alfredo
>
> On 22.03.2012, at 12:24, seralf wrote:
>
> > Hi i'm new to stambol, i'm reading the documentation and examples, and
> i'd
> > like to start some testing with it on italian language, if it's possible.
> >
> > Could someone give me some hint regarding the steps to try to costruct my
> > model (Italian) and configure it inside the platform? I suppose it's
> > possible and it should be not very far to the steps taken for construct
> > -let's say- the Spanish integration.
> > What i need to do? I know it could sound a very generic question, but
> it's
> > not so clear from the documentation, so i need help.
> > For my test i would like to be able to use a text corpora from the
> database
> > of a client, and a skos thesaurus from the same domain.
> >
> > thanks in advance for every help (suggestions, code examples, ideas, etc)
> >
>
> In principle there are two different workflows how to extract Entities
> form Text
>
> (1) NamedEntityExtraction (NER) [3] => NamedEntityLinking [4]
> (2) KeywordLinking [5]
>
>
> (1) requires a OpenNLP [1] NER model for the language of your documents.
> However currently there are no models for the italian language distributed
> by OpenNLP. This would require you to build your own models. For more
> information on how to do that please see the documentation of OpenNLP [1].
> As soon as you have such models you need only copy them into the
> {stanbol-workingdir}/sling/datafiles folder. If they follow the naming
> scheme used by OpenNLP ("{lang}-ner-{type}.bin" e.g. "it-ner.location.bin"
> for the model that detects locations for italian) Stanbol will pick them up
> automatically.
>
> (2) directly matches words of the text with labels of entities within the
> controlled vocabulary. This process can be improved by Natural Langauge
> Processing (e.g. Part-of-Speech tagging) but this is not a requirement.
> Typically this works fine for datasets that contain named entities such as
> concepts of an thesaurus; contacts of an company, projects, products … It
> does not work well with datasets that contains entities with labels that
> are also used as common words in the given language as this will result in
> a lot of false positives.
>
> Based on the information you provided on you use case I suggest that (2)
> should work just fine for you. This user scenario [2] should provide you
> will all the needed information on how to configure Stanbol for your use
> case.
>
> I hope this helps. If you have any further questions feel free to ask
>
> best
> Rupert Westenthaler
>
> [1] http://opennlp.apache.org/
> [2] http://incubator.apache.org/stanbol/docs/trunk/customvocabulary.html
>
> [3]
> http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/namedentityextractionengine.html
> [4]
> http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/namedentitytaggingengine.html
> [5]
> http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/keywordlinkingengine.html
>
> > cheers,
> > Alfredo Serafini
>
>

Re: stambol with italian language

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Alfredo

On 22.03.2012, at 12:24, seralf wrote:

> Hi i'm new to stambol, i'm reading the documentation and examples, and i'd
> like to start some testing with it on italian language, if it's possible.
> 
> Could someone give me some hint regarding the steps to try to costruct my
> model (Italian) and configure it inside the platform? I suppose it's
> possible and it should be not very far to the steps taken for construct
> -let's say- the Spanish integration.
> What i need to do? I know it could sound a very generic question, but it's
> not so clear from the documentation, so i need help.
> For my test i would like to be able to use a text corpora from the database
> of a client, and a skos thesaurus from the same domain.
> 
> thanks in advance for every help (suggestions, code examples, ideas, etc)
> 

In principle there are two different workflows for extracting entities from text:

(1) NamedEntityExtraction (NER) [3] => NamedEntityLinking [4]
(2) KeywordLinking [5]


(1) requires an OpenNLP [1] NER model for the language of your documents. However, OpenNLP currently distributes no models for the Italian language, so you would need to build your own. For more information on how to do that please see the OpenNLP documentation [1]. As soon as you have such models you only need to copy them into the {stanbol-workingdir}/sling/datafiles folder. If they follow the naming scheme used by OpenNLP ("{lang}-ner-{type}.bin", e.g. "it-ner-location.bin" for the model that detects locations in Italian), Stanbol will pick them up automatically.
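
The copy step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the regular expression is one reading of the "{lang}-ner-{type}.bin" scheme (a two or three letter language code and a lower-case type), and the function names and directory arguments are hypothetical, not part of Stanbol.

```python
import re
import shutil
from pathlib import Path

# Assumed reading of the "{lang}-ner-{type}.bin" naming scheme.
MODEL_NAME = re.compile(r"^(?P<lang>[a-z]{2,3})-ner-(?P<type>[a-z]+)\.bin$")


def parse_model_name(filename):
    """Return (lang, type) if the name follows the scheme, else None."""
    m = MODEL_NAME.match(filename)
    return (m.group("lang"), m.group("type")) if m else None


def install_models(model_dir, datafiles_dir):
    """Copy well-named models into the datafiles folder; return the names copied."""
    target = Path(datafiles_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(model_dir).glob("*.bin")):
        if parse_model_name(path.name):
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return copied
```

Calling install_models("my-models", "{stanbol-workingdir}/sling/datafiles") with the real working directory substituted copies only the files whose names match, so a misnamed model shows up as missing from the returned list rather than being silently ignored later.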

(2) directly matches words of the text with labels of entities within the controlled vocabulary. This process can be improved by Natural Language Processing (e.g. Part-of-Speech tagging), but this is not a requirement. Typically this works fine for datasets that contain named entities, such as concepts of a thesaurus, contacts of a company, projects, products, etc. It does not work well with datasets that contain entities whose labels are also used as common words in the given language, as this will result in a lot of false positives.
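
To make the false-positive problem concrete, here is a toy sketch of label matching in Python. It is not the KeywordLinking engine's actual algorithm, only a simple longest-match lookup; the function name, the labels and the entity ids are made up for the illustration, and labels are assumed to be lower-cased with single spaces between words.

```python
import re


def link_keywords(text, labels):
    """Return (matched label, entity id) pairs for labels found in the text.

    `labels` maps a lower-cased label to a (hypothetical) entity id.
    """
    tokens = re.findall(r"\w+", text.lower())
    max_len = max((len(label.split()) for label in labels), default=0)
    matches = []
    i = 0
    while i < len(tokens):
        # Prefer the longest label that starts at this token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in labels:
                matches.append((candidate, labels[candidate]))
                i += n
                break
        else:
            # No label starts here; move on to the next token.
            i += 1
    return matches
```

With the labels "roma", "città del vaticano" and "porto" (the city), the sentence "Porto i documenti da Roma alla Città del Vaticano." yields all three labels, although "Porto" is used there as the verb "I carry". That is the kind of false positive that Part-of-Speech tagging helps to filter out.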

Based on the information you provided on your use case, I suggest that (2) should work just fine for you. This user scenario [2] should provide you with all the information needed to configure Stanbol for your use case.

I hope this helps. If you have any further questions, feel free to ask.

best
Rupert Westenthaler

[1] http://opennlp.apache.org/
[2] http://incubator.apache.org/stanbol/docs/trunk/customvocabulary.html

[3] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/namedentityextractionengine.html
[4] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/namedentitytaggingengine.html
[5] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/keywordlinkingengine.html

> cheers,
> Alfredo Serafini