Posted to dev@stanbol.apache.org by David Riccitelli <da...@interact.it> on 2011/06/06 10:29:03 UTC

EntityHub and DBpedia

Dear all,

Before r1131053, when querying for resource
http://dbpedia.org/resource/Valentino_Rossi, I was able to get:

   - <j.8:thumbnail rdf:resource="http://upload.wikimedia.org/wikipedia/commons/thumb/8/81/Valentino_Rossi_2010_Qatar.jpg/200px-Valentino_Rossi_2010_Qatar.jpg"/>
   - <j.3:depiction rdf:resource="http://upload.wikimedia.org/wikipedia/commons/8/81/Valentino_Rossi_2010_Qatar.jpg"/>

where:

   - xmlns:j.3="http://xmlns.com/foaf/0.1/"
   - xmlns:j.8="http://dbpedia.org/ontology/"


With r1131053, however, this information is no longer provided: see the
attached file for a comparison.

I understand that the EntityHub configuration has changed. How do I get
this information back from the EntityHub?

BR,
David

-- 
David Riccitelli

Interact SpA
Via A. Bargoni 78 (scala F)
00153 Roma

T +39 06 58318 301
F +39 06 58318 303

Re: EntityHub and DBpedia

Posted by David Riccitelli <da...@interact.it>.
Ok,

Cleaned, rebuilt, relaunched, now I can see it. I'll proceed ...

Thanks!
David

On Wed, Jul 27, 2011 at 9:33 AM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:




-- 
David Riccitelli
-----
Skype: ziodave
Twitter: @ziodave
LinkedIn: http://it.linkedin.com/in/riccitelli
-----
Interact SpA
Via A. Bargoni 78 (scala F)
00153 Roma

T +39 06 58318 301
F +39 06 58318 303

Re: EntityHub and DBpedia

Posted by Rupert Westenthaler <ru...@gmail.com>.
On Wed, Jul 27, 2011 at 7:54 AM, David Riccitelli
<da...@interact.it> wrote:
> Hi Rupert,
> One question, I can't find this bundle:
>
>   - "Apache Stanbol Data: DBpedia.org defaultdata version
>   (org.apache.stanbol.data.sites.dbpedia.default)"
>
>
> But I see this one:
>
>   - "Apache Stanbol Default Data (org.apache.stanbol.defaultdata)"
>
> Are you referring to the latter?

No, definitely not. This bundle was removed and should no longer be around.

Typically this happens if you update the launcher jar without deleting
the /sling folder, because the /sling folder contains a cache with the
configuration and the current versions of the bundles.
If only the launcher jar is replaced, then Sling will still start up using
these caches and will NOT use the updated bundles and
configuration from the launcher jar.
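
A minimal Python sketch of the clean-restart step described above, assuming Stanbol has been shut down and that the "sling" folder sits directly in the directory from which the launcher jar is started. The helper name "reset_sling_cache" is hypothetical. Note that this also deletes everything stored below that folder, including local indexes such as those in "sling/entityhub/solrYard/indexes".

```python
import shutil
from pathlib import Path


def reset_sling_cache(stanbol_root: Path) -> bool:
    """Remove the cached Sling state so the next launch uses the launcher jar.

    Hypothetical helper: call it only while Stanbol is shut down.
    Returns True if a "sling" folder was found and deleted, False otherwise.
    """
    sling_dir = stanbol_root / "sling"
    if not sling_dir.is_dir():
        return False
    # Deletes the cached bundles and configuration, and also any data kept
    # below it (e.g. the SolrYard indexes in sling/entityhub/solrYard/indexes).
    shutil.rmtree(sling_dir)
    return True
```

After it returns, start the new launcher jar as usual; Sling should then rebuild the folder from the bundles and configuration packaged in the jar.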

Can you please check this?

best
Rupert

>
> BR
> David
>
> On Thu, Jul 21, 2011 at 7:10 AM, Rupert Westenthaler <
> rupert.westenthaler@gmail.com> wrote:
>
>> Hi David, all
>>
>> With the changes from yesterday (revisions r1148947 and r1148948) it is
>> now easily possible to deactivate the default configuration for
>> DBpedia provided by the Stanbol launcher and to replace it with the
>> one that uses the remote services with a local cache.
>>
>> Steps:
>>
>> 1. use the current launcher
>> 2. go to the Bundle tab of the Apache Felix Webconsole
>> 3. stop the Bundle "Apache Stanbol Data: DBpedia.org defaultdata
>> version (org.apache.stanbol.data.sites.dbpedia.default)"
>> 4. install and start the Bundle "Apache Stanbol Data: Remote
>> DBpedia.org with local cache
>> (org.apache.stanbol.data.sites.dbpedia.cached)". You can find this
>> bundle in "{stanbol-trunk}/data/sites/dbpediacached".
>>
>> best
>> Rupert
>>
>> On Mon, Jul 18, 2011 at 1:15 PM, Rupert Westenthaler
>> <ru...@gmail.com> wrote:
>> > Hi
>> >
>> > Rather than working on the workaround, I decided to invest some time in
>> > finishing STANBOL-140 and implementing STANBOL-287.
>> > Together with the proposal made in [1] to split up the default data into
>> > several bundles, this should solve the issues described/discussed here.
>> >
>> > best
>> > Rupert
>> >
>> > [1] http://markmail.org/message/bf7qurmzos45h23b
>> >
>> > On Thu, Jul 14, 2011 at 8:34 AM, Rupert Westenthaler
>> > <ru...@gmail.com> wrote:
>> >> On Thu, Jul 14, 2011 at 8:30 AM, David Riccitelli
>> >> <da...@interact.it> wrote:
>> >>> Thanks Rupert,
>> >>>
>> >>> A description of how to do this is available in [1].
>> >>>
>> >>>
>> >>> I can't see the [1] :-)
>> >>
>> >> does this count as a missing attachment? ^^
>> >>
>> >> [1] http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/yard/solr/src/main/resources/solr/core/
>> >>
>> >>>
>> >>> David
>> >>>
>> >>> On Thu, Jul 14, 2011 at 8:56 AM, Rupert Westenthaler <
>> >>> rupert.westenthaler@gmail.com> wrote:
>> >>>
>> >>>> Hi
>> >>>>
>> >>>> Yes, this is possible, but (depending on the hardware) it would take
>> >>>> quite some time.
>> >>>> A description of how to do this is available in [1].
>> >>>>
>> >>>> Instead of installing the dbpedia.solrindex.zip file as described in
>> >>>> the readme, you could directly
>> >>>>
>> >>>> * shut down Stanbol
>> >>>> * delete the "dbpedia_43k" index in
>> >>>> "{stanbol-root}/sling/entityhub/solrYard/indexes"
>> >>>> * copy the index located in
>> >>>> "{indexing-root}/indexing/destination/indexes" to
>> >>>> "{stanbol-root}/sling/entityhub/solrYard/indexes" and rename it to
>> >>>> "dbpedia_43k"
>> >>>> * restart Stanbol.
>> >>>>
>> >>>> After that Stanbol should use the new index.
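
The index swap above could be scripted along these lines. This is a sketch under assumptions: the folder layout comes from the steps above, but the function name, its parameters and the existence check are hypothetical, and shutting down and restarting Stanbol is left to the caller. "new_index" is the built index directory found under "{indexing-root}/indexing/destination/indexes".

```python
import shutil
from pathlib import Path


def replace_solr_index(stanbol_root: Path, new_index: Path,
                       core_name: str = "dbpedia_43k") -> Path:
    """Replace the SolrYard index "core_name" with a freshly built one.

    Hypothetical helper mirroring the manual steps: delete the old index,
    copy the new one into the indexes folder under the old name.
    Stanbol must be shut down before calling this and restarted afterwards.
    """
    indexes_dir = stanbol_root / "sling" / "entityhub" / "solrYard" / "indexes"
    target = indexes_dir / core_name
    # Check the source first so the old index is not deleted for nothing.
    if not new_index.is_dir():
        raise FileNotFoundError(f"new index not found: {new_index}")
    # Delete the existing index (if present).
    if target.exists():
        shutil.rmtree(target)
    # Copy the new index and give it the name the SolrYard expects.
    indexes_dir.mkdir(parents=True, exist_ok=True)
    shutil.copytree(new_index, target)
    return target
```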
>> >>>>
>> >>>> Copying the "dbpedia.solrindex.zip" to the datafiles directory and
>> >>>> then changing the value of "Solr Index/Core" in the configuration of
>> >>>> the SolrYard for DBpedia from "dbpedia_43k" to "dbpedia" should also
>> >>>> work.
>> >>>>
>> >>>> best
>> >>>> Rupert
>> >>>>
>> >>>> On Wed, Jul 13, 2011 at 11:58 AM, David Riccitelli
>> >>>> <da...@interact.it> wrote:
>> >>>> > Hi,
>> >>>> >
>> >>>> > As another workaround, I was thinking that I could actually generate
>> >>>> > the DBpedia index locally with all the data using the dumps
>> >>>> > (http://wiki.dbpedia.org/Downloads36), in a way similar to
>> >>>> > dbpedia_43k.
>> >>>> >
>> >>>> > What do you think?
>> >>>> >
>> >>>> > Thanks,
>> >>>> > David
>> >>>> >
>> >>>> > On Wed, Jul 13, 2011 at 12:11 PM, Rupert Westenthaler <
>> >>>> > rupert.westenthaler@gmail.com> wrote:
>> >>>> >
>> >>>> >> Hi
>> >>>> >>
>> >>>> >> I will try to find some time in the evening to reproduce this.
>> >>>> >>
>> >>>> >> On Wed, Jul 13, 2011 at 8:57 AM, David Riccitelli
>> >>>> >> <da...@interact.it> wrote:
>> >>>> >> > Thanks Rupert,
>> >>>> >> >
>> >>>> >> > I'm trying to follow your instructions but I am running into a couple of
>> >>>> >> > issues (probably due to inexperience):
>> >>>> >> >  [1] when dropping the config files, they enter some loop of
>> >>>> >> > REGISTERING/UNREGISTERING (which I solve by stopping the FileInstall
>> >>>> >> > bundle). Is that normal?
>> >>>> >>
>> >>>> >> This is very strange and should not be caused by the FileInstaller.
>> >>>> >> Maybe there is some loop between the Sling Installer (trying to
>> >>>> >> install the default configuration) and the FileInstaller that may
>> >>>> >> cause this under some circumstances.
>> >>>> >>
>> >>>> >> >  [2] after I restart Stanbol and try to query an entity from the
>> >>>> >> > entityhub, I receive the following error:
>> >>>> >> >
>> >>>> >> > 13.07.2011 09:54:17.939 *WARN* [509017110@qtp-1586831707-0]
>> >>>> >> > org.apache.felix.http.jetty /entityhub/sites/entity/
>> >>>> >> > (java.lang.IllegalStateException: Unable to initialize the Cache with
>> >>>> >> > Yard dbpediaCache! This is usually caused by Errors while reading the
>> >>>> >> > Cache Configuration from the Yard.) java.lang.IllegalStateException:
>> >>>> >> > Unable to initialize the Cache with Yard dbpediaCache! This is usually
>> >>>> >> > caused by Errors while reading the Cache Configuration from the Yard.
>> >>>> >> > at org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
>> >>>> >> >
>> >>>> >> >
>> >>>> >> > Do I need to initialize the Cache in some way?
>> >>>> >> >
>> >>>> >> No, it does not. Prepared indexes do include a document that
>> >>>> >> provides a list of the indexed fields. In the future this may be used
>> >>>> >> to determine whether a query can be successfully executed on the
>> >>>> >> local index or not. In addition, it is used in case an Entity within
>> >>>> >> the index is updated with a newer version.
>> >>>> >> However, this configuration is optional and not required. This
>> >>>> >> Exception should only appear if the document is present but
>> >>>> >> illegally formatted. However, the SolrYard initialized for the
>> >>>> >> dbpediaCache should be empty.
>> >>>> >>
>> >>>> >> Therefore I think it is somehow related to the above problem of
>> >>>> >> overriding configurations.
>> >>>> >>
>> >>>> >> In general, the way the default configuration is loaded is
>> >>>> >> sub-optimal at the moment. Especially using a single defaultdata
>> >>>> >> bundle for both the OpenNLP models and the dbpedia configuration +
>> >>>> >> default index was not a good idea, because one cannot exclude/change
>> >>>> >> the dbpedia stuff without affecting other components that depend on
>> >>>> >> OpenNLP.
>> >>>> >> Therefore I think we need to discuss how to better structure the
>> >>>> >> configurations and data needed to run Stanbol.
>> >>>> >>
>> >>>> >> There is also another issue: the SolrYard copies provided indexes
>> >>>> >> only once and does not check for updates. This makes it hard to
>> >>>> >> upgrade from the small index provided with the default data to a
>> >>>> >> bigger version.
>> >>>> >>
>> >>>> >> Both of these things are related to the problems and need to be
>> >>>> >> addressed before the first Stanbol release. Independently of those,
>> >>>> >> I will try to find a simple solution for what you intend to do.
>> >>>> >>
>> >>>> >> In the meantime I suggest you go for the initially proposed
>> >>>> >> workaround.
>> >>>> >>
>> >>>> >> best
>> >>>> >> Rupert Westenthaler
>> >>>> >>
>> >>>> >> > Thanks for your help,
>> >>>> >> >
>> >>>> >> > David
>> >>>> >> >
>> >>>> >> >
>> >>>> >> > On Mon, Jul 11, 2011 at 11:42 PM, Rupert Westenthaler <
>> >>>> >> > rupert.westenthaler@gmail.com> wrote:
>> >>>> >> >
>> >>>> >> >> Hi
>> >>>> >> >>
>> >>>> >> >> On Mon, Jul 11, 2011 at 8:17 PM, Andrea Giovanni Nuzzolese
>> >>>> >> >> <nu...@cs.unibo.it> wrote:
>> >>>> >> >> > I solved it in the same way, but lost the caching capabilities.
>> >>>> >> >> > Is there any possibility to keep both all the data and the cache?
>> >>>> >> >> >
>> >>>> >> >> > Andrea
>> >>>> >> >> >
>> >>>> >> >> > On Jul 11, 2011, at 4:08 PM, David Riccitelli wrote:
>> >>>> >> >> >
>> >>>> >> >> >> Ok, stopping the solrYard dbpedia_43k component solved it for me.
>> >>>> >> >> >>
>> >>>> >> >> >> Thanks,
>> >>>> >> >> >> David
>> >>>> >> >> >>
>> >>>> >> >> >> On Mon, Jul 11, 2011 at 4:13 PM, David Riccitelli <
>> >>>> >> >> >> david.riccitelli@interact.it> wrote:
>> >>>> >> >> >>
>> >>>> >> >> >>> Hi Rupert,
>> >>>> >> >> >>>
>> >>>> >> >> >>> I recently updated the Stanbol install, and I found that the RDF
>> >>>> >> >> >>> returned by the EntityHub is missing some properties (specifically
>> >>>> >> >> >>> the dbprop ones, as far as I can see).
>> >>>> >> >> >>>
>> >>>> >> >> >>> This is the command that I use for testing:
>> >>>> >> >> >>> curl -H "accept: application/rdf+xml" "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
>> >>>> >> >> >>>
>> >>>> >> >> >>> which outputs the attached RDF file.
>> >>>> >> >> >>>
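
For scripted testing, the same request as the curl command above can be built with Python's standard library. This is a sketch: the function name and its defaults are assumptions that simply mirror the command, and nothing here is a Stanbol client API. Unlike the curl command, it percent-encodes the entity URI passed as the "id" parameter, which the server is expected to decode back to the plain URI.

```python
from urllib.parse import urlencode
from urllib.request import Request


def entity_request(entity_uri: str,
                   base: str = "http://localhost:8080/entityhub/site/dbpedia/entity",
                   accept: str = "application/rdf+xml") -> Request:
    """Build a GET request equivalent to the curl test command (hypothetical helper)."""
    # urlencode percent-encodes ":" and "/" inside the id value.
    url = f"{base}?{urlencode({'id': entity_uri})}"
    return Request(url, headers={"Accept": accept})
```

Passing the result to urllib.request.urlopen() sends it to a running instance; with this Accept header the response body should be RDF/XML.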
>> >>>> >> >> >>> I cleared all of the sling folder (rm -fr sling) and checked
>> >>>> >> >> >>> against the SPARQL end-point at DBpedia, but I wasn't able to fix it.
>> >>>> >> >> >>>
>> >>>> >> >> >>> Does this depend on the mapping.txt file?
>> >>>> >> >> >>>
>> >>>> >> >>
>> >>>> >> >> If you plan to create your own dbpedia index, then the mapping.txt
>> >>>> >> >> file is the way to configure which properties are
>> >>>> >> >> included/excluded.
>> >>>> >> >> Typically dbprop values are low quality. They are just naive 1:1
>> >>>> >> >> mappings of key-value pairs as found in the infoboxes. Because of
>> >>>> >> >> this they are excluded from the indexes.
>> >>>> >> >>
>> >>>> >> >> At runtime the returned data depend on the Cache strategy used:
>> >>>> >> >>
>> >>>> >> >> Currently there are three possibilities (configured with the
>> >>>> >> >> referenced Site):
>> >>>> >> >> 1) no cache: both queries and retrieval use the remote service
>> >>>> >> >> 2) used: Queries are executed by the remote service. Retrieved
>> >>>> >> >> Entities are stored locally. The cached data depend on the mappings
>> >>>> >> >> defined for the cache.
>> >>>> >> >> 3) all: Both queries and retrieval are based on the cache. The remote
>> >>>> >> >> service is only used as a fallback in case the cache is not
>> >>>> >> >> available (e.g. if you deactivate the solrYard).
>> >>>> >> >>
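
To make the difference between the three strategies concrete, here is a toy Python model. It is hypothetical: the class, enum and field names are invented and do not correspond to Stanbol classes, "cache_mapping" only stands in for the mappings defined for the cache, and falling back to the remote service under "used" when the cache is down is an assumption. It illustrates why, with "used", an entity can come back without some of the properties DBpedia itself returns: what is served is the mapped copy stored in the cache.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, Optional

Entity = Dict[str, object]


class CacheStrategy(Enum):
    NONE = "none"  # 1) queries and retrieval both use the remote service
    USED = "used"  # 2) queries go remote; retrieved entities are cached
    ALL = "all"    # 3) queries and retrieval use the cache; remote is a fallback


def _keep_everything(entity: Entity) -> Entity:
    return entity


@dataclass
class ReferencedSiteModel:
    """Toy model of the three cache strategies; not Stanbol's actual API."""

    strategy: CacheStrategy
    fetch_remote: Callable[[str], Optional[Entity]]
    cache: Dict[str, Entity] = field(default_factory=dict)
    cache_available: bool = True
    # Stand-in for the field mappings configured for the cache.
    cache_mapping: Callable[[Entity], Entity] = _keep_everything

    def query_backend(self) -> str:
        """Return which side answers queries under the current strategy."""
        if self.strategy is CacheStrategy.ALL and self.cache_available:
            return "cache"
        return "remote"

    def get_entity(self, entity_id: str) -> Optional[Entity]:
        if self.strategy is CacheStrategy.NONE or not self.cache_available:
            # No cache configured, or the cache is down: use the remote service.
            return self.fetch_remote(entity_id)
        if self.strategy is CacheStrategy.ALL:
            # Retrieval is based on the cache only.
            return self.cache.get(entity_id)
        # USED: serve the cached copy if there is one ...
        if entity_id in self.cache:
            return self.cache[entity_id]
        # ... otherwise fetch it once, store the mapped version and return that.
        entity = self.fetch_remote(entity_id)
        if entity is None:
            return None
        stored = self.cache_mapping(entity)
        self.cache[entity_id] = stored
        return stored
```

For example, with a mapping that drops dbprop fields, the first get_entity call under "used" fetches the entity remotely, and every call returns the filtered copy.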
>> >>>> >> >> So if you are fine with (2), then you could use the configuration
>> >>>> >> >> previously used by the stable launcher [1].
>> >>>> >> >> I think the easiest way to install it is to add the
>> >>>> >> >> Felix File Installer [2] to the Stanbol environment. You will need to
>> >>>> >> >> delete the current referencedSite for dbpedia first and then add the
>> >>>> >> >> three configuration files as described by [1].
>> >>>> >> >>
>> >>>> >> >> If your requirements are not covered by the currently available
>> >>>> >> >> options, it would be nice if you could write a short user story,
>> >>>> >> >> because I am thinking about how to improve this feature and input
>> >>>> >> >> like that would be really valuable.
>> >>>> >> >>
>> >>>> >> >> best
>> >>>> >> >> Rupert Westenthaler
>> >>>> >> >>
>> >>>> >> >> [1] The dbpedia config consists of three files: the referenced site,
>> >>>> >> >> cache and solrYard components with the "-dbpedia" endings.
>> >>>> >> >>
>> >>>> >> >> http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/stable/src/main/resources/resources/config/?pathrev=1140181
>> >>>> >> >>
>> >>>> >> >> [2] http://felix.apache.org/site/apache-felix-file-install.html
>> >>>> >> >>
>> >>>> >> >> p.s. I keep this part because it describes very well how the cache
>> >>>> >> >> strategy "used" works:
>> >>>> >> >> >>>>> Hi David
>> >>>> >> >> >>>>>
>> >>>> >> >> >>>>> Assuming that you are using the default distribution of
>> >>>> >> >> >>>>> Apache Stanbol:
>> >>>> >> >> >>>>>
>> >>>> >> >> >>>>> Requests for http://dbpedia.org/resource/Valentino_Rossi will
>> >>>> >> >> >>>>> be handled as follows:
>> >>>> >> >> >>>>> - only the first request is answered by retrieving the Entity
>> >>>> >> >> >>>>>   from DBpedia.org
>> >>>> >> >> >>>>> - the information is stored in a local cache; in the process,
>> >>>> >> >> >>>>>   the values of the documents are filtered (see (a) for details)
>> >>>> >> >> >>>>> - the cached version is returned
>> >>>> >> >> >>>>>
>> >>>> >> >> >>>>> (a) The default configuration for dbpedia stores all fields,
>> >>>> >> >> >>>>> but filters literal values so that only values with the
>> >>>> >> >> >>>>> language "en, de, fr, it, es" or with no language are stored.
>> >>>> >> >> >>>>>
>> >>>> >> >> >>>>>
>> >>>> >> >> >>>>> Assuming that you started from zero when updating to a new
>> >>>> >> >> >>>>> version, this also means that you have downloaded a new
>> >>>> >> >> >>>>> version of this Entity from dbPedia.
>> >>>> >> >> >>>>>
>> >>>> >> >>
>> >>>> >> >> --
>> >>>> >> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> >>>> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
>> >>>> >> >> | A-5500 Bischofshofen
>> >>>> >> >>
>> >>>> >> >
>> >>>> >> >
>> >>>> >> >
>> >>>> >> > --
>> >>>> >> > David Riccitelli
>> >>>> >> >
>> >>>> >> > Interact SpA
>> >>>> >> > Via A. Bargoni 78 (scala F)
>> >>>> >> > 00153 Roma
>> >>>> >> >
>> >>>> >> > T +39 06 58318 301
>> >>>> >> > F +39 06 58318 303
>> >>>> >> >
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> --
>> >>>> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> >>>> >> | Bodenlehenstraße 11                             ++43-699-11108907
>> >>>> >> | A-5500 Bischofshofen
>> >>>> >>
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> > --
>> >>>> > David Riccitelli
>> >>>> >
>> >>>> > Interact SpA
>> >>>> > Via A. Bargoni 78 (scala F)
>> >>>> > 00153 Roma
>> >>>> >
>> >>>> > T +39 06 58318 301
>> >>>> > F +39 06 58318 303
>> >>>> >
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> >>>> | Bodenlehenstraße 11                             ++43-699-11108907
>> >>>> | A-5500 Bischofshofen
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> David Riccitelli
>> >>>
>> >>> Interact SpA
>> >>> Via A. Bargoni 78 (scala F)
>> >>> 00153 Roma
>> >>>
>> >>> T +39 06 58318 301
>> >>> F +39 06 58318 303
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> >> | Bodenlehenstraße 11                             ++43-699-11108907
>> >> | A-5500 Bischofshofen
>> >>
>> >
>> >
>> >
>> > --
>> > | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> > | Bodenlehenstraße 11                             ++43-699-11108907
>> > | A-5500 Bischofshofen
>> >
>>
>>
>>
>> --
>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>
>
>
>
> --
> David Riccitelli
> -----
> Skype: ziodave
> Twitter: @ziodave
> LinkedIn: http://it.linkedin.com/in/riccitelli
> -----
> Interact SpA
> Via A. Bargoni 78 (scala F)
> 00153 Roma
>
> T +39 06 58318 301
> F +39 06 58318 303
>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
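The three cache strategies and the language filter described in the quoted messages above can be illustrated with a small in-memory model. This is a minimal sketch under stated assumptions, not the Entityhub implementation: `ReferencedSiteSketch`, `fetch_remote` and the dict-based entity representation are hypothetical, and only the behaviour spelled out in the thread is modelled (remote only for "no cache", fetch once and then cache a language-filtered copy for "used", cache first with a remote fallback for "all").

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple

# A value and its language tag; None means no language (plain literal or URI).
Value = Tuple[str, Optional[str]]
# Hypothetical entity representation: property name -> list of values.
Entity = Dict[str, List[Value]]

# Languages kept by the default dbpedia cache configuration, per the thread.
DEFAULT_LANGUAGES = frozenset({"en", "de", "fr", "it", "es"})


class CacheStrategy(Enum):
    NONE = "none"  # queries and retrieval both use the remote service
    USED = "used"  # retrieved entities are stored locally and reused
    ALL = "all"    # the local cache answers; remote is only a fallback


def filter_languages(entity: Entity,
                     languages: frozenset = DEFAULT_LANGUAGES) -> Entity:
    """Drop values whose language tag is not allowed; untagged values are kept."""
    return {
        prop: [(text, lang) for text, lang in values
               if lang is None or lang in languages]
        for prop, values in entity.items()
    }


@dataclass
class ReferencedSiteSketch:
    """Hypothetical stand-in for a referenced site with a local cache."""
    strategy: CacheStrategy
    fetch_remote: Callable[[str], Entity]
    cache: Dict[str, Entity] = field(default_factory=dict)
    cache_available: bool = True
    remote_calls: int = 0

    def _remote(self, uri: str) -> Entity:
        self.remote_calls += 1
        return self.fetch_remote(uri)

    def get_entity(self, uri: str) -> Optional[Entity]:
        if self.strategy is CacheStrategy.NONE:
            return self._remote(uri)
        if self.strategy is CacheStrategy.USED:
            if uri not in self.cache:
                # Only the first request reaches DBpedia; a filtered copy is stored.
                self.cache[uri] = filter_languages(self._remote(uri))
            return self.cache[uri]
        # CacheStrategy.ALL: the cache is authoritative while it is available.
        if self.cache_available:
            return self.cache.get(uri)
        return self._remote(uri)


if __name__ == "__main__":
    def fake_dbpedia(uri: str) -> Entity:
        return {"rdfs:label": [("Valentino Rossi", "en"),
                               ("Valentino Rossi", "pl")]}

    site = ReferencedSiteSketch(CacheStrategy.USED, fake_dbpedia)
    uri = "http://dbpedia.org/resource/Valentino_Rossi"
    print(site.get_entity(uri))
    print(site.get_entity(uri), site.remote_calls)
```

With the "used" strategy the second lookup is served from the cache, so `remote_calls` stays at 1, and the Polish label is dropped by the language filter. This also shows why anything filtered out by the cache mappings (such as the missing properties in the original question) cannot reappear until the cached copy is discarded.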

Re: EntityHub and DBpedia

Posted by David Riccitelli <da...@interact.it>.
Hi Rupert,
One question, I can't find this bundle:

   - "Apache Stanbol Data: DBpedia.org defaultdata version
   (org.apache.stanbol.data.sites.dbpedia.default)"


But I see this one:

   - "Apache Stanbol Default Data (org.apache.stanbol.defaultdata)"

Are you referring to the latter?

BR
David




-- 
David Riccitelli
-----
Skype: ziodave
Twitter: @ziodave
LinkedIn: http://it.linkedin.com/in/riccitelli
-----
Interact SpA
Via A. Bargoni 78 (scala F)
00153 Roma

T +39 06 58318 301
F +39 06 58318 303
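David's question boils down to which data bundles are actually installed. As a minimal sketch, assuming the Felix Web Console of the launcher exposes its bundle list as JSON at `/system/console/bundles.json` (a top-level `data` array whose entries carry `symbolicName` and `state`), the listing could be checked for the symbolic names mentioned in this thread. The helper names are hypothetical, and fetching the JSON with something like `curl -u admin:admin http://localhost:8080/system/console/bundles.json` assumes the default console location and credentials.

```python
import json
from typing import Dict

# Symbolic names mentioned in this thread.
OLD_DEFAULTDATA = "org.apache.stanbol.defaultdata"
DBPEDIA_DEFAULT = "org.apache.stanbol.data.sites.dbpedia.default"
DBPEDIA_CACHED = "org.apache.stanbol.data.sites.dbpedia.cached"
WATCHED = (OLD_DEFAULTDATA, DBPEDIA_DEFAULT, DBPEDIA_CACHED)


def data_bundle_states(bundles_json: str) -> Dict[str, str]:
    """Return {symbolic name: state} for the DBpedia-related data bundles.

    Assumes the Felix Web Console listing format:
    {"data": [{"symbolicName": ..., "state": ...}, ...]}
    """
    listing = json.loads(bundles_json)
    return {
        entry["symbolicName"]: entry.get("state", "unknown")
        for entry in listing.get("data", [])
        if entry.get("symbolicName") in WATCHED
    }


def stale_launcher_cache(states: Dict[str, str]) -> bool:
    """True if only the old single defaultdata bundle is present.

    That combination suggests the launcher jar was updated without deleting
    the /sling folder, which still caches the old bundles and configuration.
    """
    return OLD_DEFAULTDATA in states and DBPEDIA_DEFAULT not in states


if __name__ == "__main__":
    import sys
    states = data_bundle_states(sys.stdin.read())
    for name in WATCHED:
        print(f"{name}: {states.get(name, 'not installed')}")
    if stale_launcher_cache(states):
        print("Old defaultdata bundle found: delete the sling folder and relaunch.")
```

Piping the console JSON into the script prints the state of each watched bundle; the stale-cache check mirrors the advice given in this thread to delete the /sling folder after updating the launcher.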

Re: EntityHub and DBpedia

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi David, all

With the changes from yesterday (revisions r1148947 and r1148948) it is
now easily possible to deactivate the default configuration for
dbPedia provided by the Stanbol launcher and to replace it with one
that uses the remote services with a local cache.

Steps:

1. use the current launcher
2. go to the Bundle tab of the Apache Felix Webconsole
3. stop the Bundle "Apache Stanbol Data: DBpedia.org defaultdata
version (org.apache.stanbol.data.sites.dbpedia.default)"
4. install and start the Bundle "Apache Stanbol Data: Remote
DBpedia.org with local cache
(org.apache.stanbol.data.sites.dbpedia.cached)". You can find this
bundle in "{stanbol-trunk}/data/sites/dbpediacached".

best
Rupert

On Mon, Jul 18, 2011 at 1:15 PM, Rupert Westenthaler
<ru...@gmail.com> wrote:
> Hi
>
> Rather than working on the Workaround I decided to invest some time in
> finishing STANBOL-140 and implementing STANBOL-287.
> Together with the proposal made in [1] to split up the default data in
> several bundles this should solve the issues described/discussed here.
>
> best
> Rupert
>
> [1] http://markmail.org/message/bf7qurmzos45h23b
>
> On Thu, Jul 14, 2011 at 8:34 AM, Rupert Westenthaler
> <ru...@gmail.com> wrote:
>> On Thu, Jul 14, 2011 at 8:30 AM, David Riccitelli
>> <da...@interact.it> wrote:
>>> Thanks Rupert,
>>>
>>> A description of how to do this is available in [1].
>>>
>>>
>>> I can't see the [1] :-)
>>
>> does this count as a missing attachment? ^^
>>
>> [1] http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/yard/solr/src/main/resources/solr/core/
>>
>>>
>>> David
>>>
>>> On Thu, Jul 14, 2011 at 8:56 AM, Rupert Westenthaler <
>>> rupert.westenthaler@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> Yes, this is possible, but it can take quite some time (depending on
>>>> the hardware).
>>>> A description of how to do this is available in [1].
>>>>
>>>> Instead of installing the dbpedia.solrindex.zip file as described in
>>>> the readme, you could directly:
>>>>
>>>> * shut down Stanbol
>>>> * delete the "dbpedia_43k" index in
>>>>   "{stanbol-root}/sling/entityhub/solrYard/indexes"
>>>> * copy the index located in
>>>>   "{indexing-root}/indexing/destination/indexes" to
>>>>   "{stanbol-root}/sling/entityhub/solrYard/indexes" and rename it to
>>>>   "dbpedia_43k"
>>>> * restart Stanbol.
>>>>
>>>> After that, Stanbol should use the new index.
>>>>
>>>> Copying "dbpedia.solrindex.zip" to the datafiles directory and
>>>> then changing the value of "Solr Index/Core" in the configuration of
>>>> the SolrYard for dbPedia from "dbpedia_43k" to "dbpedia" should also
>>>> work.
>>>>
>>>> best
>>>> Rupert
>>>>
>>>> On Wed, Jul 13, 2011 at 11:58 AM, David Riccitelli
>>>> <da...@interact.it> wrote:
>>>> > Hi,
>>>> >
>>>> > As another workaround, I was thinking that I could actually generate
>>>> > the DBpedia index locally with all the data using the dumps
>>>> > (http://wiki.dbpedia.org/Downloads36), in a way similar to dbpedia_43k.
>>>> >
>>>> > What do you think?
>>>> >
>>>> > Thanks,
>>>> > David
>>>> >
>>>> > On Wed, Jul 13, 2011 at 12:11 PM, Rupert Westenthaler <
>>>> > rupert.westenthaler@gmail.com> wrote:
>>>> >
>>>> >> Hi
>>>> >>
>>>> >> I will try to find some time in the evening to reproduce this.
>>>> >>
>>>> >> On Wed, Jul 13, 2011 at 8:57 AM, David Riccitelli
>>>> >> <da...@interact.it> wrote:
>>>> >> > Thanks Rupert,
>>>> >> >
>>>> >> > I'm trying to follow your instructions but I encounter a couple of
>>>> >> > issues (probably due to inexperience):
>>>> >> >  [1] when dropping the config files, they enter some loop of
>>>> >> > REGISTERING/UNREGISTERING (which I solve by stopping the FileInstall
>>>> >> > bundle), is that normal?
>>>> >>
>>>> >> This is very strange and should not be caused by the FileInstaller.
>>>> >> Maybe there is a loop between the Sling Installer (which tries to
>>>> >> install the default configuration) and the FileInstaller that causes
>>>> >> this under some circumstances.
>>>> >>
>>>> >> >  [2] after I restart Stanbol, and try to query an entity from the
>>>> >> > entityhub I receive the following error:
>>>> >> >
>>>> >> > 13.07.2011 09:54:17.939 *WARN* [509017110@qtp-1586831707-0]
>>>> >> > org.apache.felix.http.jetty /entityhub/sites/entity/
>>>> >> > (java.lang.IllegalStateException: Unable to initialize the Cache with
>>>> >> > Yard dbpediaCache! This is usually caused by Errors while reading the
>>>> >> > Cache Configuration from the Yard.) java.lang.IllegalStateException:
>>>> >> > Unable to initialize the Cache with Yard dbpediaCache! This is usually
>>>> >> > caused by Errors while reading the Cache Configuration from the Yard.
>>>> >> > at org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
>>>> >> >
>>>> >> >
>>>> >> > Do I need to initialize the Cache in some way?
>>>> >> >
>>>> >> No, it does not. Prepared indexes do include a document that
>>>> >> provides a list of the indexed fields. In the future this may be used
>>>> >> to determine whether a query can be executed successfully on the local
>>>> >> index. In addition, it is used when an Entity within the index is
>>>> >> updated with a newer version.
>>>> >> However, this configuration is optional and not required. This
>>>> >> exception should only appear if the document is present but
>>>> >> ill-formatted. However, the SolrYard initialized for the dbpediaCache
>>>> >> should be empty.
>>>> >>
>>>> >> Therefore I think it is somehow related to the above problem of
>>>> >> overriding configurations.
>>>> >>
>>>> >> In general, the way the default configuration is loaded is
>>>> >> sub-optimal at the moment. In particular, using a single defaultdata
>>>> >> bundle for both the OpenNLP models and the dbpedia configuration +
>>>> >> default index was not a good idea, because one cannot exclude/change
>>>> >> the dbpedia stuff without affecting other components that depend on
>>>> >> OpenNLP.
>>>> >> Therefore I think we need to discuss how to better structure the
>>>> >> configurations and data needed to run Stanbol.
>>>> >>
>>>> >> There is also another issue: the SolrYard copies provided indexes
>>>> >> only once and does not check for updates. This makes it hard to
>>>> >> upgrade from the small index provided with the default data to a
>>>> >> bigger version.
>>>> >>
>>>> >> Both of these things are related to the problems discussed here and
>>>> >> need to be addressed before the first Stanbol release. Independently
>>>> >> of those, I will try to find a simple solution for what you intend
>>>> >> to do.
>>>> >>
>>>> >> In the meantime I suggest you go for the initially proposed workaround.
>>>> >>
>>>> >> best
>>>> >> Rupert Westenthaler
>>>> >>
>>>> >> > Thanks for your help,
>>>> >> >
>>>> >> > David
>>>> >> >
>>>> >> >
>>>> >> > On Mon, Jul 11, 2011 at 11:42 PM, Rupert Westenthaler <
>>>> >> > rupert.westenthaler@gmail.com> wrote:
>>>> >> >
>>>> >> >> Hi
>>>> >> >>
>>>> >> >> On Mon, Jul 11, 2011 at 8:17 PM, Andrea Giovanni Nuzzolese
>>>> >> >> <nu...@cs.unibo.it> wrote:
>>>> >> >> > I solved it the same way, but lost the caching capabilities.
>>>> >> >> > Is there any way to keep both all the data and the cache?
>>>> >> >> >
>>>> >> >> > Andrea
>>>> >> >> >
>>>> >> >> > On Jul 11, 2011, at 4:08 PM, David Riccitelli wrote:
>>>> >> >> >
>>>> >> >> >> Ok, stopping the solrYard dbpedia_43k component solved for me.
>>>> >> >> >>
>>>> >> >> >> Thanks,
>>>> >> >> >> David
>>>> >> >> >>
>>>> >> >> >> On Mon, Jul 11, 2011 at 4:13 PM, David Riccitelli <
>>>> >> >> >> david.riccitelli@interact.it> wrote:
>>>> >> >> >>
>>>> >> >> >>> Hi Rupert,
>>>> >> >> >>>
>>>> >> >> >>> I recently updated the Stanbol install, and I found that the RDF
>>>> >> >> >>> returned by the EntityHub is missing some properties (specifically
>>>> >> >> >>> the dbprop ones, as far as I can see).
>>>> >> >> >>>
>>>> >> >> >>> This is the command that I use for testing:
>>>> >> >> >>> curl -H "accept: application/rdf+xml" "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
>>>> >> >> >>>
>>>> >> >> >>> which outputs the attached RDF file.
>>>> >> >> >>>
>>>> >> >> >>> I cleared the whole sling folder (rm -fr sling) and checked with
>>>> >> >> >>> the SPARQL endpoint at DBpedia, but I wasn't able to fix it.
>>>> >> >> >>>
>>>> >> >> >>> Does this depend on the mapping.txt file?
>>>> >> >> >>>
>>>> >> >>
>>>> >> >> If you plan to create your own dbpedia index, then the mapping.txt
>>>> >> >> file is where you configure which properties are included/excluded.
>>>> >> >> dbprop values are typically of low quality: they are just naive 1:1
>>>> >> >> mappings of the key/value pairs found in the infoboxes. Because of
>>>> >> >> this they are excluded from the indexes.
>>>> >> >>
>>>> >> >> At runtime, the returned data depend on the Cache strategy in use.
>>>> >> >>
>>>> >> >> Currently there are three possibilities (configured with the
>>>> >> >> referenced Site):
>>>> >> >> 1) no cache: both queries and retrieval use the remote service.
>>>> >> >> 2) used: queries are executed by the remote service. Retrieved
>>>> >> >> entities are stored locally; the cached data depend on the mappings
>>>> >> >> defined for the cache.
>>>> >> >> 3) all: both queries and retrieval are based on the cache. The remote
>>>> >> >> service is only used as a fallback in case the cache is not
>>>> >> >> available (e.g. if you deactivate the SolrYard).
>>>> >> >>
>>>> >> >> So if you are fine with (2), then you could use the configuration
>>>> >> >> previously used by the stable launcher [1].
>>>> >> >> I think the easiest way to install this is to add the Felix File
>>>> >> >> Installer [2] to the Stanbol environment. You will need to delete
>>>> >> >> the current referencedSite for dbpedia first and then add the
>>>> >> >> three configuration files described in [1].
>>>> >> >>
>>>> >> >> If your requirements are not covered by the currently available
>>>> >> >> options, it would be nice if you could write a short user story:
>>>> >> >> I am thinking about how to improve this feature, and input like
>>>> >> >> that would be really valuable.
>>>> >> >>
>>>> >> >> best
>>>> >> >> Rupert Westenthaler
>>>> >> >>
>>>> >> >> [1] The dbpedia config consists of three files: the referenced
>>>> >> >> site, cache and SolrYard components with the "-dbpedia" endings.
>>>> >> >> http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/stable/src/main/resources/resources/config/?pathrev=1140181
>>>> >> >>
>>>> >> >> [2] http://felix.apache.org/site/apache-felix-file-install.html
>>>> >> >>
>>>> >> >> p.s. I keep this part because it describes very well how the cache
>>>> >> >> strategy "used" works:
>>>> >> >> >>>>> Hi David
>>>> >> >> >>>>>
>>>> >> >> >>>>> Assuming that you are using the default distribution of
>>>> >> >> >>>>> Apache Stanbol:
>>>> >> >> >>>>>
>>>> >> >> >>>>> Requests for http://dbpedia.org/resource/Valentino_Rossi will
>>>> >> >> >>>>> be handled as follows:
>>>> >> >> >>>>> - only the first request is answered by retrieving the Entity
>>>> >> >> >>>>>   from DBpedia.org
>>>> >> >> >>>>> - the information is stored in a local cache; in the process,
>>>> >> >> >>>>>   the values of the documents are filtered (see (a) for details)
>>>> >> >> >>>>> - the cached version is returned
>>>> >> >> >>>>>
>>>> >> >> >>>>> (a) The default configuration for dbpedia stores all fields,
>>>> >> >> >>>>> but filters literal values so that only values with the
>>>> >> >> >>>>> language "en, de, fr, it, es" or with no language are stored.
>>>> >> >> >>>>>
>>>> >> >> >>>>>
>>>> >> >> >>>>> Assuming that you started from zero when updating to a new
>>>> >> >> >>>>> version, this also means that you have downloaded a new
>>>> >> >> >>>>> version of this Entity from dbPedia.
>>>> >> >> >>>>>
>>>> >> >>
>>>> >> >> --
>>>> >> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>>> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
>>>> >> >> | A-5500 Bischofshofen
>>>> >> >>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
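The literal filter described in (a) of the quoted p.s. can be illustrated with a small Python sketch. It is not Stanbol's implementation: the `Literal` tuple and the `filter_literals` function are hypothetical names chosen for the example, and only the language list (en, de, fr, it, es, or no language) comes from the thread.

```python
from typing import NamedTuple, Optional

# Language tags kept by the default dbpedia cache configuration, per (a).
ALLOWED_LANGUAGES = {"en", "de", "fr", "it", "es"}


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal: a value plus an optional language tag."""
    value: str
    lang: Optional[str] = None


def filter_literals(values):
    """Return the values that would be stored in the cache.

    Non-literal values (e.g. resource URIs) are always kept. Literals are kept
    only if they carry no language tag or one of ALLOWED_LANGUAGES.
    """
    kept = []
    for value in values:
        if isinstance(value, Literal) and value.lang is not None:
            if value.lang.lower() not in ALLOWED_LANGUAGES:
                continue  # e.g. a Japanese label is dropped
        kept.append(value)
    return kept


example_values = [
    Literal("Valentino Rossi", "en"),
    Literal("ヴァレンティーノ・ロッシ", "ja"),
    Literal("1979-02-16"),  # untagged literal: kept
    "http://upload.wikimedia.org/wikipedia/commons/8/81/Valentino_Rossi_2010_Qatar.jpg",
]
kept = filter_literals(example_values)
```

Note that all fields survive; only individual literal values are dropped, which matches "stores all fields but filters literal values".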

Re: EntityHub and DBpedia

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi

Rather than working on the workaround, I decided to invest some time in
finishing STANBOL-140 and implementing STANBOL-287.
Together with the proposal made in [1] to split up the default data into
several bundles, this should solve the issues described/discussed here.

best
Rupert

[1] http://markmail.org/message/bf7qurmzos45h23b



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: EntityHub and DBpedia

Posted by Rupert Westenthaler <ru...@gmail.com>.
On Thu, Jul 14, 2011 at 8:30 AM, David Riccitelli
<da...@interact.it> wrote:
> Thanks Rupert,
>
> A description on how to do this is available in [1].
>
>
> I can't see the [1] :-)

does this count as a missing attachment? ^^

[1] http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/yard/solr/src/main/resources/solr/core/



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: EntityHub and DBpedia

Posted by David Riccitelli <da...@interact.it>.
Thanks Rupert,

> A description of how to do this is available in [1].


I can't see the [1] :-)

David

On Thu, Jul 14, 2011 at 8:56 AM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi
>
> Yes, this is possible, but it would take quite some time (depending on
> the hardware).
> A description of how to do this is available in [1].
>
> Instead of installing the dbpedia.solrindex.zip file as described in
> the readme, you could directly
>
> * shutdown stanbol
> * delete the "dbpedia_43k" index in
> "{stanbol-root}/sling/entityhub/solrYard/indexes"
> * copy the index located in the
> "{indexing-root}/indexing/destination/indexes" to
> "{stanbol-root}/sling/entityhub/solrYard/indexes" and rename it to
> "dbpedia_43k"
> * restart stanbol.
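The manual steps in the list above can be scripted. This is a minimal Python sketch that assumes the directory layout given in the list; the function name `replace_dbpedia_index` is hypothetical, and shutting down and restarting Stanbol are left out because the thread does not say how the instance is run.

```python
import shutil
from pathlib import Path


def replace_dbpedia_index(stanbol_root: Path, built_index: Path,
                          core_name: str = "dbpedia_43k") -> Path:
    """Swap the SolrYard index `core_name` for a locally built one.

    Assumes Stanbol has already been shut down; restart it afterwards.
    `built_index` is the index directory produced under
    {indexing-root}/indexing/destination/indexes.
    """
    indexes_dir = stanbol_root / "sling" / "entityhub" / "solrYard" / "indexes"
    target = indexes_dir / core_name
    if target.exists():
        shutil.rmtree(target)  # delete the old "dbpedia_43k" index
    # Copying into `target` also renames the index to `core_name`.
    shutil.copytree(built_index, target)
    return target
```

Copying under the old core name means the existing SolrYard configuration keeps pointing at "dbpedia_43k" and needs no change.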
>
> After that Stanbol should use the new index.
>
> Copying the "dbpedia.solrindex.zip" to the datafiles directory and
> then changing the value of "Solr Index/Core" in the configuration of
> the SolrYard for dbPedia from "dbpedia_43k" to "dbpedia" should also
> work.
>
> best
> Rupert
>
> On Wed, Jul 13, 2011 at 11:58 AM, David Riccitelli
> <da...@interact.it> wrote:
> > Hi,
> >
> > As another workaround, I was thinking that I could actually generate
> > the DBpedia index locally with all the data using the dumps
> > (http://wiki.dbpedia.org/Downloads36), in a way similar to the
> > dbpedia_43k index.
> >
> > What do you think?
> >
> > Thanks,
> > David
> >
> > On Wed, Jul 13, 2011 at 12:11 PM, Rupert Westenthaler <
> > rupert.westenthaler@gmail.com> wrote:
> >
> >> Hi
> >>
> >> I will try to find some time in the evening to reproduce this.
> >>
> >> On Wed, Jul 13, 2011 at 8:57 AM, David Riccitelli
> >> <da...@interact.it> wrote:
> >> > Thanks Rupert,
> >> >
> >> > I'm trying to follow your instructions, but I encounter a couple of
> >> > issues (probably due to inexperience):
> >> >  [1] when dropping the config files, they enter some loop of
> >> > REGISTERING/UNREGISTERING (which I solve by stopping the FileInstall
> >> > bundle), is that normal?
> >>
> >> This is very strange and should not be caused by the FileInstaller.
> >> Maybe there is a loop between the Sling Installer (trying to
> >> install the default configuration) and the FileInstaller that causes
> >> this under some circumstances.
> >>
> >> >  [2] after I restart Stanbol and try to query an entity from the
> >> > entityhub, I receive the following error:
> >> >
> >> > 13.07.2011 09:54:17.939 *WARN* [509017110@qtp-1586831707-0]
> >> > org.apache.felix.http.jetty /entityhub/sites/entity/
> >> > (java.lang.IllegalStateException: Unable to initialize the Cache with
> >> > Yard dbpediaCache! This is usually caused by Errors while reading the
> >> > Cache Configuration from the Yard.) java.lang.IllegalStateException:
> >> > Unable to initialize the Cache with Yard dbpediaCache! This is usually
> >> > caused by Errors while reading the Cache Configuration from the Yard.
> >> > at org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
> >> >
> >> >
> >> > Do I need to initialize the Cache in some way?
> >> >
> >> No, you do not. Prepared indexes do include a document that lists
> >> the indexed fields. In the future this may be used to determine
> >> whether a query can be executed successfully on the local index.
> >> In addition, it is used when an Entity within the index is updated
> >> with a newer version.
> >> However, this configuration document is optional. This Exception
> >> should only appear if the document is present but illegally
> >> formatted, and the SolrYard initialized for the dbpediaCache should
> >> be empty, so the document should not be there at all.
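The three cases described above (no document, a valid document, a malformed document) can be modelled as follows. This is a hypothetical Python sketch, not Stanbol's CacheImpl code: the JSON encoding and the `fields` key are assumptions made only for illustration, since the thread does not say how the field list is stored in the yard.

```python
import json
from typing import Optional


class CacheConfigError(Exception):
    """Hypothetical counterpart of the IllegalStateException in the log above."""


def read_cache_config(stored_document: Optional[str]) -> Optional[dict]:
    """Interpret the optional field-list document of a yard.

    - no document: fine, e.g. a freshly created, empty dbpediaCache
    - well-formed document: return its content
    - malformed document: fail, because the cache cannot be initialized
    """
    if stored_document is None:
        return None
    try:
        config = json.loads(stored_document)
    except json.JSONDecodeError as err:
        raise CacheConfigError("Unable to initialize the Cache: unreadable configuration") from err
    if not isinstance(config, dict) or not isinstance(config.get("fields"), list):
        raise CacheConfigError("Unable to initialize the Cache: no field list in configuration")
    return config
```

Under this model an empty yard can never raise, which is why the error points to a configuration conflict rather than a missing initialization step.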
> >>
> >> Therefore I think it is somehow related to the above problem of
> >> overriding configurations.
> >>
> >> In general, the way the default configuration is loaded is
> >> sub-optimal at the moment. In particular, using a single defaultdata
> >> bundle for both the OpenNLP models and the dbpedia configuration +
> >> default index was not a good idea, because one cannot exclude/change
> >> the dbpedia stuff without affecting other components that depend on
> >> OpenNLP.
> >> Therefore I think we need to discuss how to better structure the
> >> configurations and data needed to run Stanbol.
> >>
> >> There is also another issue: the SolrYard copies provided indexes
> >> only once and does not check for updates. This makes it hard to
> >> upgrade from the small index provided with the default data to a
> >> bigger one.
> >>
> >> Both of these things are related to the problems and need to be
> >> addressed before the first Stanbol release. Independently of that, I
> >> will try to find a simple solution for what you intend to do.
> >>
> >> In the meantime I suggest you go for the initially proposed workaround.
> >>
> >> best
> >> Rupert Westenthaler
> >>
> >> > Thanks for your help,
> >> >
> >> > David
> >> >
> >> >
> >> > On Mon, Jul 11, 2011 at 11:42 PM, Rupert Westenthaler <
> >> > rupert.westenthaler@gmail.com> wrote:
> >> >
> >> >> Hi
> >> >>
> >> >> On Mon, Jul 11, 2011 at 8:17 PM, Andrea Giovanni Nuzzolese
> >> >> <nu...@cs.unibo.it> wrote:
> >> >> > I solved it in the same way, but lost the caching capabilities.
> >> >> > Is there any way to keep both all the data and the cache?
> >> >> >
> >> >> > Andrea
> >> >> >
> >> >> > On Jul 11, 2011, at 4:08 PM, David Riccitelli wrote:
> >> >> >
> >> >> >> Ok, stopping the solrYard dbpedia_43k component solved it for me.
> >> >> >>
> >> >> >> Thanks,
> >> >> >> David
> >> >> >>
> >> >> >> On Mon, Jul 11, 2011 at 4:13 PM, David Riccitelli <
> >> >> >> david.riccitelli@interact.it> wrote:
> >> >> >>
> >> >> >>> Hi Rupert,
> >> >> >>>
> >> >> >>> I recently updated the Stanbol install, and I found that the RDF
> >> >> >>> returned by the EntityHub is missing some properties
> >> >> >>> (specifically the dbprop ones, as far as I can see).
> >> >> >>>
> >> >> >>> This is the command that I use for testing:
> >> >> >>> curl -H "accept: application/rdf+xml" "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
> >> >> >>>
> >> >> >>> which outputs the attached RDF file.
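For scripted testing, the curl command above can be reproduced with Python's standard library. This is a sketch that assumes Stanbol listens on localhost:8080 as in the thread; the helper name `entity_request` is hypothetical. Unlike the curl call, it percent-encodes the entity URI in the `id` parameter, which a server should decode to the same value.

```python
from urllib.parse import urlencode
from urllib.request import Request


def entity_request(entity_id: str, base_url: str = "http://localhost:8080") -> Request:
    """Build the same GET request as the curl command above (not sent yet)."""
    query = urlencode({"id": entity_id})  # percent-encodes ':' and '/' in the URI
    return Request(f"{base_url}/entityhub/site/dbpedia/entity?{query}",
                   headers={"Accept": "application/rdf+xml"})


req = entity_request("http://dbpedia.org/resource/Valentino_Rossi")
# Sending it requires a running Stanbol instance:
# from urllib.request import urlopen
# with urlopen(req) as response:
#     rdf_xml = response.read().decode("utf-8")
```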
> >> >> >>>
> >> >> >>> I cleared the whole sling folder (rm -fr sling) and checked with
> >> >> >>> the SPARQL endpoint at DBpedia, but I wasn't able to fix it.
> >> >> >>>
> >> >> >>> Does this depend on the mapping.txt file?
> >> >> >>>
> >> >>
> >> >> If you plan to create your own dbpedia index, then the mapping.txt
> >> >> file is the way to configure which properties are
> >> >> included/excluded.
> >> >> dbprop values are typically of low quality: they are just naive 1:1
> >> >> mappings of the key/value pairs found in the infoboxes. Because of
> >> >> this they are excluded from the indexes.
> >> >>
> >> >> At runtime the returned data depend on the used Cache strategy:
> >> >>
> >> >> Currently there are three possibilities (configured with the
> referenced
> >> >> Site)
> >> >> 1) no cache: bot queries and retrieval so use a remote service
> >> >> 2) used: Queries are executed by the remote service. Retrieved
> >> >> Entities are stored locally. The cached data depend on the mappings
> >> >> defined for the cache.
> >> >> 3) all: Both queries and retrieval are based on the cache. The remote
> >> >> service are only used as fallback in the case that the cache is not
> >> >> available (e.g. if you deactivate solrYard).
> >> >>
> >> >> So if you you are fine with (2) than you could use the configuration
> >> >> as previously used by the stable launcher [1].
> >> >> I think the easiest way to install this is to use this is to add the
> >> >> Felix File Installer [2] to the Stanbol Environment. You will need to
> >> >> delete the current referencedSite for dbpedia first and than add the
> >> >> three configuration files as described by [1].
> >> >>
> >> >> If your requirements are not covered by the currently available
> option
> >> >> it would be nice if you could write a short user story, because I am
> >> >> thinking about how to improve this feature and input like that would
> >> >> be really valuable.
> >> >>
> >> >> best
> >> >> Rupert Westenthaler
> >> >>
> >> >> [1] The dbpedia config consists of three files. the referenced site,
> >> >> cache and solryard components with the "-dbpedia" endings.
> >> >>
> >> >>
> >>
> http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/stable/src/main/resources/resources/config/?pathrev=1140181
> >> >>
> >> >> [2] http://felix.apache.org/site/apache-felix-file-install.html
> >> >>
> >> >> p.s. I keep this part because it describes very well how the cache
> >> >> strategy "used" work:
> >> >> >>>>> Hi David
> >> >> >>>>>
> >> >> >>>>> Assuming that you are using the default distribution of Apache
> >> >> Stanbol.
> >> >> >>>>>
> >> >> >>>>> Requests for  http://dbpedia.org/resource/Valentino_Rossi will
> be
> >> >> >>>>> - only the first time answered by retrieving the Entity form
> >> >> DBpedia.org
> >> >> >>>>> - the Information are cached in a local cache. By that values
> of
> >> the
> >> >> >>>>> documents are filtered (see (a) for details)
> >> >> >>>>> - the cached version is returned
> >> >> >>>>>
> >> >> >>>>> (a) The default configuration for dbpedia stores all fields
> >> however
> >> >> >>>>> filters values for literals so that only values with the
> language
> >> >> "en,
> >> >> >>>>> de, fr, it, es" or no language are stored.
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>> Assuming that you have started for zero when updating to a new
> >> >> version
> >> >> >>>>> this also means that you have downloaded a new version of this
> >> Entity
> >> >> >>>>> from dbPedia.
> >> >> >>>>>
> >> >>
> >> >> --
> >> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> >> | A-5500 Bischofshofen
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > David Riccitelli
> >> >
> >> > Interact SpA
> >> > Via A. Bargoni 78 (scala F)
> >> > 00153 Roma
> >> >
> >> > T +39 06 58318 301
> >> > F +39 06 58318 303
> >> >
> >>
> >>
> >>
> >> --
> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> | A-5500 Bischofshofen
> >>
> >
> >
> >
> > --
> > David Riccitelli
> >
> > Interact SpA
> > Via A. Bargoni 78 (scala F)
> > 00153 Roma
> >
> > T +39 06 58318 301
> > F +39 06 58318 303
> >
>
>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>



-- 
David Riccitelli

Interact SpA
Via A. Bargoni 78 (scala F)
00153 Roma

T +39 06 58318 301
F +39 06 58318 303

Re: EntityHub and DBpedia

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi

Yes, this is possible, but depending on the hardware it will take quite
some time.
A description of how to do this is available in [1].

Instead of installing the dbpedia.solrindex.zip file as described in
the readme, you could directly:

* shut down Stanbol
* delete the "dbpedia_43k" index in
"{stanbol-root}/sling/entityhub/solrYard/indexes"
* copy the index located in
"{indexing-root}/indexing/destination/indexes" to
"{stanbol-root}/sling/entityhub/solrYard/indexes" and rename it to
"dbpedia_43k"
* restart Stanbol.

After that Stanbol should use the new index.
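The manual steps above could be scripted roughly as in the following Python sketch. It is an illustration under assumptions, not a Stanbol tool: `stanbol_root` and `indexing_root` stand for {stanbol-root} and {indexing-root}, the built index directory is assumed to be called `dbpedia` (a hypothetical default), and Stanbol must already be shut down before it runs and restarted afterwards.

```python
import shutil
from pathlib import Path


def swap_dbpedia_index(stanbol_root: Path, indexing_root: Path,
                       built_index_name: str = "dbpedia",
                       target_name: str = "dbpedia_43k") -> Path:
    """Replace the SolrYard index `target_name` with a locally built index.

    Assumes Stanbol is shut down. `built_index_name` is a hypothetical
    default for the directory produced by the indexing tool.
    """
    yard_indexes = stanbol_root / "sling" / "entityhub" / "solrYard" / "indexes"
    source = indexing_root / "indexing" / "destination" / "indexes" / built_index_name
    target = yard_indexes / target_name

    if not source.is_dir():
        raise FileNotFoundError(f"built index not found: {source}")

    # Delete the existing "dbpedia_43k" index, if present.
    if target.exists():
        shutil.rmtree(target)

    # Copy the new index under the name the SolrYard is configured to use.
    shutil.copytree(source, target)
    return target
```

Calling `swap_dbpedia_index(Path("/opt/stanbol"), Path("/opt/indexing"))` (hypothetical paths) would cover the delete and copy steps; shutting down and restarting Stanbol stay manual.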

Copying "dbpedia.solrindex.zip" to the datafiles directory and
then changing the value of "Solr Index/Core" in the configuration of
the SolrYard for DBpedia from "dbpedia_43k" to "dbpedia" should also
work.

best
Rupert

On Wed, Jul 13, 2011 at 11:58 AM, David Riccitelli
<da...@interact.it> wrote:
> Hi,
>
> As another workaround, I was thinking that I could generate the DBpedia
> index with all the data locally, using the dumps
> (http://wiki.dbpedia.org/Downloads36), in a way similar to dbpedia_43k.
>
> What do you think?
>
> Thanks,
> David



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: EntityHub and DBpedia

Posted by David Riccitelli <da...@interact.it>.
Hi,

As another workaround, I was thinking that I could generate the DBpedia
index with all the data locally, using the dumps
(http://wiki.dbpedia.org/Downloads36), in a way similar to dbpedia_43k.
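Building the index from the dumps would also mean deciding which properties to keep; the Jul 11 reply quoted in this thread explains that dbprop values are excluded from the prepared indexes. As a rough, hypothetical Python sketch (not the Stanbol indexing tool or its mapping.txt format), a pre-filter over N-Triples dump lines could drop the dbprop namespace, assumed here to be http://dbpedia.org/property/:

```python
from typing import Iterable, Iterator, Optional

# Assumed namespace of the raw infobox properties (the "dbprop" prefix).
DBPROP_NS = "http://dbpedia.org/property/"


def predicate_of(line: str) -> Optional[str]:
    """Return the predicate IRI of an N-Triples line, or None if there is none."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # N-Triples subjects and predicates contain no spaces, so the
    # predicate is the second whitespace-separated token.
    parts = line.split(None, 2)
    if len(parts) < 3:
        return None
    pred = parts[1]
    if pred.startswith("<") and pred.endswith(">"):
        return pred[1:-1]
    return None


def drop_dbprop(lines: Iterable[str]) -> Iterator[str]:
    """Yield only triples whose predicate is outside the dbprop namespace."""
    for line in lines:
        pred = predicate_of(line)
        if pred is not None and not pred.startswith(DBPROP_NS):
            yield line
```

Filtering line by line keeps memory use flat, which matters for dumps of this size.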

What do you think?

Thanks,
David

On Wed, Jul 13, 2011 at 12:11 PM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi
>
> I will try to find some time in the evening to reproduce this.



-- 
David Riccitelli

Interact SpA
Via A. Bargoni 78 (scala F)
00153 Roma

T +39 06 58318 301
F +39 06 58318 303

Re: EntityHub and DBpedia

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi

I will try to find some time in the evening to reproduce this.

On Wed, Jul 13, 2011 at 8:57 AM, David Riccitelli
<da...@interact.it> wrote:
> Thanks Rupert,
>
> I'm trying to follow your instructions but I am running into a couple of
> issues (probably due to inexperience):
>  [1] when dropping the config files, they enter some loop of
> REGISTERING/UNREGISTERING (which I work around by stopping the FileInstall
> bundle). Is that normal?

This is very strange and should not be caused by the FileInstaller.
Maybe, under some circumstances, there is a loop between the Sling
Installer (trying to install the default configuration) and the
FileInstaller that causes this.

>  [2] after I restart Stanbol, and try to query an entity from the entityhub
> I receive the following error:
>
> 13.07.2011 09:54:17.939 *WARN* [509017110@qtp-1586831707-0]
> org.apache.felix.http.jetty /entityhub/sites/entity/
> (java.lang.IllegalStateException: Unable to initialize the Cache with Yard
> dbpediaCache! This is usually caused by Errors while reading the Cache
> Configuration from the Yard.) java.lang.IllegalStateException: Unable to
> initialize the Cache with Yard dbpediaCache! This is usually caused by
> Errors while reading the Cache Configuration from the Yard.
> at org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
>
>
> Do I need to initialize the Cache in some way?
>
No, you do not. Prepared indexes include a document that provides a
list of the indexed fields. In the future this may be used to
determine whether a query can be executed successfully on the local
index. It is also used when an Entity within the index is updated
with a newer version.
However, this configuration is optional and not required. This
exception should only appear if the document is present but
malformed. The SolrYard initialized for the dbpediaCache, however,
should be empty.
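The rule described above (the field-list document is optional, and only a present but malformed one should raise the exception) can be sketched in Python. The function name, the dict-based stand-in for a yard, the document id and the JSON format are all hypothetical; this is not the CacheImpl code.

```python
import json
from typing import List, Mapping, Optional

# Hypothetical id of the document that lists the indexed fields.
CACHE_CONFIG_ID = "entityhub:cache-config"


def read_cache_config(yard: Mapping[str, str]) -> Optional[List[str]]:
    """Return the indexed fields recorded in a yard, or None if none are recorded.

    `yard` is a stand-in for a SolrYard: a mapping from document id to a
    serialized document (JSON here, purely for illustration).
    """
    raw = yard.get(CACHE_CONFIG_ID)
    if raw is None:
        # The document is optional: an empty cache yard is valid.
        return None
    try:
        return [str(f) for f in json.loads(raw)["fields"]]
    except (ValueError, KeyError, TypeError) as e:
        # Only a document that is present but malformed is an error,
        # comparable to the IllegalStateException in the log above.
        raise RuntimeError("Unable to initialize the Cache: malformed "
                           "cache configuration") from e
```

Under this model an empty dbpediaCache yard reads as "no configuration" and never raises, which is why the exception points to a configuration problem instead.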

Therefore I think it is somehow related to the above problem of
overriding configurations.

In general, the way the default configuration is loaded is
suboptimal at the moment. In particular, using a single defaultdata
bundle for both the OpenNLP models and the dbpedia configuration +
default index was not a good idea, because one cannot exclude/change
the dbpedia stuff without affecting other components that depend on
OpenNLP.
Therefore I think we need to discuss how to better structure the
configurations and data needed to run Stanbol.

There is also another issue: the SolrYard copies provided indexes
only once and does not check for updates. This makes it hard to
upgrade from the small index provided with the default data to a
bigger version.
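To see why this copy-once behaviour blocks the upgrade, here is a minimal Python model of it. `provide_index_once` and the directory layout are hypothetical and not the SolrYard code; the point is only that an existing index directory always wins over a newer packaged one.

```python
import shutil
from pathlib import Path


def provide_index_once(packaged: Path, index_dir: Path) -> bool:
    """Copy a packaged index into place only if no index exists there yet.

    Returns True if a copy was made. There is no version or timestamp
    check, so a newer packaged index is ignored once any copy exists.
    """
    if index_dir.exists():
        return False  # existing (possibly outdated) index is kept as-is
    shutil.copytree(packaged, index_dir)
    return True
```

This is also why replacing the index by hand involves deleting the old index directory first.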

Both of these things are related to the problems you see and need to be
addressed before the first Stanbol release. Independently of those, I
will try to find a simple solution for what you intend to do.

In the meantime I suggest you go for the initially proposed workaround.

best
Rupert Westenthaler

> Thanks for your help,
>
> David
>
>
> On Mon, Jul 11, 2011 at 11:42 PM, Rupert Westenthaler <
> rupert.westenthaler@gmail.com> wrote:
>
>> Hi
>>
>> On Mon, Jul 11, 2011 at 8:17 PM, Andrea Giovanni Nuzzolese
>> <nu...@cs.unibo.it> wrote:
>> > I solved it the same way, but lost the caching capabilities.
>> > Is there any possibility to keep both all the data and the cache?
>> >
>> > Andrea
>> >
>> > On Jul 11, 2011, at 4:08 PM, David Riccitelli wrote:
>> >
>> >> Ok, stopping the solrYard dbpedia_43k component solved it for me.
>> >>
>> >> Thanks,
>> >> David
>> >>
>> >> On Mon, Jul 11, 2011 at 4:13 PM, David Riccitelli <
>> >> david.riccitelli@interact.it> wrote:
>> >>
>> >>> Hi Rupert,
>> >>>
>> >>> I recently updated the Stanbol install, and I found that the RDF returned
>> >>> by the EntityHub is missing some props (specifically the dbprop ones, as
>> >>> far as I can see).
>> >>>
>> >>> This is the command that I use for testing:
>> >>> curl -H "accept: application/rdf+xml" "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
>> >>>
>> >>> which outputs the attached RDF file.
>> >>>
>> >>> I cleared the whole sling folder (rm -fr sling) and checked with the
>> >>> SPARQL endpoint at DBpedia, but I wasn't able to fix it.
>> >>>
>> >>> Does this depend on the mapping.txt file?
>> >>>
>>
>> If you plan to create your own dbpedia index, then the mapping.txt
>> file is the way to configure which properties are
>> included/excluded.
>> Typically, dbprop values are of low quality. They are just naive 1:1
>> mappings of the key/value pairs found in the infoboxes. Because of
>> this, they are excluded from the indexes.
>>
>> At runtime, the returned data depend on the Cache strategy in use:
>>
>> Currently there are three possibilities (configured with the referenced
>> Site):
>> 1) no cache: both queries and retrieval use the remote service.
>> 2) used: Queries are executed by the remote service. Retrieved
>> Entities are stored locally. The cached data depend on the mappings
>> defined for the cache.
>> 3) all: Both queries and retrieval are based on the cache. The remote
>> service is only used as a fallback if the cache is not
>> available (e.g. if you deactivate the solrYard).
>>
>> So if you are fine with (2), then you could use the configuration
>> previously used by the stable launcher [1].
>> I think the easiest way to install it is to add the
>> Felix File Installer [2] to the Stanbol environment. You will need to
>> delete the current referencedSite for dbpedia first and then add the
>> three configuration files described in [1].
>>
>> If your requirements are not covered by the currently available options,
>> it would be nice if you could write a short user story, because I am
>> thinking about how to improve this feature and input like that would
>> be really valuable.
>>
>> best
>> Rupert Westenthaler
>>
>> [1] The dbpedia config consists of three files: the referenced site,
>> cache and SolrYard components with the "-dbpedia" endings.
>>
>> http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/stable/src/main/resources/resources/config/?pathrev=1140181
>>
>> [2] http://felix.apache.org/site/apache-felix-file-install.html
>>
>> P.S. I kept this part because it describes very well how the cache
>> strategy "used" works:
>> >>>>> Hi David
>> >>>>>
>> >>>>> Assuming that you are using the default distribution of Apache Stanbol.
>> >>>>>
>> >>>>> Requests for http://dbpedia.org/resource/Valentino_Rossi are handled as follows:
>> >>>>> - only the first time, they are answered by retrieving the Entity from
>> >>>>> DBpedia.org
>> >>>>> - the information is cached locally; in the process, values of the
>> >>>>> documents are filtered (see (a) for details)
>> >>>>> - the cached version is returned
>> >>>>>
>> >>>>> (a) The default configuration for dbpedia stores all fields, but
>> >>>>> filters literal values so that only values with the language "en,
>> >>>>> de, fr, it, es" or no language are stored.
>> >>>>>
>> >>>>>
>> >>>>> Assuming that you started from zero when updating to a new version,
>> >>>>> this also means that you have downloaded a new version of this Entity
>> >>>>> from DBpedia.
>> >>>>>
>>
>> --
>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>
>
>
>
> --
> David Riccitelli
>
> Interact SpA
> Via A. Bargoni 78 (scala F)
> 00153 Roma
>
> T +39 06 58318 301
> F +39 06 58318 303
>
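The three cache strategies and the "used" flow quoted above (fetch once, filter literals by language, store, return the cached copy) can be modelled roughly as follows. This is a hypothetical Python sketch: `ReferencedSite`, `CacheStrategy`, `fetch_remote` and the dict-based entity model are illustrative names, not the Entityhub API, and what "used" does when the cache is unavailable is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple

# A literal as (text, language tag or None); an entity maps fields to literals.
LiteralValue = Tuple[str, Optional[str]]
Entity = Dict[str, List[LiteralValue]]

# Languages kept by the default dbpedia cache configuration described above.
KEPT_LANGUAGES = {"en", "de", "fr", "it", "es"}


class CacheStrategy(Enum):
    NONE = "none"  # queries and retrieval always use the remote service
    USED = "used"  # remote service answers; retrieved entities are cached
    ALL = "all"    # cache answers; remote service only if cache is unavailable


def filter_literals(entity: Entity) -> Entity:
    """Keep every field, but only literals in KEPT_LANGUAGES or without language."""
    return {
        name: [(text, lang) for text, lang in values
               if lang is None or lang in KEPT_LANGUAGES]
        for name, values in entity.items()
    }


@dataclass
class ReferencedSite:
    strategy: CacheStrategy
    fetch_remote: Callable[[str], Entity]
    # None models an unavailable cache (e.g. a deactivated SolrYard).
    cache: Optional[Dict[str, Entity]] = field(default_factory=dict)

    def get_entity(self, entity_id: str) -> Optional[Entity]:
        if self.strategy is CacheStrategy.NONE or self.cache is None:
            # Assumption: "used" also falls back to the remote service here.
            return self.fetch_remote(entity_id)
        if entity_id in self.cache:
            return self.cache[entity_id]
        if self.strategy is CacheStrategy.USED:
            # First request: fetch remotely, filter, store, return cached copy.
            self.cache[entity_id] = filter_literals(self.fetch_remote(entity_id))
            return self.cache[entity_id]
        return None  # ALL: an entity missing from the cache is not found
```

The model also shows why an entity cached under an older configuration keeps its old field set: later requests never reach the remote service again.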



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

RE: EntityHub and DBpedia

Posted by Steve Reiner <st...@integratedsemantics.com>.
I am getting something like this too after updating to the code checked in
yesterday. The problem wasn't there with the code from the day before.

(using the /engines page)

-----Original Message-----
From: David Riccitelli [mailto:david.riccitelli@interact.it] 
Sent: Tuesday, July 12, 2011 11:58 PM
To: stanbol-dev@incubator.apache.org
Subject: Re: EntityHub and DBpedia

Thanks Rupert,

I'm trying to follow your instructions but I am running into a couple of
issues (probably due to inexperience):
 [1] when dropping the config files, they enter some loop of
REGISTERING/UNREGISTERING (which I work around by stopping the FileInstall
bundle). Is that normal?
 [2] after I restart Stanbol, and try to query an entity from the entityhub
I receive the following error:

13.07.2011 09:54:17.939 *WARN* [509017110@qtp-1586831707-0]
org.apache.felix.http.jetty /entityhub/sites/entity/
(java.lang.IllegalStateException: Unable to initialize the Cache with Yard
dbpediaCache! This is usually caused by Errors while reading the Cache
Configuration from the Yard.) java.lang.IllegalStateException: Unable to
initialize the Cache with Yard dbpediaCache! This is usually caused by
Errors while reading the Cache Configuration from the Yard.
at org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)


Do I need to initialize the Cache in some way?

Thanks for your help,

David


On Mon, Jul 11, 2011 at 11:42 PM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi
>
> On Mon, Jul 11, 2011 at 8:17 PM, Andrea Giovanni Nuzzolese 
> <nu...@cs.unibo.it> wrote:
> > I solved it in the same way, but lost the caching capabilities.
> > Is there any way to keep both all the data and the cache?
> >
> > Andrea
> >
> > On Jul 11, 2011, at 4:08 PM, David Riccitelli wrote:
> >
> >> Ok, stopping the solrYard dbpedia_43k component solved it for me.
> >>
> >> Thanks,
> >> David
> >>
> >> On Mon, Jul 11, 2011 at 4:13 PM, David Riccitelli < 
> >> david.riccitelli@interact.it> wrote:
> >>
> >>> Hi Rupert,
> >>>
> >>> I recently updated the Stanbol install, and I found that the RDF
> >>> returned by the EntityHub is missing some props (specifically the
> >>> dbprop ones, as far as I can see).
> >>>
> >>> This is the command that I use for testing:
> >>> curl -H "accept: application/rdf+xml" \
> >>>   "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
> >>>
> >>> which outputs the attached RDF file.
> >>>
> >>> I cleared the whole sling folder (rm -fr sling) and checked against the
> >>> SPARQL endpoint at DBpedia, but I wasn't able to fix it.
> >>>
> >>> Does this depend on the mapping.txt file?
> >>>
>
> If you plan to create your own dbpedia index, then the mapping.txt
> file is the way to configure which properties are
> included/excluded.
> Typically dbprop values are of low quality. They are just naive 1:1
> mappings of the key-value pairs found in the info boxes. Because of
> this they are excluded from the indexes.
>
> At runtime the returned data depends on the cache strategy in use:
>
> Currently there are three possibilities (configured with the referenced
> Site):
> 1) no cache: both queries and retrieval use the remote service
> 2) used: Queries are executed by the remote service. Retrieved
> Entities are stored locally. The cached data depends on the mappings
> defined for the cache.
> 3) all: Both queries and retrieval are based on the cache. The remote
> service is only used as a fallback in case the cache is not
> available (e.g. if you deactivate the solrYard).
>
> So if you are fine with (2), then you could use the configuration
> previously used by the stable launcher [1].
> I think the easiest way to install it is to add the
> Felix File Install bundle [2] to the Stanbol environment. You will need to
> delete the current referencedSite for dbpedia first and then add the
> three configuration files as described by [1].
>
> If your requirements are not covered by the currently available options,
> it would be nice if you could write a short user story, because I am 
> thinking about how to improve this feature and input like that would 
> be really valuable.
>
> best
> Rupert Westenthaler
>
> [1] The dbpedia config consists of three files: the referenced site,
> cache and solrYard components with the "-dbpedia" endings.
>
> http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/stable/src/main/resources/resources/config/?pathrev=1140181
>
> [2] http://felix.apache.org/site/apache-felix-file-install.html
>
> p.s. I kept this part because it describes very well how the cache
> strategy "used" works:
> >>>>> Hi David
> >>>>>
> >>>>> I assume that you are using the default distribution of Apache Stanbol.
> >>>>>
> >>>>> Requests for http://dbpedia.org/resource/Valentino_Rossi are handled as follows:
> >>>>> - only the first time, the Entity is retrieved from DBpedia.org
> >>>>> - the information is stored in a local cache; in the process, the values of
> >>>>> the documents are filtered (see (a) for details)
> >>>>> - the cached version is returned
> >>>>>
> >>>>> (a) The default configuration for dbpedia stores all fields, but it
> >>>>> filters literal values so that only values with the language "en,
> >>>>> de, fr, it, es" or no language are stored.
> >>>>>
> >>>>>
> >>>>> Assuming that you started from scratch when updating to a new version,
> >>>>> this also means that you have downloaded a new version of this Entity
> >>>>> from DBpedia.
> >>>>>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>



--
David Riccitelli

Interact SpA
Via A. Bargoni 78 (scala F)
00153 Roma

T +39 06 58318 301
F +39 06 58318 303


Re: EntityHub and DBpedia

Posted by David Riccitelli <da...@interact.it>.
Thanks Rupert,

I'm trying to follow your instructions, but I've run into a couple of issues
(probably due to inexperience):
 [1] when I drop in the config files, they enter a
REGISTERING/UNREGISTERING loop (which I solve by stopping the FileInstall
bundle). Is that normal?
 [2] after I restart Stanbol and try to query an entity from the entityhub,
I receive the following error:

13.07.2011 09:54:17.939 *WARN* [509017110@qtp-1586831707-0]
org.apache.felix.http.jetty /entityhub/sites/entity/
(java.lang.IllegalStateException: Unable to initialize the Cache with Yard
dbpediaCache! This is usually caused by Errors while reading the Cache
Configuration from the Yard.) java.lang.IllegalStateException: Unable to
initialize the Cache with Yard dbpediaCache! This is usually caused by
Errors while reading the Cache Configuration from the Yard.
at org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)


Do I need to initialize the Cache in some way?

Thanks for your help,

David


On Mon, Jul 11, 2011 at 11:42 PM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi
>
> On Mon, Jul 11, 2011 at 8:17 PM, Andrea Giovanni Nuzzolese
> <nu...@cs.unibo.it> wrote:
> > I solved it in the same way, but lost the caching capabilities.
> > Is there any way to keep both all the data and the cache?
> >
> > Andrea
> >
> > On Jul 11, 2011, at 4:08 PM, David Riccitelli wrote:
> >
> >> Ok, stopping the solrYard dbpedia_43k component solved it for me.
> >>
> >> Thanks,
> >> David
> >>
> >> On Mon, Jul 11, 2011 at 4:13 PM, David Riccitelli <
> >> david.riccitelli@interact.it> wrote:
> >>
> >>> Hi Rupert,
> >>>
> >>> I recently updated the Stanbol install, and I found that the RDF
> >>> returned by the EntityHub is missing some props (specifically the
> >>> dbprop ones, as far as I can see).
> >>>
> >>> This is the command that I use for testing:
> >>> curl -H "accept: application/rdf+xml" \
> >>>   "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
> >>>
> >>> which outputs the attached RDF file.
> >>>
> >>> I cleared the whole sling folder (rm -fr sling) and checked against the
> >>> SPARQL endpoint at DBpedia, but I wasn't able to fix it.
> >>>
> >>> Does this depend on the mapping.txt file?
> >>>
>
> If you plan to create your own dbpedia index, then the mapping.txt
> file is the way to configure which properties are
> included/excluded.
> Typically dbprop values are of low quality. They are just naive 1:1
> mappings of the key-value pairs found in the info boxes. Because of
> this they are excluded from the indexes.
>
> At runtime the returned data depends on the cache strategy in use:
>
> Currently there are three possibilities (configured with the referenced
> Site):
> 1) no cache: both queries and retrieval use the remote service
> 2) used: Queries are executed by the remote service. Retrieved
> Entities are stored locally. The cached data depends on the mappings
> defined for the cache.
> 3) all: Both queries and retrieval are based on the cache. The remote
> service is only used as a fallback in case the cache is not
> available (e.g. if you deactivate the solrYard).
>
> So if you are fine with (2), then you could use the configuration
> previously used by the stable launcher [1].
> I think the easiest way to install it is to add the
> Felix File Install bundle [2] to the Stanbol environment. You will need to
> delete the current referencedSite for dbpedia first and then add the
> three configuration files as described by [1].
>
> If your requirements are not covered by the currently available options,
> it would be nice if you could write a short user story, because I am
> thinking about how to improve this feature and input like that would
> be really valuable.
>
> best
> Rupert Westenthaler
>
> [1] The dbpedia config consists of three files: the referenced site,
> cache and solrYard components with the "-dbpedia" endings.
>
> http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/stable/src/main/resources/resources/config/?pathrev=1140181
>
> [2] http://felix.apache.org/site/apache-felix-file-install.html
>
> p.s. I kept this part because it describes very well how the cache
> strategy "used" works:
> >>>>> Hi David
> >>>>>
> >>>>> I assume that you are using the default distribution of Apache Stanbol.
> >>>>>
> >>>>> Requests for http://dbpedia.org/resource/Valentino_Rossi are handled as follows:
> >>>>> - only the first time, the Entity is retrieved from DBpedia.org
> >>>>> - the information is stored in a local cache; in the process, the values of
> >>>>> the documents are filtered (see (a) for details)
> >>>>> - the cached version is returned
> >>>>>
> >>>>> (a) The default configuration for dbpedia stores all fields, but it
> >>>>> filters literal values so that only values with the language "en,
> >>>>> de, fr, it, es" or no language are stored.
> >>>>>
> >>>>>
> >>>>> Assuming that you started from scratch when updating to a new version,
> >>>>> this also means that you have downloaded a new version of this Entity
> >>>>> from DBpedia.
> >>>>>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>



-- 
David Riccitelli

Interact SpA
Via A. Bargoni 78 (scala F)
00153 Roma

T +39 06 58318 301
F +39 06 58318 303

Re: EntityHub and DBpedia

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi

On Mon, Jul 11, 2011 at 8:17 PM, Andrea Giovanni Nuzzolese
<nu...@cs.unibo.it> wrote:
> I solved it in the same way, but lost the caching capabilities.
> Is there any way to keep both all the data and the cache?
>
> Andrea
>
> On Jul 11, 2011, at 4:08 PM, David Riccitelli wrote:
>
>> Ok, stopping the solrYard dbpedia_43k component solved it for me.
>>
>> Thanks,
>> David
>>
>> On Mon, Jul 11, 2011 at 4:13 PM, David Riccitelli <
>> david.riccitelli@interact.it> wrote:
>>
>>> Hi Rupert,
>>>
>>> I recently updated the Stanbol install, and I found that the RDF returned
>>> by the EntityHub is missing some props (specifically the dbprop ones, as
>>> far as I can see).
>>>
>>> This is the command that I use for testing:
>>> curl -H "accept: application/rdf+xml" \
>>>   "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
>>>
>>> which outputs the attached RDF file.
>>>
>>> I cleared the whole sling folder (rm -fr sling) and checked against the
>>> SPARQL endpoint at DBpedia, but I wasn't able to fix it.
>>>
>>> Does this depend on the mapping.txt file?
>>>

If you plan to create your own dbpedia index, then the mapping.txt
file is the way to configure which properties are
included/excluded.
Typically dbprop values are of low quality. They are just naive 1:1
mappings of the key-value pairs found in the info boxes. Because of
this they are excluded from the indexes.
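For illustration, a minimal Python sketch (not Stanbol code) of what excluding dbprop amounts to: every property in the http://dbpedia.org/property/ namespace is dropped before an entity is indexed. It does not reproduce the mapping.txt syntax; the helper exclude_namespace and the example key exampleKey are hypothetical.

```python
# Hypothetical illustration, not Stanbol code. An entity is modelled as a
# dict mapping property URIs to lists of values.
DBPROP_NS = "http://dbpedia.org/property/"


def exclude_namespace(properties, namespace=DBPROP_NS):
    """Return a copy of `properties` without keys starting with `namespace`."""
    return {uri: values for uri, values in properties.items()
            if not uri.startswith(namespace)}


entity = {
    "http://dbpedia.org/ontology/thumbnail": [
        "http://upload.wikimedia.org/wikipedia/commons/thumb/8/81/"
        "Valentino_Rossi_2010_Qatar.jpg/200px-Valentino_Rossi_2010_Qatar.jpg"],
    "http://xmlns.com/foaf/0.1/depiction": [
        "http://upload.wikimedia.org/wikipedia/commons/8/81/"
        "Valentino_Rossi_2010_Qatar.jpg"],
    # Hypothetical dbprop key standing in for a raw info-box value.
    "http://dbpedia.org/property/exampleKey": ["example value"],
}

indexed = exclude_namespace(entity)
print(sorted(indexed))
```

The ontology (http://dbpedia.org/ontology/) and FOAF properties survive; only the dbprop key is removed.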

At runtime the returned data depends on the cache strategy in use:

Currently there are three possibilities (configured with the referenced Site):
1) no cache: both queries and retrieval use the remote service
2) used: Queries are executed by the remote service. Retrieved
Entities are stored locally. The cached data depends on the mappings
defined for the cache.
3) all: Both queries and retrieval are based on the cache. The remote
service is only used as a fallback in case the cache is not
available (e.g. if you deactivate the solrYard).
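The three strategies can be modelled roughly as follows. This is a hypothetical Python sketch, not the EntityHub implementation: plain dicts stand in for the remote service and the cache Yard, cache_mapping stands in for the mappings defined for the cache, and all class and method names are made up for illustration.

```python
from enum import Enum


class CacheStrategy(Enum):
    NONE = "none"  # 1) no cache: everything goes to the remote service
    USED = "used"  # 2) remote queries, retrieved entities cached locally
    ALL = "all"    # 3) cache for everything, remote service as fallback


class SiteSketch:
    """Hypothetical referenced site; dicts stand in for the remote service
    and the cache (entity id -> properties)."""

    def __init__(self, strategy, remote, cache=None, cache_available=True,
                 cache_mapping=None):
        self.strategy = strategy
        self.remote = remote
        self.cache = {} if cache is None else cache
        self.cache_available = cache_available
        # Stand-in for the mappings defined for the cache (identity by default).
        self.cache_mapping = cache_mapping or (lambda props: props)
        self.remote_calls = 0

    def get_entity(self, entity_id):
        cache_on = (self.strategy is not CacheStrategy.NONE
                    and self.cache_available)
        if cache_on and self.strategy is CacheStrategy.ALL:
            return self.cache.get(entity_id)      # "all": cache only
        if cache_on and entity_id in self.cache:  # "used": cache hit
            return self.cache[entity_id]
        self.remote_calls += 1                    # remote retrieval
        props = self.remote.get(entity_id)
        if cache_on and props is not None:        # "used": store mapped copy
            self.cache[entity_id] = self.cache_mapping(props)
            return self.cache[entity_id]
        return props

    def find(self, match):
        """Sorted ids of entities whose properties satisfy `match`."""
        if self.strategy is CacheStrategy.ALL and self.cache_available:
            source = self.cache
        else:
            self.remote_calls += 1                # queries go to the remote service
            source = self.remote
        return sorted(eid for eid, props in source.items() if match(props))


site = SiteSketch(CacheStrategy.USED,
                  remote={"Valentino_Rossi": {"label": ["Valentino Rossi"]}})
site.get_entity("Valentino_Rossi")
site.get_entity("Valentino_Rossi")
print(site.remote_calls)  # prints 1: the second call is served from the cache
```

In the "used" case only the first retrieval reaches the remote service; later calls are served from the cache, so whatever the cache mapping filtered out stays missing until the cache is cleared.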

So if you are fine with (2), then you could use the configuration
previously used by the stable launcher [1].
I think the easiest way to install it is to add the
Felix File Install bundle [2] to the Stanbol environment. You will need to
delete the current referencedSite for dbpedia first and then add the
three configuration files as described by [1].

If your requirements are not covered by the currently available options,
it would be nice if you could write a short user story, because I am
thinking about how to improve this feature and input like that would
be really valuable.

best
Rupert Westenthaler

[1] The dbpedia config consists of three files: the referenced site,
cache and solrYard components with the "-dbpedia" endings.
http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/stable/src/main/resources/resources/config/?pathrev=1140181

[2] http://felix.apache.org/site/apache-felix-file-install.html

p.s. I kept this part because it describes very well how the cache
strategy "used" works:
>>>>> Hi David
>>>>>
>>>>> I assume that you are using the default distribution of Apache Stanbol.
>>>>>
>>>>> Requests for http://dbpedia.org/resource/Valentino_Rossi are handled as follows:
>>>>> - only the first time, the Entity is retrieved from DBpedia.org
>>>>> - the information is stored in a local cache; in the process, the values of
>>>>> the documents are filtered (see (a) for details)
>>>>> - the cached version is returned
>>>>>
>>>>> (a) The default configuration for dbpedia stores all fields, but it
>>>>> filters literal values so that only values with the language "en,
>>>>> de, fr, it, es" or no language are stored.
>>>>>
>>>>>
>>>>> Assuming that you started from scratch when updating to a new version,
>>>>> this also means that you have downloaded a new version of this Entity
>>>>> from DBpedia.
>>>>>

-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: EntityHub and DBpedia

Posted by Andrea Giovanni Nuzzolese <nu...@cs.unibo.it>.
I solved it in the same way, but lost the caching capabilities.
Is there any way to keep both all the data and the cache?

Andrea

On Jul 11, 2011, at 4:08 PM, David Riccitelli wrote:

> Ok, stopping the solrYard dbpedia_43k component solved it for me.
> 
> Thanks,
> David
> 
> On Mon, Jul 11, 2011 at 4:13 PM, David Riccitelli <
> david.riccitelli@interact.it> wrote:
> 
>> Hi Rupert,
>> 
>> I recently updated the Stanbol install, and I found that the RDF returned
>> by the EntityHub is missing some props (specifically the dbprop ones, as
>> far as I can see).
>>
>> This is the command that I use for testing:
>> curl -H "accept: application/rdf+xml" \
>>   "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
>>
>> which outputs the attached RDF file.
>>
>> I cleared the whole sling folder (rm -fr sling) and checked against the
>> SPARQL endpoint at DBpedia, but I wasn't able to fix it.
>> 
>> Does this depend on the mapping.txt file?
>> 
>> Thanks for your help.
>> 
>> BR
>> David
>> 
>> On Mon, Jun 6, 2011 at 3:22 PM, David Riccitelli <
>> david.riccitelli@interact.it> wrote:
>> 
>>> Hi Rupert,
>>> 
>>> Clearing the cache solved the issue. Thanks for your help.
>>> 
>>> BR,
>>> David
>>> 
>>> 
>>> On Mon, Jun 6, 2011 at 3:15 PM, Rupert Westenthaler <
>>> rupert.westenthaler@gmail.com> wrote:
>>> 
>>>> Hi David
>>>> 
>>>> I assume that you are using the default distribution of Apache Stanbol.
>>>>
>>>> Requests for http://dbpedia.org/resource/Valentino_Rossi are handled as follows:
>>>> - only the first time, the Entity is retrieved from DBpedia.org
>>>> - the information is stored in a local cache; in the process, the values of
>>>> the documents are filtered (see (a) for details)
>>>> - the cached version is returned
>>>>
>>>> (a) The default configuration for dbpedia stores all fields, but it
>>>> filters literal values so that only values with the language "en,
>>>> de, fr, it, es" or no language are stored.
>>>>
>>>>
>>>> Assuming that you started from scratch when updating to a new version,
>>>> this also means that you have downloaded a new version of this Entity
>>>> from DBpedia.
>>>> 
>>>> I made several tests and I was not able to reproduce the described case.
>>>> 
>>>> I tried the following:
>>>> 
>>>> curl -H "Accept: application/rdf+xml" \
>>>>   "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
>>>>
>>>> The response contained both the expected thumbnail and the depiction
>>>> value (see the attached file).
>>>>
>>>> Comparing the results with those provided directly by DBpedia also
>>>> indicated no problems. Try posting
>>>>
>>>> CONSTRUCT { <http://dbpedia.org/resource/Valentino_Rossi> ?p ?o }
>>>> WHERE { <http://dbpedia.org/resource/Valentino_Rossi> ?p ?o }
>>>>
>>>> to http://dbpedia.org/sparql.
>>>>
>>>> Therefore I would suggest that you try the following:
>>>> - copy the launcher jar to another directory and initialize another
>>>> Stanbol installation (use -p {port} to change the port)
>>>> - if you do not care about losing the local cache, you can also just
>>>> delete the "{stanbol_root_folder}/sling/entityhub/solrYard/indexes/cache"
>>>> folder and restart Stanbol.
>>>>
>>>> If this solves the problem, then the error was most likely
>>>> caused by DBpedia.org not returning all data. If the properties are
>>>> still missing, then it has something to do with the configuration
>>>> provided within the launcher jar.
>>>> In that case I would need more information about the launcher you use
>>>> and any changes to your configuration.
>>>> 
>>>> best
>>>> Rupert Westenthaler
>>>> 
>>>> 
>>>> On Mon, Jun 6, 2011 at 10:29 AM, David Riccitelli
>>>> <da...@interact.it> wrote:
>>>>> Dear all,
>>>>> Before r1131053, when querying for
>>>>> resource http://dbpedia.org/resource/Valentino_Rossi, I was able to get:
>>>>>
>>>>>  <j.8:thumbnail rdf:resource="http://upload.wikimedia.org/wikipedia/commons/thumb/8/81/Valentino_Rossi_2010_Qatar.jpg/200px-Valentino_Rossi_2010_Qatar.jpg"/>
>>>>>  <j.3:depiction rdf:resource="http://upload.wikimedia.org/wikipedia/commons/8/81/Valentino_Rossi_2010_Qatar.jpg"/>
>>>>>
>>>>> where:
>>>>>
>>>>>  xmlns:j.3="http://xmlns.com/foaf/0.1/"
>>>>>  xmlns:j.8="http://dbpedia.org/ontology/"
>>>>>
>>>>> With r1131053, however, this information is no longer provided: see
>>>>> the attached file for a comparison.
>>>>> I understand the configuration of the EntityHub changed; how do I get
>>>>> that information back from the EntityHub?
>>>>> BR,
>>>>> David
>>>>> 
>>>>> --
>>>>> David Riccitelli
>>>>> 
>>>>> Interact SpA
>>>>> Via A. Bargoni 78 (scala F)
>>>>> 00153 Roma
>>>>> 
>>>>> T +39 06 58318 301
>>>>> F +39 06 58318 303
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>>> | Bodenlehenstraße 11                             ++43-699-11108907
>>>> | A-5500 Bischofshofen
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> David Riccitelli
>>> 
>>> Interact SpA
>>> Via A. Bargoni 78 (scala F)
>>> 00153 Roma
>>> 
>>> T +39 06 58318 301
>>> F +39 06 58318 303
>>> 
>>> 
>> 
>> 
>> --
>> David Riccitelli
>> 
>> Interact SpA
>> Via A. Bargoni 78 (scala F)
>> 00153 Roma
>> 
>> T +39 06 58318 301
>> F +39 06 58318 303
>> 
>> 
> 
> 
> -- 
> David Riccitelli
> 
> Interact SpA
> Via A. Bargoni 78 (scala F)
> 00153 Roma
> 
> T +39 06 58318 301
> F +39 06 58318 303


Re: EntityHub and DBpedia

Posted by David Riccitelli <da...@interact.it>.
Ok, stopping the solrYard dbpedia_43k component solved it for me.

Thanks,
David

On Mon, Jul 11, 2011 at 4:13 PM, David Riccitelli <
david.riccitelli@interact.it> wrote:

> Hi Rupert,
>
> I recently updated the Stanbol install, and I found that the RDF returned
> by the EntityHub is missing some props (specifically the dbprop ones, as
> far as I can see).
>
> This is the command that I use for testing:
>  curl -H "accept: application/rdf+xml" \
>   "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
>
> which outputs the attached RDF file.
>
> I cleared the whole sling folder (rm -fr sling) and checked against the
> SPARQL endpoint at DBpedia, but I wasn't able to fix it.
>
> Does this depend on the mapping.txt file?
>
> Thanks for your help.
>
> BR
> David
>
> On Mon, Jun 6, 2011 at 3:22 PM, David Riccitelli <
> david.riccitelli@interact.it> wrote:
>
>> Hi Rupert,
>>
>> Clearing the cache solved the issue. Thanks for your help.
>>
>> BR,
>> David
>>
>>
>> On Mon, Jun 6, 2011 at 3:15 PM, Rupert Westenthaler <
>> rupert.westenthaler@gmail.com> wrote:
>>
>>> Hi David
>>>
>>> I assume that you are using the default distribution of Apache Stanbol.
>>>
>>> Requests for http://dbpedia.org/resource/Valentino_Rossi are handled as follows:
>>>  - only the first time, the Entity is retrieved from DBpedia.org
>>>  - the information is stored in a local cache; in the process, the values of
>>> the documents are filtered (see (a) for details)
>>>  - the cached version is returned
>>>
>>> (a) The default configuration for dbpedia stores all fields, but it
>>> filters literal values so that only values with the language "en,
>>> de, fr, it, es" or no language are stored.
>>>
>>>
>>> Assuming that you started from scratch when updating to a new version,
>>> this also means that you have downloaded a new version of this Entity
>>> from DBpedia.
>>>
>>> I made several tests and I was not able to reproduce the described case.
>>>
>>> I tried the following:
>>>
>>> curl -H "Accept: application/rdf+xml" \
>>>   "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
>>>
>>> The response contained both the expected thumbnail and the depiction
>>> value (see the attached file).
>>>
>>> Comparing the results with those provided directly by DBpedia also
>>> indicated no problems. Try posting
>>>
>>> CONSTRUCT { <http://dbpedia.org/resource/Valentino_Rossi> ?p ?o }
>>> WHERE { <http://dbpedia.org/resource/Valentino_Rossi> ?p ?o }
>>>
>>> to http://dbpedia.org/sparql.
>>>
>>> Therefore I would suggest that you try the following:
>>>  - copy the launcher jar to another directory and initialize another
>>> Stanbol installation (use -p {port} to change the port)
>>>  - if you do not care about losing the local cache, you can also just
>>> delete the "{stanbol_root_folder}/sling/entityhub/solrYard/indexes/cache"
>>> folder and restart Stanbol.
>>>
>>> If this solves the problem, then the error was most likely
>>> caused by DBpedia.org not returning all data. If the properties are
>>> still missing, then it has something to do with the configuration
>>> provided within the launcher jar.
>>> In that case I would need more information about the launcher you use
>>> and any changes to your configuration.
>>>
>>> best
>>> Rupert Westenthaler
>>>
>>>
>>> On Mon, Jun 6, 2011 at 10:29 AM, David Riccitelli
>>> <da...@interact.it> wrote:
>>> > Dear all,
>>> > Before r1131053, when querying for
>>> > resource http://dbpedia.org/resource/Valentino_Rossi, I was able to get:
>>> >
>>> >     <j.8:thumbnail rdf:resource="http://upload.wikimedia.org/wikipedia/commons/thumb/8/81/Valentino_Rossi_2010_Qatar.jpg/200px-Valentino_Rossi_2010_Qatar.jpg"/>
>>> >     <j.3:depiction rdf:resource="http://upload.wikimedia.org/wikipedia/commons/8/81/Valentino_Rossi_2010_Qatar.jpg"/>
>>> >
>>> > where:
>>> >
>>> >     xmlns:j.3="http://xmlns.com/foaf/0.1/"
>>> >     xmlns:j.8="http://dbpedia.org/ontology/"
>>> >
>>> > With r1131053, however, this information is no longer provided: see
>>> > the attached file for a comparison.
>>> > I understand the configuration of the EntityHub changed; how do I get
>>> > that information back from the EntityHub?
>>> > BR,
>>> > David
>>> >
>>> > --
>>> > David Riccitelli
>>> >
>>> > Interact SpA
>>> > Via A. Bargoni 78 (scala F)
>>> > 00153 Roma
>>> >
>>> > T +39 06 58318 301
>>> > F +39 06 58318 303
>>> >
>>>
>>>
>>>
>>> --
>>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>> | Bodenlehenstraße 11                             ++43-699-11108907
>>> | A-5500 Bischofshofen
>>>
>>
>>
>>
>> --
>> David Riccitelli
>>
>> Interact SpA
>> Via A. Bargoni 78 (scala F)
>> 00153 Roma
>>
>> T +39 06 58318 301
>> F +39 06 58318 303
>>
>>
>
>
> --
> David Riccitelli
>
> Interact SpA
> Via A. Bargoni 78 (scala F)
> 00153 Roma
>
> T +39 06 58318 301
> F +39 06 58318 303
>
>


-- 
David Riccitelli

Interact SpA
Via A. Bargoni 78 (scala F)
00153 Roma

T +39 06 58318 301
F +39 06 58318 303

Re: EntityHub and DBpedia

Posted by David Riccitelli <da...@interact.it>.
Hi Rupert,

I recently updated the Stanbol install, and I found that the RDF returned by
the EntityHub is missing some props (specifically the dbprop ones, as far as
I can see).

This is the command that I use for testing:
 curl -H "accept: application/rdf+xml" \
  "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"

which outputs the attached RDF file.

I cleared the whole sling folder (rm -fr sling) and checked against the
SPARQL endpoint at DBpedia, but I wasn't able to fix it.
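The two lookups compared here, the EntityHub request above and a check against DBpedia's SPARQL endpoint using the CONSTRUCT query from Rupert's June 6 reply (quoted below), can be prepared with Python's standard library. This is a sketch: the helper names are hypothetical, the URLs are the ones used in this thread in 2011, and sending the query as a form-encoded POST is an assumption based on the SPARQL protocol. Nothing is sent over the network until a request is passed to urllib.request.urlopen.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENTITYHUB_URL = "http://localhost:8080/entityhub/site/dbpedia/entity"
SPARQL_ENDPOINT = "http://dbpedia.org/sparql"
RDF_XML = "application/rdf+xml"


def entityhub_request(entity_uri):
    """GET the entity from the local EntityHub 'dbpedia' site (id URL-encoded)."""
    url = ENTITYHUB_URL + "?" + urlencode({"id": entity_uri})
    return Request(url, headers={"Accept": RDF_XML})


def dbpedia_construct_request(entity_uri):
    """POST a CONSTRUCT query returning all triples with entity_uri as subject."""
    query = ("CONSTRUCT { <%s> ?p ?o } WHERE { <%s> ?p ?o }"
             % (entity_uri, entity_uri))
    body = urlencode({"query": query}).encode("utf-8")
    # Supplying data makes urllib send the request as a POST.
    return Request(SPARQL_ENDPOINT, data=body,
                   headers={"Accept": RDF_XML,
                            "Content-Type": "application/x-www-form-urlencoded"})


rossi = "http://dbpedia.org/resource/Valentino_Rossi"
local = entityhub_request(rossi)
remote = dbpedia_construct_request(rossi)
print(local.get_method(), local.full_url)
print(remote.get_method(), remote.full_url)
```

To compare the results, read the body of urllib.request.urlopen(local) and urllib.request.urlopen(remote); DBpedia's endpoint behaviour may well have changed since this thread.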

Does this depend on the mapping.txt file?

Thanks for your help.

BR
David

On Mon, Jun 6, 2011 at 3:22 PM, David Riccitelli <
david.riccitelli@interact.it> wrote:

> Hi Rupert,
>
> Clearing the cache solved the issue. Thanks for your help.
>
> BR,
> David
>
>
> On Mon, Jun 6, 2011 at 3:15 PM, Rupert Westenthaler <
> rupert.westenthaler@gmail.com> wrote:
>
>> Hi David
>>
>> I assume that you are using the default distribution of Apache Stanbol.
>>
>> Requests for http://dbpedia.org/resource/Valentino_Rossi are handled as follows:
>>  - only the first time, the Entity is retrieved from DBpedia.org
>>  - the information is stored in a local cache; in the process, the values of
>> the documents are filtered (see (a) for details)
>>  - the cached version is returned
>>
>> (a) The default configuration for dbpedia stores all fields, but it
>> filters literal values so that only values with the language "en,
>> de, fr, it, es" or no language are stored.
>>
>>
>> Assuming that you started from scratch when updating to a new version,
>> this also means that you have downloaded a new version of this Entity
>> from DBpedia.
>>
>> I made several tests and I was not able to reproduce the described case.
>>
>> I tried the following:
>>
>> curl -H "Accept: application/rdf+xml" \
>>   "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
>>
>> The response contained both the expected thumbnail and the depiction
>> value (see the attached file).
>>
>> Comparing the results with those provided directly by DBpedia also
>> indicated no problems. Try posting
>>
>> CONSTRUCT { <http://dbpedia.org/resource/Valentino_Rossi> ?p ?o }
>> WHERE { <http://dbpedia.org/resource/Valentino_Rossi> ?p ?o }
>>
>> to http://dbpedia.org/sparql.
>>
>> Therefore I would suggest that you try the following:
>>  - copy the launcher jar to another directory and initialize another
>> Stanbol installation (use -p {port} to change the port)
>>  - if you do not care about losing the local cache, you can also just
>> delete the "{stanbol_root_folder}/sling/entityhub/solrYard/indexes/cache"
>> folder and restart Stanbol.
>>
>> If this solves the problem, then the error was most likely
>> caused by DBpedia.org not returning all data. If the properties are
>> still missing, then it has something to do with the configuration
>> provided within the launcher jar.
>> In that case I would need more information about the launcher you use
>> and any changes to your configuration.
>>
>> best
>> Rupert Westenthaler
>>
>>
>> On Mon, Jun 6, 2011 at 10:29 AM, David Riccitelli
>> <da...@interact.it> wrote:
>> > Dear all,
>> > Before r1131053, when querying for
>> > resource http://dbpedia.org/resource/Valentino_Rossi, I was able to get:
>> >
>> >     <j.8:thumbnail rdf:resource="http://upload.wikimedia.org/wikipedia/commons/thumb/8/81/Valentino_Rossi_2010_Qatar.jpg/200px-Valentino_Rossi_2010_Qatar.jpg"/>
>> >     <j.3:depiction rdf:resource="http://upload.wikimedia.org/wikipedia/commons/8/81/Valentino_Rossi_2010_Qatar.jpg"/>
>> >
>> > where:
>> >
>> >     xmlns:j.3="http://xmlns.com/foaf/0.1/"
>> >     xmlns:j.8="http://dbpedia.org/ontology/"
>> >
>> > With r1131053, however, this information is no longer provided: see
>> > the attached file for a comparison.
>> > I understand the configuration of the EntityHub changed; how do I get
>> > that information back from the EntityHub?
>> > BR,
>> > David
>> >
>> > --
>> > David Riccitelli
>> >
>> > Interact SpA
>> > Via A. Bargoni 78 (scala F)
>> > 00153 Roma
>> >
>> > T +39 06 58318 301
>> > F +39 06 58318 303
>> >
>>
>>
>>
>> --
>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>
>
>
>
> --
> David Riccitelli
>
> Interact SpA
> Via A. Bargoni 78 (scala F)
> 00153 Roma
>
> T +39 06 58318 301
> F +39 06 58318 303
>
>


-- 
David Riccitelli

Interact SpA
Via A. Bargoni 78 (scala F)
00153 Roma

T +39 06 58318 301
F +39 06 58318 303
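To check a response for the two properties from the original June 6 report (quoted above), the RDF/XML can be scanned with Python's standard library. A minimal sketch, assuming the values appear as property elements carrying rdf:resource, as in the snippet David posted; RDF/XML permits other encodings of the same triples that this check would miss, so a real RDF parser is more robust. The helper image_properties is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"
# The two properties from the report, in ElementTree's {namespace}local form.
WANTED = {
    "thumbnail": "{http://dbpedia.org/ontology/}thumbnail",
    "depiction": "{http://xmlns.com/foaf/0.1/}depiction",
}


def image_properties(rdf_xml):
    """Map each wanted property name to the rdf:resource values found for it."""
    root = ET.fromstring(rdf_xml)
    return {name: [el.get(RDF_RESOURCE) for el in root.iter(tag)]
            for name, tag in WANTED.items()}


SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:j.3="http://xmlns.com/foaf/0.1/"
         xmlns:j.8="http://dbpedia.org/ontology/">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Valentino_Rossi">
    <j.8:thumbnail rdf:resource="http://upload.wikimedia.org/wikipedia/commons/thumb/8/81/Valentino_Rossi_2010_Qatar.jpg/200px-Valentino_Rossi_2010_Qatar.jpg"/>
    <j.3:depiction rdf:resource="http://upload.wikimedia.org/wikipedia/commons/8/81/Valentino_Rossi_2010_Qatar.jpg"/>
  </rdf:Description>
</rdf:RDF>"""

found = image_properties(SAMPLE)
missing = [name for name, values in found.items() if not values]
print("missing:", missing)
```

Matching on namespace URIs rather than on the j.3/j.8 prefixes matters here, because those generated prefixes can differ between responses.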

Re: EntityHub and DBpedia

Posted by David Riccitelli <da...@interact.it>.
Hi Rupert,

Clearing the cache solved the issue. Thanks for your help.

BR,
David

On Mon, Jun 6, 2011 at 3:15 PM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi David
>
> I assume that you are using the default distribution of Apache Stanbol.
>
> Requests for http://dbpedia.org/resource/Valentino_Rossi are handled as follows:
>  - only the first time, the Entity is retrieved from DBpedia.org
>  - the information is stored in a local cache; in the process, the values of
> the documents are filtered (see (a) for details)
>  - the cached version is returned
>
> (a) The default configuration for dbpedia stores all fields, but it
> filters literal values so that only values with the language "en,
> de, fr, it, es" or no language are stored.
>
>
> Assuming that you started from scratch when updating to a new version,
> this also means that you have downloaded a new version of this Entity
> from DBpedia.
>
> I made several tests and I was not able to reproduce the described case.
>
> I tried the following:
>
> curl -H "Accept: application/rdf+xml" \
>   "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
>
> The response contained both the expected thumbnail and the depiction
> value (see the attached file).
>
> Comparing the results with those provided directly by DBpedia also
> indicated no problems. Try posting
>
> CONSTRUCT { <http://dbpedia.org/resource/Valentino_Rossi> ?p ?o }
> WHERE { <http://dbpedia.org/resource/Valentino_Rossi> ?p ?o }
>
> to http://dbpedia.org/sparql.
>
> Therefore I would suggest that you try the following:
>  - copy the launcher jar to another directory and initialize another
> Stanbol installation (use -p {port} to change the port)
>  - if you do not care about losing the local cache, you can also just
> delete the "{stanbol_root_folder}/sling/entityhub/solrYard/indexes/cache"
> folder and restart Stanbol.
>
> If this solves the problem, then the error was most likely
> caused by DBpedia.org not returning all data. If the properties are
> still missing, then it has something to do with the configuration
> provided within the launcher jar.
> In that case I would need more information about the launcher you use
> and any changes to your configuration.
>
> best
> Rupert Westenthaler
>
>
> On Mon, Jun 6, 2011 at 10:29 AM, David Riccitelli
> <da...@interact.it> wrote:
> > Dears,
> > Before r1131053, when querying for
> > resource http://dbpedia.org/resource/Valentino_Rossi, I was able to get:
> >
> >     <j.8:thumbnail
> > rdf:resource="http://upload.wikimedia.org/wikipedia/commons/thumb/8/81/Valentino_Rossi_2010_Qatar.jpg/200px-Valentino_Rossi_2010_Qatar.jpg"/>
> >     <j.3:depiction
> > rdf:resource="http://upload.wikimedia.org/wikipedia/commons/8/81/Valentino_Rossi_2010_Qatar.jpg"/>
> >
> > being:
> >
> >     xmlns:j.3="http://xmlns.com/foaf/0.1/"
> >     xmlns:j.8="http://dbpedia.org/ontology/"
> >
> > With r1131053, however, this information is no longer provided: see the
> > attached file for a comparison.
> > I understand the configuration of EntityHub changed; how do I get that
> > information back from EntityHub?
> > BR,
> > David
> >
>
>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>



-- 
David Riccitelli

Interact SpA
Via A. Bargoni 78 (scala F)
00153 Roma

T +39 06 58318 301
F +39 06 58318 303

Re: EntityHub and DBpedia

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi David

I assume that you are using the default distribution of Apache Stanbol.

Requests for http://dbpedia.org/resource/Valentino_Rossi are handled as follows:
 - only the first request is answered by retrieving the Entity from DBpedia.org
 - the information is then stored in a local cache; while it is stored, the
values of the documents are filtered (see (a) for details)
 - the cached version is returned
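The three steps above amount to a read-through cache. A minimal Python sketch, assuming a hypothetical `fetch_remote` callable standing in for the request to DBpedia.org and a hypothetical `filter_values` callable standing in for the configured value filter (neither is a Stanbol API):

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical, simplified entity representation: (field, value, language) tuples
Values = List[Tuple[str, str, Optional[str]]]


class ReadThroughEntityCache:
    """Illustrative read-through cache; not the actual Stanbol Entityhub code."""

    def __init__(self,
                 fetch_remote: Callable[[str], Values],
                 filter_values: Callable[[Values], Values]) -> None:
        self._fetch_remote = fetch_remote    # stands in for the request to DBpedia.org
        self._filter_values = filter_values  # stands in for the value filter, see (a)
        self._cache: Dict[str, Values] = {}  # stands in for the local cache

    def get(self, entity_id: str) -> Values:
        if entity_id not in self._cache:
            # Only the first request for an entity reaches the remote site;
            # its values are filtered once, before they are stored.
            self._cache[entity_id] = self._filter_values(self._fetch_remote(entity_id))
        # Every request, including the first, is answered from the cached copy.
        return self._cache[entity_id]
```

A consequence of this design is that whatever was stored on the first request is what every later request sees: if that first download was incomplete, the missing values stay missing until the cache is cleared.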

(a) The default configuration for DBpedia stores all fields, but filters
literal values so that only values with one of the languages "en, de, fr,
it, es" or with no language are stored.
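Rule (a) can be written as a small predicate. The sketch below assumes a simplified, hypothetical value model (field, value, a literal flag and an optional language tag); it is not the Entityhub configuration syntax.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional

# Languages kept by the default DBpedia configuration, as described in (a)
DEFAULT_LANGUAGES: FrozenSet[str] = frozenset({"en", "de", "fr", "it", "es"})


@dataclass(frozen=True)
class Value:
    field: str
    value: str
    is_literal: bool
    language: Optional[str] = None


def keep_value(v: Value, languages: FrozenSet[str] = DEFAULT_LANGUAGES) -> bool:
    # Non-literal values (references to other resources) are not
    # affected by the language rule: all fields are stored.
    if not v.is_literal:
        return True
    # Literals are kept if they have no language or one of the configured ones.
    return v.language is None or v.language in languages


def filter_values(values: Iterable[Value]) -> List[Value]:
    return [v for v in values if keep_value(v)]
```

Under this reading, resource values such as http://dbpedia.org/ontology/thumbnail and http://xmlns.com/foaf/0.1/depiction (both rdf:resource links in the original question) pass the filter.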


Assuming that you started from zero when updating to the new version,
this also means that you downloaded a fresh copy of this Entity from
DBpedia.

I ran several tests but was not able to reproduce the described case.

I tried the following:

curl -H "Accept: application/rdf+xml" \
  "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"

The response contained both the expected thumbnail and the depiction
values (see the attached file).
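The same request can be issued from code. This Python sketch uses only the standard library; the base URL and port 8080 come from the curl command above, the two property names come from the namespaces in the original question (j.8 = http://dbpedia.org/ontology/, j.3 = foaf), and the helper names are illustrative assumptions. Unlike curl above, it percent-encodes the id parameter, which the server is assumed to decode as usual.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
THUMBNAIL = "{http://dbpedia.org/ontology/}thumbnail"
DEPICTION = "{http://xmlns.com/foaf/0.1/}depiction"


def build_entity_request(
        entity_id: str,
        base: str = "http://localhost:8080/entityhub/site/dbpedia/entity",
) -> urllib.request.Request:
    # Equivalent of: curl -H "Accept: application/rdf+xml" "<base>?id=<entity_id>"
    url = base + "?" + urllib.parse.urlencode({"id": entity_id})
    return urllib.request.Request(url, headers={"Accept": "application/rdf+xml"})


def image_links(rdf_xml: str) -> Dict[str, List[Optional[str]]]:
    # Collect the rdf:resource values of the thumbnail and depiction properties
    root = ET.fromstring(rdf_xml)
    return {
        tag: [el.get("{%s}resource" % RDF_NS) for el in root.iter(tag)]
        for tag in (THUMBNAIL, DEPICTION)
    }
```

With Stanbol running, reading the body of `urllib.request.urlopen(build_entity_request("http://dbpedia.org/resource/Valentino_Rossi"))` and passing it to `image_links` shows whether both values are present.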

Comparing the results with those provided directly by DBpedia also
showed no problems. Try posting the query

CONSTRUCT { <http://dbpedia.org/resource/Valentino_Rossi> ?p ?o }
WHERE { <http://dbpedia.org/resource/Valentino_Rossi> ?p ?o }

to http://dbpedia.org/sparql.
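Posting the query can also be scripted. This sketch follows the SPARQL 1.1 Protocol convention of sending the query as a form-encoded `query` parameter in a POST body; the endpoint URL is the one given above, while requesting `application/rdf+xml` is an assumption chosen so the result is easy to compare with the Entityhub response.

```python
import urllib.parse
import urllib.request

SPARQL_ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """CONSTRUCT { <http://dbpedia.org/resource/Valentino_Rossi> ?p ?o }
WHERE { <http://dbpedia.org/resource/Valentino_Rossi> ?p ?o }"""


def build_sparql_post(query: str, endpoint: str = SPARQL_ENDPOINT) -> urllib.request.Request:
    # SPARQL 1.1 Protocol: query via POST with URL-encoded parameters
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,  # a non-None body makes urllib send a POST
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/rdf+xml",
        },
    )
```

Sending it with `urllib.request.urlopen(build_sparql_post(QUERY))` returns the triples DBpedia itself holds for the entity.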

I would therefore suggest that you try one of the following:
 - copy the launcher jar to another directory and initialise a second
Stanbol installation there (use -p {port} to change the port)
 - if you do not mind losing the local cache, you can also just
delete the "{stanbol_root_folder}/sling/entityhub/solrYard/indexes/cache"
folder and restart Stanbol.
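The second option could be scripted as follows. This is only a sketch: `stanbol_root` is a placeholder for the {stanbol_root_folder} mentioned above, and it assumes Stanbol is stopped before the folder is removed and restarted afterwards.

```python
import shutil
from pathlib import Path

# Location of the Entityhub cache relative to the Stanbol root folder
CACHE_SUBPATH = Path("sling", "entityhub", "solrYard", "indexes", "cache")


def delete_entityhub_cache(stanbol_root: str) -> bool:
    """Delete the local Entityhub cache folder; return True if it was removed.

    Assumes Stanbol is not running. Cached entities are lost and will be
    fetched again from DBpedia.org on the next request.
    """
    cache_dir = Path(stanbol_root) / CACHE_SUBPATH
    if not cache_dir.is_dir():
        return False
    shutil.rmtree(cache_dir)
    return True
```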

If this solves the problem, the error was most likely caused by
DBpedia.org not returning all data. If the properties are still
missing, then it must be something to do with the configuration
provided within the launcher jar.
In that case I would need more information about the launcher you use
and any changes you made to your configuration.

best
Rupert Westenthaler


On Mon, Jun 6, 2011 at 10:29 AM, David Riccitelli
<da...@interact.it> wrote:
> Dears,
> Before r1131053, when querying for
> resource http://dbpedia.org/resource/Valentino_Rossi, I was able to get:
>
>     <j.8:thumbnail
> rdf:resource="http://upload.wikimedia.org/wikipedia/commons/thumb/8/81/Valentino_Rossi_2010_Qatar.jpg/200px-Valentino_Rossi_2010_Qatar.jpg"/>
>     <j.3:depiction
> rdf:resource="http://upload.wikimedia.org/wikipedia/commons/8/81/Valentino_Rossi_2010_Qatar.jpg"/>
>
> being:
>
>     xmlns:j.3="http://xmlns.com/foaf/0.1/"
>     xmlns:j.8="http://dbpedia.org/ontology/"
>
> With r1131053, however, this information is no longer provided: see the
> attached file for a comparison.
> I understand the configuration of EntityHub changed; how do I get that
> information back from EntityHub?
> BR,
> David
>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen