Posted to dev@stanbol.apache.org by florent andré <fl...@4sengines.com> on 2011/06/13 13:06:59 UTC

Entityhub : Can't retrieve entity with a #

Hi Rupert,
Hope you are fine.

I have another problem...
In my SKOS file, entities are identified by URIs containing a '#', like this:

   <rdf:Description rdf:about="http://www.test.fr/terminology#entity_gradient_1306341921902">
     <skos:broader rdf:resource="http://www.test.fr/terminology#entity_operateur_mathematique_1306341918995"/>
     <skos:prefLabel>GRADIENT</skos:prefLabel>
     <skos:inScheme rdf:resource="http://www.test.fr/terminology#space_mathematiques_1306341820765"/>
     <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
   </rdf:Description>

I can't manage to retrieve the entity through the entity endpoint.

* With the # char:
curl "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology#entity_gradient_1306341921902"
==> the answer is
Entity with ID 'http://www.test.fr/terminology' not found an any referenced site

==> the part after the # is removed

* With the # replaced by %23 (its URL-encoded equivalent):
curl "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology%23space_mathematiques_1306341820765"
==> the answer is
Entity with ID 'http://www.test.fr/terminology#space_mathematiques_1306341820765' not found an any referenced site

==> the whole id is kept, but the entity is still not found...
The result is the same if I URL-encode the whole entity id.

Is this a bug, or am I doing something wrong?

Thanks.
++


Re: Entityhub : Can't retrieve entity with a #

Posted by Rupert Westenthaler <ru...@gmail.com>.
On Mon, Jun 13, 2011 at 7:40 PM, Florent André <fl...@apache.org> wrote:
>
>
> On 06/13/2011 06:23 PM, Rupert Westenthaler wrote:
> ...
>>
>>> 2) In the Felix console, when try to modify the "Apache Stanbol Entityhub
>>> Referenced Site Configuration" of an imported index.
>>> There is an ajax error on save :
>>> The request failed:
>>> [object XMLDocument]
>>
>> Is there also a Exception in the log?
>
> Nothing in the log nor in the standard output...
>
Was there a dialog with the header "AJAX Error" and the message "The
request failed: FULL head"?

best
Rupert Westenthaler


-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Entityhub : Can't retrieve entity with a #

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi

> But in fact with this configuration, in Felix configuration "Apache Stanbol
> Entityhub Referenced Site Configuration", entity prefixes are set by default
> to :
> - http://dbpedia.org/resource/
> - http://dbpedia.org/ontology/
>
> So IMO, there may be a bug in the code, or the comment may be change.

Actually, while debugging the '#' issue I discovered that Felix
uses the value defined in the "value" attribute of the "@Property"
annotation as the default, even if someone uses the
ConfigAdminService directly to create a component instance. Previously I
thought that these values were only used by the Apache Felix Web
Console; however, such annotation defaults also apply whenever a property
is not defined in a configuration passed directly to the configuration admin.

Because of this I will delete all the default values currently used in
the source code and add the current values as examples to the
descriptions of the fields.


> 1) There is a typo error in mapping.txt :
> ==> change
> # copy dc:titel to rdfs:label
> dc:titel > rdfs:label
> ==> to
> # copy dc:title to rdfs:label
> dc:title > rdfs:label

Thx. Also added a "dc-elements:title > rdfs:label" mapping

> 2) In the Felix console, when try to modify the "Apache Stanbol Entityhub
> Referenced Site Configuration" of an imported index.
> There is an ajax error on save :
> The request failed:
> [object XMLDocument]

Is there also an Exception in the log?

best
Rupert Westenthaler

On Mon, Jun 13, 2011 at 5:42 PM, Florent André <fl...@apache.org> wrote:
> ...




Re: Entityhub : Can't retrieve entity with a #

Posted by Florent André <fl...@apache.org>.
Hi Rupert,

Thanks for testing it on your side.

I investigated, compared the IPTC configuration with mine, and found the problem!

It comes from this line in indexing.properties:
# the entity prefixes are used to determine if an entity needs to be searched
# on a referenced site. If not specified requests for any entity will be
# forwarded to this referenced site.
# use ';' to seperate multiple values
#org.apache.stanbol.entityhub.site.entityPrefix=http://example.org/resource;urn:mycompany:

Based on this comment, I first left the line commented out (no 
entityPrefix specified), because I understood that in that case all 
requests would be forwarded to this site... and that's fine! :)

But in fact, with this configuration, the Felix configuration "Apache 
Stanbol Entityhub Referenced Site Configuration" shows that the entity 
prefixes are set by default to:
- http://dbpedia.org/resource/
- http://dbpedia.org/ontology/

So IMO, either there is a bug in the code or the comment should be changed.
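
To illustrate why these defaults matter, here is a minimal sketch of the rule described in the indexing.properties comment: a request is forwarded to a site if no prefixes are configured, or if the entity id starts with one of them. This is an assumption about the behaviour, not the actual Entityhub code; the class and method names (EntityPrefixCheck, isForwarded) are hypothetical.

```java
import java.util.List;

// Hypothetical illustration only; not taken from the Stanbol sources.
class EntityPrefixCheck {

    /**
     * Returns true if a request for entityId would be forwarded to a site
     * configured with the given prefixes, under the assumed rule:
     * no prefixes means "accept everything", otherwise match by prefix.
     */
    static boolean isForwarded(List<String> prefixes, String entityId) {
        if (prefixes.isEmpty()) {
            return true;
        }
        for (String prefix : prefixes) {
            if (entityId.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String id = "http://www.test.fr/terminology#entity_gradient_1306341921902";

        // What the comment in indexing.properties suggests: no prefix configured.
        System.out.println(isForwarded(List.of(), id)); // true

        // What Felix actually showed: the dbpedia defaults.
        List<String> defaults = List.of(
                "http://dbpedia.org/resource/",
                "http://dbpedia.org/ontology/");
        System.out.println(isForwarded(defaults, id)); // false
    }
}
```

Under this assumed rule, the dbpedia defaults would keep any http://www.test.fr/ id from reaching the site, which matches the "not found" answer.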

During this investigation I also noticed the following (not closely related 
to this problem):

1) There is a typo in mapping.txt:
==> change
# copy dc:titel to rdfs:label
dc:titel > rdfs:label
==> to
# copy dc:title to rdfs:label
dc:title > rdfs:label

2) In the Felix console, when I try to modify the "Apache Stanbol 
Entityhub Referenced Site Configuration" of an imported index, 
there is an AJAX error on save:
The request failed:
[object XMLDocument]

(I see this when I try to modify the entity prefixes of my imported index.)

Please tell me if you would prefer Jira tickets for these issues (if they 
really are issues).

Thanks for your help.
++


On 06/13/2011 03:54 PM, Rupert Westenthaler wrote:
> Hi florent
>
> Using a '#' in the URI has the disadvantage that browsers will not
> send the part after the hash to the server, because they assume that
> they need to download the whole document and then navigate to the
> anchor within it.
>
> Using curl (or JavaScript) I think the full URL should be sent to the
> server (I was not able to find good information about this, but at
> least "curl -v" says that it sends the whole URL to the server).
> However, on the server side Jersey does not provide the #{anchor}
> part of the URL either.
> Sending
>> "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology#entity_gradient_1306341921902"
> will pass only "http://www.test.fr/terminology" to a method annotated with
>
>      @GET
>      @Path("/entity")
>      public Response getEntity(@QueryParam(value = "id") String id) {
>          // get the Entity
>          ...
>
> URL-encoding the '#' as '%23' causes Jersey to pass
> "http://www.test.fr/terminology#entity_gradient_1306341921902".
>
> In this case the query for an entity with this ID is passed correctly
> to the ReferencedSite (with '#', not '%23'). So if you send '%23' and the
> indexed Entity uses '#', it should work as long as Entities are cached
> locally. If a remote service is used, then the same '#' problem
> reappears for the remote service.
>
> To test on my side I have done the following:
> * renamed the Entities of the IPTC worldregions from
> "http://cv.iptc.org/newscodes/worldregion/r001" to
> "http://cv.iptc.org/newscodes/worldregion#r001"
> * indexed the IPTC using the indexing tools
> * installed the index to the entityhub
> * curl -v "http://localhost:8080/entityhub/sites/entity?id=http://cv.iptc.org/newscodes/worldregion%23r001"
>
> Assuming that
>> curl
>> "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology%23space_mathematiques_1306341820765"
>> ==>  answer is
>> Entity with ID
>> 'http://www.test.fr/terminology#space_mathematiques_1306341820765' not found
>> an any referenced site
>>
> happened on a referenced site with a full cache (e.g. as created by the
> Indexing Utility), I was not able to reproduce the error. If the
> referenced site uses a remote service to dereference entity ids (e.g.
> the Cool URI) this might happen. In this case I suggest testing the
> remote service directly.
>
> best
> Rupert Westenthaler
>
>

Re: Entityhub : Can't retrieve entity with a #

Posted by Rupert Westenthaler <ru...@gmail.com>.
On Mon, Jun 13, 2011 at 5:53 PM, Florent André <fl...@apache.org> wrote:
> Yep,
>
> And for continue about the "# case", I observe this "strange" thing :
>
> when request with # or %23 : I always have good metadatas values, but
> - with # : representation field is not good
> - With %23 representation field is ok
>

For referencedSites metadata are generated automatically based on
metadata defined for the site (e.g. copyright, attribution, cache
status ...).
The type "foaf:Document" is used as rdf:type for Metadata. The
"dc:subject" relation is currently used to link metadata with the
entity. However this is already changed in my local version to
"entityhub:about" because it caused problems with entities that also
defined this property.

> (Note : I use a full cached referenced site create with indexing utility.)
>
> Request and answer details :
>
> A) When requested with #
> $ curl
> "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology#entity_gradient_1306341921902"
> {
>    "id": "http:\/\/www.test.fr\/terminology",
>    "site": "gasoil",
>    "representation": {"id": "http:\/\/www.test.fr\/terminology"},
>    "metadata": {
>        "id": "http:\/\/www.test.fr\/terminology.meta",
>        "http:\/\/www.iks-project.eu\/ontology\/rick\/model\/isChached": [{
>            "type": "value",
>            "value": "true"
>        }],
>        "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type": [{
>            "type": "reference",
>            "value": "http:\/\/xmlns.com\/foaf\/0.1\/Document"
>        }],
>        "http:\/\/purl.org\/dc\/terms\/subject": [{
>            "type": "reference",
>            "value": "http:\/\/www.test.fr\/terminology"
>        }]
>    }
> }
>
As noted in the first response, everything after the '#' gets ignored.
Therefore this request returns the entity with the id
"http:\/\/www.test.fr\/terminology". It looks like this entity
actually exists but does not define any data, most likely because
this URI is referenced in your SKOS file and is therefore returned by
the Triple Store as an "entity" while indexing.

>
> ============
> B) when requested with %23
>
> $ curl "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology%23entity_gradient_1306341921902"
> {
>    "id": "http:\/\/www.test.fr\/terminology#entity_gradient_1306341921902",
>    "site": "gasoil",
>    "representation": {
>        "id":
> "http:\/\/www.test.fr\/terminology#entity_gradient_1306341921902",
>        "http:\/\/www.w3.org\/2004\/02\/skos\/core#broader": [{
>            "type": "reference",
>            "value":
> "http:\/\/www.test.fr\/terminology#entity_operateur_mathematique_1306341918995"
>        }],
>        "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type": [{
>            "type": "reference",
>            "value": "http:\/\/www.w3.org\/2004\/02\/skos\/core#Concept"
>        }],
>        "http:\/\/www.w3.org\/2004\/02\/skos\/core#inScheme": [{
>            "type": "reference",
>            "value":
> "http:\/\/www.test.fr\/terminology#space_mathematiques_1306341820765"
>        }],
>        "http:\/\/www.w3.org\/2000\/01\/rdf-schema#label": [{
>            "type": "text",
>            "value": "GRADIENT"
>        }],
>        "http:\/\/www.w3.org\/2004\/02\/skos\/core#prefLabel": [{
>            "type": "text",
>            "value": "GRADIENT"
>        }]
>    },
>    "metadata": {
>        "id":
> "http:\/\/www.test.fr\/terminology#entity_gradient_1306341921902.meta",
>        "http:\/\/www.iks-project.eu\/ontology\/rick\/model\/isChached": [{
>            "type": "value",
>            "value": "true"
>        }],
>        "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type": [{
>            "type": "reference",
>            "value": "http:\/\/xmlns.com\/foaf\/0.1\/Document"
>        }],
>        "http:\/\/purl.org\/dc\/terms\/subject": [{
>            "type": "reference",
>            "value":
> "http:\/\/www.edf.fr\/terminology#entity_gradient_1306341921902"
>        }]
>    }
> }

This is the actual entity as requested.


BTW:

Rather than using the ReferencedSiteManager

    http://localhost:8080/entityhub/sites/entity?id={id}

it would be better to directly use the ReferencedSite

    http://localhost:8080/entityhub/site/{siteId}/entity?id={id}

because if you had other ReferencedSites that do not define
Entity prefixes, the requests would actually be sent to more than
one site before being answered.
If you know which site holds the entity you are looking for, it is
always better to use that site directly.
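
As an illustration, a request URL for a single site can be assembled as in the sketch below. It follows the /site/{siteId}/entity?id={id} pattern given above and uses the site id "gasoil" from the responses in this thread. The class and method names (SiteEntityUrl, entityUrl) are made up for this example, and the site id is assumed to need no escaping as a path segment.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; only the URL pattern comes from this thread.
class SiteEntityUrl {

    /**
     * Builds the entity lookup URL for one referenced site.
     * The entity id is percent-encoded so that '#' becomes '%23'
     * and stays inside the "id" query parameter.
     * Assumes siteId contains only characters that are safe in a path segment.
     */
    static String entityUrl(String baseUrl, String siteId, String entityId) {
        return baseUrl + "/site/" + siteId + "/entity?id="
                + URLEncoder.encode(entityId, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = entityUrl(
                "http://localhost:8080/entityhub",
                "gasoil",
                "http://www.test.fr/terminology#entity_gradient_1306341921902");
        // Prints:
        // http://localhost:8080/entityhub/site/gasoil/entity?id=http%3A%2F%2Fwww.test.fr%2Fterminology%23entity_gradient_1306341921902
        System.out.println(url);
    }
}
```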

best
Rupert Westenthaler


>
>
> ++
>
>




Re: Entityhub : Can't retrieve entity with a #

Posted by Florent André <fl...@apache.org>.
Yep,

And to continue with the "# case", I observe this "strange" thing:

When requesting with # or %23, I always get good metadata values, but
- with #: the representation field is not good
- with %23: the representation field is OK

(Note: I use a fully cached referenced site created with the indexing utility.)

Request and answer details :

A) When requested with #
$ curl "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology#entity_gradient_1306341921902"
{
     "id": "http:\/\/www.test.fr\/terminology",
     "site": "gasoil",
     "representation": {"id": "http:\/\/www.test.fr\/terminology"},
     "metadata": {
         "id": "http:\/\/www.test.fr\/terminology.meta",
         "http:\/\/www.iks-project.eu\/ontology\/rick\/model\/isChached": [{
             "type": "value",
             "value": "true"
         }],
         "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type": [{
             "type": "reference",
             "value": "http:\/\/xmlns.com\/foaf\/0.1\/Document"
         }],
         "http:\/\/purl.org\/dc\/terms\/subject": [{
             "type": "reference",
             "value": "http:\/\/www.test.fr\/terminology"
         }]
     }
}


============
B) when requested with %23

$ curl "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology%23entity_gradient_1306341921902"
{
     "id": 
"http:\/\/www.test.fr\/terminology#entity_gradient_1306341921902",
     "site": "gasoil",
     "representation": {
         "id": 
"http:\/\/www.test.fr\/terminology#entity_gradient_1306341921902",
         "http:\/\/www.w3.org\/2004\/02\/skos\/core#broader": [{
             "type": "reference",
             "value": 
"http:\/\/www.test.fr\/terminology#entity_operateur_mathematique_1306341918995"
         }],
         "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type": [{
             "type": "reference",
             "value": "http:\/\/www.w3.org\/2004\/02\/skos\/core#Concept"
         }],
         "http:\/\/www.w3.org\/2004\/02\/skos\/core#inScheme": [{
             "type": "reference",
             "value": 
"http:\/\/www.test.fr\/terminology#space_mathematiques_1306341820765"
         }],
         "http:\/\/www.w3.org\/2000\/01\/rdf-schema#label": [{
             "type": "text",
             "value": "GRADIENT"
         }],
         "http:\/\/www.w3.org\/2004\/02\/skos\/core#prefLabel": [{
             "type": "text",
             "value": "GRADIENT"
         }]
     },
     "metadata": {
         "id": 
"http:\/\/www.test.fr\/terminology#entity_gradient_1306341921902.meta",
         "http:\/\/www.iks-project.eu\/ontology\/rick\/model\/isChached": [{
             "type": "value",
             "value": "true"
         }],
         "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type": [{
             "type": "reference",
             "value": "http:\/\/xmlns.com\/foaf\/0.1\/Document"
         }],
         "http:\/\/purl.org\/dc\/terms\/subject": [{
             "type": "reference",
             "value": 
"http:\/\/www.edf.fr\/terminology#entity_gradient_1306341921902"
         }]
     }
}


++



Re: Entityhub : Can't retrieve entity with a #

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Florent,

Using a '#' in the URI has the disadvantage that browsers will not
send the part after the hash to the server, because they treat it as
an anchor: they download the whole document and then navigate to the
anchor within it.

Using curl (or JavaScript) I think the full URL should be sent to the
server (I was not able to find good information about this, but at
least "curl -v" says that it sends the whole URL to the server).
However, on the server side Jersey also does not provide the #{anchor}
part of the URL.
Sending
> "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology#entity_gradient_1306341921902"
will pass only "http://www.test.fr/terminology" to a method annotated with

    @GET
    @Path("/entity")
    public Response getEntity(@QueryParam(value = "id") String id) {
        // get the Entity
        ...

URL encoding the '#' to '%23' causes Jersey to pass the full ID
"http://www.test.fr/terminology#entity_gradient_1306341921902".

In this case the query for an entity with this ID is correctly passed
to the ReferencedSite (with '#', not '%23'). So if you send '%23' and
the indexed Entity uses '#', it should work as long as Entities are
cached locally. If a remote service is used, then the same '#' problem
reappears for the remote service.
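
To illustrate the difference between the raw '#' and the encoded '%23',
here is a small sketch using only java.net.URI from the JDK. It is not
Jersey's code, just the generic URI splitting rule that explains the
behaviour described above. The class and method names (FragmentDemo,
queryOf, fragmentOf) are made up for this example.

```java
import java.net.URI;

public class FragmentDemo {

    // Decoded query component of the URL, or null if there is none.
    static String queryOf(String url) {
        return URI.create(url).getQuery();
    }

    // Decoded fragment component (the part after '#'), or null if there is none.
    static String fragmentOf(String url) {
        return URI.create(url).getFragment();
    }

    public static void main(String[] args) {
        String base = "http://localhost:8080/entityhub/sites/entity"
                + "?id=http://www.test.fr/terminology";

        // Raw '#': everything after it is the fragment, not part of the query.
        String raw = base + "#entity_gradient_1306341921902";
        System.out.println("query    = " + queryOf(raw));
        System.out.println("fragment = " + fragmentOf(raw));

        // Encoded '%23': the '#' stays inside the query and is decoded there.
        String encoded = base + "%23entity_gradient_1306341921902";
        System.out.println("query    = " + queryOf(encoded));
        System.out.println("fragment = " + fragmentOf(encoded));
    }
}
```

With the raw '#', the "id" value in the query ends at "terminology";
with '%23', the decoded query carries the complete entity ID.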

To test on my side I have done the following:
* renamed the Entities of the IPTC worldregions from
"http://cv.iptc.org/newscodes/worldregion/r001" to
"http://cv.iptc.org/newscodes/worldregion#r001"
* indexed the IPTC using the indexing tools
* installed the index to the entityhub
* curl -v "http://localhost:8080/entityhub/sites/entity?id=http://cv.iptc.org/newscodes/worldregion%23r001"
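
A request URL like the one in the last step can also be built in code.
The following is a minimal sketch, assuming Java 10 or later (for the
URLEncoder.encode overload that takes a Charset). The class name
EntityUrlBuilder and the method entityUrl are hypothetical helpers, not
part of the Entityhub. Note that it encodes the whole entity ID, not
only the '#'.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EntityUrlBuilder {

    // Appends the entity ID as a fully URL-encoded "id" query parameter.
    static String entityUrl(String endpoint, String entityId) {
        return endpoint + "?id=" + URLEncoder.encode(entityId, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Endpoint and entity ID taken from the examples in this thread.
        String endpoint = "http://localhost:8080/entityhub/sites/entity";
        String entityId = "http://cv.iptc.org/newscodes/worldregion#r001";

        // The '#' becomes '%23', so it can no longer be taken for a fragment.
        System.out.println(entityUrl(endpoint, entityId));
    }
}
```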

Assuming that
> curl
> "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology%23space_mathematiques_1306341820765"
> ==> answer is
> Entity with ID
> 'http://www.test.fr/terminology#space_mathematiques_1306341820765' not found
> an any referenced site
>
happened on a referenced site with a full cache (e.g. as created by the
Indexing Utility), I was not able to reproduce the error. If the
referenced site uses a remote service to dereference entity IDs (e.g.
the Cool URI), this might happen. In this case I suggest testing the
remote service directly.

best
Rupert Westenthaler


On Mon, Jun 13, 2011 at 1:06 PM, florent andré
<fl...@4sengines.com> wrote:
> Hi Rupert,
> Hope you are fine.
>
> I have another problem...
> In my skos, entity are identify by an #, like this :
>
>  <rdf:Description
> rdf:about="http://www.test.fr/terminology#entity_gradient_1306341921902">
>    <skos:broader
> rdf:resource="http://www.test.fr/terminology#entity_operateur_mathematique_1306341918995"/>
>    <skos:prefLabel>GRADIENT</skos:prefLabel>
>    <skos:inScheme
> rdf:resource="http://www.test.fr/terminology#space_mathematiques_1306341820765"/>
>    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
>  </rdf:Description>
>
> And I can't arrive to find the entity with the entity endpoint.
>
> * With the # char :
> curl
> "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology#entity_gradient_1306341921902"
> ==> answer is
> Entity with ID 'http://www.test.fr/terminology' not found an any referenced
> site
>
> ==> the part after the # is remove
>
> * With replacement of the # by %23 (the urlencode equivalent) :
> curl
> "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology%23space_mathematiques_1306341820765"
> ==> answer is
> Entity with ID
> 'http://www.test.fr/terminology#space_mathematiques_1306341820765' not found
> an any referenced site
>
> ==> all the id is keep, but still not found...
> The result is the same if I urlencode all the entity id.
>
> This is related to a bug or something I do wrong ?
>
> Thanks.
> ++
>
>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen