Posted to dev@jena.apache.org by deepa nagaraj <de...@gmail.com> on 2013/07/19 10:15:13 UTC

ontology

Hi everyone,

my requirement is below:

We have automobile rdf file,  need to generate the triple by using this.

please find the attachment of rdf file.

when i query it with owl:sameAs property i'm unable to retrieve the whole
hierarchy.

for ex: Sedans is same as both car as well as vehicle (should get
GMCrossovers, GMSportConvertible etc by using inference ).

but i'm getting only car .


My question is: why am I not getting the result?

but when i run the same file with allegrograph, i'm getting the result.


-- 
Regards,

Deepa

Re: Converting result rows to columns in Sparql

Posted by Andy Seaborne <an...@apache.org>.
On 27/07/13 13:00, hueyl16@aol.com wrote:
> Hi Andy,
>
> thanks for the UNION suggestion. I am going to give this a shot.
> Ultimately I am looking at 40-50 of these optional triple patterns in
> one query for about 50.000 patients. I am a little worried about how
> the query performance is going to be.

The UNION approach will scale linearly. UNION reads as "and then do", 
which is pretty much how it is implemented.

As the number of blocks grows, the effect of UNION vs OPTIONAL will 
increase as well.

> I noticed the increased redundancy. It seemed a little unexpected to
> me coming from the relational DB world where a query like that would
> be quite fast and create a fairly efficient result set. But I guess
> the DB has more hierarchical knowledge and would be designed for
> exactly this purpose whereas a triple store has to accommodate a lot
> more angles.

That's certainly true - a patient could be a single row of a highly 
denormalised table (I've seen this done to support call centers for 
speed and per-customer isolation) - the table had all the details in a row

(patient, dyspneaType, dysphagiaType, ...)

and lots of nulls.  The denormalisation assumes exactly one of 
each type.  It's like precalculating the results for a particular access 
pattern.

> Would more specific sub-properties be of any help? Instead of
> Has_Finding use Has_Dyspnea, Has_Dysphagia etc. Something like that
> would only affect the query itself (easier to find the triples and
> the rdfs:subClassOf* triple pattern is not required anymore), not the
> number of rows in the result set, correct? Since it still has to do
> the cross-product?

The rdfs:subClassOf* may be expensive, depending on the vocabulary. 
It's not likely to alter the scaling, but it does add a constant, and 
maybe noticeable, per-result cost.
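
If the subclass lists are known up front, one way to avoid the path (a 
sketch, not something suggested above; the class IRIs follow the earlier 
examples) is to enumerate the pre-computed subclasses in a VALUES block:

   SELECT ?pat ?dyspneaType
   WHERE {
     # Pre-computed subclasses of nci:Dyspnea_Score, listed explicitly
     VALUES ?dyspneaType { nci:Dyspnea_Score nci:Dyspnea_Score_2 nci:Dyspnea_Score_3 }
     ?pat ec:Has_Finding ?dyspnea .
     ?dyspnea a ?dyspneaType .
   }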

>
> Would it be better to just get all Has_Finding triples and format the
> result in custom post-processing?
>
> Or should I split it into multiple queries and combine the results in
> post-processing?
>
> I was playing around with sub-queries a little bit, but I noticed
> that a variable bound in the main query is not passed to the
> sub-query. So I did not make much progress there.

Evaluation is "inside outwards": logically, each {} block is evaluated 
and then combined.  We all read queries top to bottom as written, and ARQ 
happens to prefer to execute like that - it checks that this is legal first, though.
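
To illustrate the scoping (a sketch, reusing the prefixes and class names 
from earlier in the thread): a sub-query only contributes the variables it 
SELECTs, so a variable such as ?pat has to be projected out of the 
sub-query for it to join with the enclosing pattern - it does not flow in 
from outside.

   SELECT ?pat ?dyspneaType
   WHERE {
     ?pat a ec:Patient .
     {
       # Evaluated on its own; only ?pat and ?dyspneaType are visible outside
       SELECT ?pat ?dyspneaType
       WHERE {
         ?pat ec:Has_Finding ?dyspnea .
         ?dyspnea a ?dyspneaType .
         ?dyspneaType rdfs:subClassOf* nci:Dyspnea_Score .
       }
     }
   }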

> -Wolfgang



Re: Converting result rows to columns in Sparql

Posted by hu...@aol.com.
Hi Andy,

thanks for the UNION suggestion. I am going to give this a shot. Ultimately I am looking at 40-50 of these optional triple patterns in one query for about 50.000 patients. I am a little worried about how the query performance is going to be.

I noticed the increased redundancy. It seemed a little unexpected to me coming from the relational DB world where a query like that would be quite fast and create a fairly efficient result set. But I guess the DB has more hierarchical knowledge and would be designed for exactly this purpose whereas a triple store has to accommodate a lot more angles.

Would more specific sub-properties be of any help? Instead of Has_Finding use Has_Dyspnea, Has_Dysphagia etc. Something like that would only affect the query itself (easier to find the triples and the rdfs:subClassOf* triple pattern is not required anymore), not the number of rows in the result set, correct? Since it still has to do the cross-product? 

Would it be better to just get all Has_Finding triples and format the result in custom post-processing?

Or should I split it into multiple queries and combine the results in post-processing?

I was playing around with sub-queries a little bit, but I noticed that a variable bound in the main query is not passed to the sub-query. So I did not make much progress there.

-Wolfgang


-----Original Message-----
From: Andy Seaborne <an...@apache.org>
To: users <us...@jena.apache.org>
Sent: Thu, Jul 25, 2013 10:54 pm
Subject: Re: Converting result rows to columns in Sparql


Wolfgang,

I think the problem is that you are creating a partial cross product. 
This leads to two problems: a lot of redundancy in the answer, and a 
result set that grows worse than linearly.

Take a query of this shape:

 >> SELECT ?pat ?dyspneaType ?dysphagiaType ?hypertType
 >> WHERE {
 >>    ?pat a ec:Patient .
 >>   OPTIONAL { ?pat ec:Has_Finding ?dyspnea . ?dyspnea a ?dyspneaType .
 >> ?dyspneaType rdfs:subClassOf* nci:Dyspnea_Score . }
 >>    OPTIONAL { ?pat ec:Has_Finding ?dysphagia . ?dysphagia a ?dysphagiaType
 >> . ?dysphagiaType rdfs:subClassOf* nci:Dysphagia_Score . }
 >>    OPTIONAL { ?pat ec:Has_Finding ?hypert . ?hypert a ?hypertType .
 >> ?hypertType rdfs:subClassOf* nci:Hypertension . }
 >> }

(aside - please provide complete queries, inc prefixes.  I often have to 
reformat them to make them readable after email has been involved and I 
use either sparql.org or qparse to do that.  Both read in a query and 
print it out again but need a complete query to parse)

I'll try to simplify:

SELECT *
{
    ?pat a ec:Patient
    OPTIONAL { ?pat ec:Has_Finding ?X1 .
               ?X1 a ?T1 .
               ?T1 rdfs:subClassOf* :Z1}
    OPTIONAL { ?pat ec:Has_Finding ?X2 .
               ?X2 a ?T2 .
               ?T2 rdfs:subClassOf* :Z2}

...
}

For every X1/T1/Z1, ARQ will try X2/T2/Z2 so the numbers grow worse than 
linearly.

If the number of (X1/T1/Z1) matches is 2 and the number of (X2/T2/Z2) 
matches is 3, there are 6 rows: 3 rows have one of the two sets of 
(X1/T1/Z1) values and the other 3 have the other set.

If it were 5 and 10, there are 50 rows.

What you can do is use UNION.

SELECT *
{
    ?pat a ec:Patient
    {
    { ?pat ec:Has_Finding ?X1 .
               ?X1 a ?T1 .
               ?T1 rdfs:subClassOf* :Z1}
    UNION { ?pat ec:Has_Finding ?X2 .
               ?X2 a ?T2 .
               ?T2 rdfs:subClassOf* :Z2}
    UNION { ... }
    }

...
}

This will put 2 rows for (X1/T1/Z1), with cols for (X2/T2/Z2) being 
unset, and 3 rows for (X2/T2/Z2) with (X1/T1/Z1) being unset. 5 rows.

At small numbers, not so much difference but if it were 5 and 10, there 
are 15 rows as opposed to 50.

	Andy

On 25/07/13 14:14, Olivier Rossel wrote:
> (My answer does not help very much, but anyway :)
>
> I am not sure the spreadsheet view is ok for very nested data structures
> (that makes too many columns in the end).
>
> I would use a CONSTRUCT to retrieve a graph of
> patient-findings-findingType, then do some post-processing to build a
> facetable data structure.
> cf this example:
> http://people.csail.mit.edu/dfhuynh/projects/hierarchical-facets/test.html
> coming from that JSON:
> http://people.csail.mit.edu/dfhuynh/projects/hierarchical-facets/data.json#
>
>
>
> On Thu, Jul 25, 2013 at 11:07 AM, <hu...@aol.com> wrote:
>
>> Hi,
>>
>> I am running into some performance issues and was wondering if I was
>> approaching the problem from the correct angle or if there is something
>> more efficient.
>>
>> I have a property "Has_Finding" which is used to assert things about
>> patients. There are different kinds of findings, e.g. Dyspnea, Dysphagia,
>> Death, Hypertension ... So my data looks like this:
>>
>> patient1 ec:Has_Finding Dyspnea1 . Dyspnea1 a nci:Dyspnea_Score_2 .
>> patient1 ec:Has_Finding Dysphagia1 . Dysphagia1 a nci:Dysphagia_Score_1.
>> patient1 ec:Has_Finding Dyspnea2 . Dyspnea2 a nci:Dyspnea_Score_3.
>> patient2 ec:Has_Finding Dyspnea3. Dyspnea3 a nci:Dyspnea_Score_2.
>> patient2 ec:Has_Finding Hypertension1 . Hypertension1 a nci:Hypertension .
>> etc.
>>
>> My users want to know about findings. I am offering a GUI-based query tool
>> and am generating the Sparql queries automatically. The users are unaware
>> of Sparql or anything like that.
>>
>> So I can easily get all the findings with the simple sparql query:
>>
>> SELECT ?pat ?findingType
>> WHERE { ?pat ec:Has_Finding ?finding . ?finding a ?findingType. }
>>
>> The problem is that the user has to figure out what finding a particular
>> result row is talking about by looking at the value of ?findingType and by
>> string comparison (or something unreliable like that) find out that this is
>> actually representing a Dyspnea Score. Or a Dysphagia Score etc. before
>> he/she can work with the value itself. But the next row could already be
>> representing something else. This way of analyzing the result requires that
>> the user has some semantic knowledge about the data.
>>
>> So I would like to make the query return the different types of findings
>> as "columns" instead of "rows".
>>
>> SELECT ?pat ?dyspneaType ?dysphagiaType ?hypertType
>> WHERE {
>>    ?pat a ec:Patient .
>>    OPTIONAL { ?pat ec:Has_Finding ?dyspnea . ?dyspnea a ?dyspneaType .
>> ?dyspneaType rdfs:subClassOf* nci:Dyspnea_Score . }
>>    OPTIONAL { ?pat ec:Has_Finding ?dysphagia . ?dysphagia a ?dysphagiaType
>> . ?dysphagiaType rdfs:subClassOf* nci:Dysphagia_Score . }
>>    OPTIONAL { ?pat ec:Has_Finding ?hypert . ?hypert a ?hypertType .
>> ?hypertType rdfs:subClassOf* nci:Hypertension . }
>> }
>>
>> Of course there are more than just 3 different finding types. When
>> executing these queries, I noticed that the more of these rows I am adding,
>> the longer the execution time gets. Which is expected, but from a certain
>> point it seems to increase by a factor of 2 or more.
>>
>> While asking for 1 or 2 types of findings takes 5 seconds. Adding a 3rd
>> takes 10 seconds. After 5 we are up to 40 seconds. At 10 we are at 2-3
>> minutes and around 15 it takes so long that it is not feasible anymore. I
>> am currently only testing on a fraction of my data (about 1000 patients).
>> This query is a little simplified since there are other properties that I
>> use, e.g. Has_Id, Has_Date_Observed. But these are all datatype properties
>> pointing to literals so I left them out for sake of simplicity.
>>
>> My questions are:
>> 1. Is that a common/expected query?
>> 2. Is there a different way of achieving this without making the query
>> analyze these rdfs:subClassOf* triples.
>> 3. Should I just use the first, general query and then do some
>> post-processing in my code to split it up into columns before passing it to
>> the user?
>>
>> Thanks for any help!
>> Wolfgang

Re: Converting result rows to columns in Sparql

Posted by Andy Seaborne <an...@apache.org>.
Wolfgang,

I think the problem is that you are creating a partial cross product. 
This leads to two problems: a lot of redundancy in the answer, and a 
result set that grows worse than linearly.

Take a query of this shape:

 >> SELECT ?pat ?dyspneaType ?dysphagiaType ?hypertType
 >> WHERE {
 >>    ?pat a ec:Patient .
 >>   OPTIONAL { ?pat ec:Has_Finding ?dyspnea . ?dyspnea a ?dyspneaType .
 >> ?dyspneaType rdfs:subClassOf* nci:Dyspnea_Score . }
 >>    OPTIONAL { ?pat ec:Has_Finding ?dysphagia . ?dysphagia a ?dysphagiaType
 >> . ?dysphagiaType rdfs:subClassOf* nci:Dysphagia_Score . }
 >>    OPTIONAL { ?pat ec:Has_Finding ?hypert . ?hypert a ?hypertType .
 >> ?hypertType rdfs:subClassOf* nci:Hypertension . }
 >> }

(aside - please provide complete queries, inc prefixes.  I often have to 
reformat them to make them readable after email has been involved and I 
use either sparql.org or qparse to do that.  Both read in a query and 
print it out again but need a complete query to parse)

I'll try to simplify:

SELECT *
{
    ?pat a ec:Patient
    OPTIONAL { ?pat ec:Has_Finding ?X1 .
               ?X1 a ?T1 .
               ?T1 rdfs:subClassOf* :Z1}
    OPTIONAL { ?pat ec:Has_Finding ?X2 .
               ?X2 a ?T2 .
               ?T2 rdfs:subClassOf* :Z2}

...
}

For every X1/T1/Z1, ARQ will try X2/T2/Z2 so the numbers grow worse than 
linearly.

If the number of (X1/T1/Z1) matches is 2 and the number of (X2/T2/Z2) 
matches is 3, there are 6 rows: 3 rows have one of the two sets of 
(X1/T1/Z1) values and the other 3 have the other set.

If it were 5 and 10, there are 50 rows.

What you can do is use UNION.

SELECT *
{
    ?pat a ec:Patient
    {
    { ?pat ec:Has_Finding ?X1 .
               ?X1 a ?T1 .
               ?T1 rdfs:subClassOf* :Z1}
    UNION { ?pat ec:Has_Finding ?X2 .
               ?X2 a ?T2 .
               ?T2 rdfs:subClassOf* :Z2}
    UNION { ... }
    }

...
}

This will put 2 rows for (X1/T1/Z1), with cols for (X2/T2/Z2) being 
unset, and 3 rows for (X2/T2/Z2) with (X1/T1/Z1) being unset. 5 rows.

At small numbers, not so much difference but if it were 5 and 10, there 
are 15 rows as opposed to 50.
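
For reference, a complete, directly parseable version of the UNION form 
(the ec: and nci: prefix IRIs below are placeholders, since the real ones 
were not given in the thread):

   PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
   PREFIX ec:   <http://example.org/ec#>
   PREFIX nci:  <http://example.org/nci#>

   SELECT ?pat ?dyspneaType ?dysphagiaType ?hypertType
   WHERE {
     ?pat a ec:Patient .
     {
       { ?pat ec:Has_Finding ?dyspnea .
         ?dyspnea a ?dyspneaType .
         ?dyspneaType rdfs:subClassOf* nci:Dyspnea_Score . }
       UNION
       { ?pat ec:Has_Finding ?dysphagia .
         ?dysphagia a ?dysphagiaType .
         ?dysphagiaType rdfs:subClassOf* nci:Dysphagia_Score . }
       UNION
       { ?pat ec:Has_Finding ?hypert .
         ?hypert a ?hypertType .
         ?hypertType rdfs:subClassOf* nci:Hypertension . }
     }
   }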

	Andy

On 25/07/13 14:14, Olivier Rossel wrote:
> (My answer does not help very much, but anyway :)
>
> I am not sure the spreadsheet view is ok for very nested data structures
> (that makes too many columns in the end).
>
> I would use a CONSTRUCT to retrieve a graph of
> patient-findings-findingType, then do some post-processing to build a
> facetable data structure.
> cf this example:
> http://people.csail.mit.edu/dfhuynh/projects/hierarchical-facets/test.html
> coming from that JSON:
> http://people.csail.mit.edu/dfhuynh/projects/hierarchical-facets/data.json#
>
>
>
> On Thu, Jul 25, 2013 at 11:07 AM, <hu...@aol.com> wrote:
>
>> Hi,
>>
>> I am running into some performance issues and was wondering if I was
>> approaching the problem from the correct angle or if there is something
>> more efficient.
>>
>> I have a property "Has_Finding" which is used to assert things about
>> patients. There are different kinds of findings, e.g. Dyspnea, Dysphagia,
>> Death, Hypertension ... So my data looks like this:
>>
>> patient1 ec:Has_Finding Dyspnea1 . Dyspnea1 a nci:Dyspnea_Score_2 .
>> patient1 ec:Has_Finding Dysphagia1 . Dysphagia1 a nci:Dysphagia_Score_1.
>> patient1 ec:Has_Finding Dyspnea2 . Dyspnea2 a nci:Dyspnea_Score_3.
>> patient2 ec:Has_Finding Dyspnea3. Dyspnea3 a nci:Dyspnea_Score_2.
>> patient2 ec:Has_Finding Hypertension1 . Hypertension1 a nci:Hypertension .
>> etc.
>>
>> My users want to know about findings. I am offering a GUI-based query tool
>> and am generating the Sparql queries automatically. The users are unaware
>> of Sparql or anything like that.
>>
>> So I can easily get all the findings with the simple sparql query:
>>
>> SELECT ?pat ?findingType
>> WHERE { ?pat ec:Has_Finding ?finding . ?finding a ?findingType. }
>>
>> The problem is that the user has to figure out what finding a particular
>> result row is talking about by looking at the value of ?findingType and by
>> string comparison (or something unreliable like that) find out that this is
>> actually representing a Dyspnea Score. Or a Dysphagia Score etc. before
>> he/she can work with the value itself. But the next row could already be
>> representing something else. This way of analyzing the result requires that
>> the user has some semantic knowledge about the data.
>>
>> So I would like to make the query return the different types of findings
>> as "columns" instead of "rows".
>>
>> SELECT ?pat ?dyspneaType ?dysphagiaType ?hypertType
>> WHERE {
>>    ?pat a ec:Patient .
>>    OPTIONAL { ?pat ec:Has_Finding ?dyspnea . ?dyspnea a ?dyspneaType .
>> ?dyspneaType rdfs:subClassOf* nci:Dyspnea_Score . }
>>    OPTIONAL { ?pat ec:Has_Finding ?dysphagia . ?dysphagia a ?dysphagiaType
>> . ?dysphagiaType rdfs:subClassOf* nci:Dysphagia_Score . }
>>    OPTIONAL { ?pat ec:Has_Finding ?hypert . ?hypert a ?hypertType .
>> ?hypertType rdfs:subClassOf* nci:Hypertension . }
>> }
>>
>> Of course there are more than just 3 different finding types. When
>> executing these queries, I noticed that the more of these rows I am adding,
>> the longer the execution time gets. Which is expected, but from a certain
>> point it seems to increase by a factor of 2 or more.
>>
>> While asking for 1 or 2 types of findings takes 5 seconds. Adding a 3rd
>> takes 10 seconds. After 5 we are up to 40 seconds. At 10 we are at 2-3
>> minutes and around 15 it takes so long that it is not feasible anymore. I
>> am currently only testing on a fraction of my data (about 1000 patients).
>> This query is a little simplified since there are other properties that I
>> use, e.g. Has_Id, Has_Date_Observed. But these are all datatype properties
>> pointing to literals so I left them out for sake of simplicity.
>>
>> My questions are:
>> 1. Is that a common/expected query?
>> 2. Is there a different way of achieving this without making the query
>> analyze these rdfs:subClassOf* triples.
>> 3. Should I just use the first, general query and then do some
>> post-processing in my code to split it up into columns before passing it to
>> the user?
>>
>> Thanks for any help!
>> Wolfgang


Re: Converting result rows to columns in Sparql

Posted by Olivier Rossel <ol...@gmail.com>.
(My answer does not help very much, but anyway :)

I am not sure the spreadsheet view is ok for very nested data structures
(that makes too many columns in the end).

I would use a CONSTRUCT to retrieve a graph of
patient-findings-findingType, then do some post-processing to build a
facetable data structure.
cf this example:
http://people.csail.mit.edu/dfhuynh/projects/hierarchical-facets/test.html
coming from that JSON:
http://people.csail.mit.edu/dfhuynh/projects/hierarchical-facets/data.json#
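
A sketch of such a CONSTRUCT (the prefixes are the ones used in the rest 
of the thread):

   CONSTRUCT {
     ?pat ec:Has_Finding ?finding .
     ?finding a ?findingType .
   }
   WHERE {
     ?pat a ec:Patient ;
          ec:Has_Finding ?finding .
     ?finding a ?findingType .
   }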



On Thu, Jul 25, 2013 at 11:07 AM, <hu...@aol.com> wrote:

> Hi,
>
> I am running into some performance issues and was wondering if I was
> approaching the problem from the correct angle or if there is something
> more efficient.
>
> I have a property "Has_Finding" which is used to assert things about
> patients. There are different kinds of findings, e.g. Dyspnea, Dysphagia,
> Death, Hypertension ... So my data looks like this:
>
> patient1 ec:Has_Finding Dyspnea1 . Dyspnea1 a nci:Dyspnea_Score_2 .
> patient1 ec:Has_Finding Dysphagia1 . Dysphagia1 a nci:Dysphagia_Score_1.
> patient1 ec:Has_Finding Dyspnea2 . Dyspnea2 a nci:Dyspnea_Score_3.
> patient2 ec:Has_Finding Dyspnea3. Dyspnea3 a nci:Dyspnea_Score_2.
> patient2 ec:Has_Finding Hypertension1 . Hypertension1 a nci:Hypertension .
> etc.
>
> My users want to know about findings. I am offering a GUI-based query tool
> and am generating the Sparql queries automatically. The users are unaware
> of Sparql or anything like that.
>
> So I can easily get all the findings with the simple sparql query:
>
> SELECT ?pat ?findingType
> WHERE { ?pat ec:Has_Finding ?finding . ?finding a ?findingType. }
>
> The problem is that the user has to figure out what finding a particular
> result row is talking about by looking at the value of ?findingType and by
> string comparison (or something unreliable like that) find out that this is
> actually representing a Dyspnea Score. Or a Dysphagia Score etc. before
> he/she can work with the value itself. But the next row could already be
> representing something else. This way of analyzing the result requires that
> the user has some semantic knowledge about the data.
>
> So I would like to make the query return the different types of findings
> as "columns" instead of "rows".
>
> SELECT ?pat ?dyspneaType ?dysphagiaType ?hypertType
> WHERE {
>   ?pat a ec:Patient .
>   OPTIONAL { ?pat ec:Has_Finding ?dyspnea . ?dyspnea a ?dyspneaType .
> ?dyspneaType rdfs:subClassOf* nci:Dyspnea_Score . }
>   OPTIONAL { ?pat ec:Has_Finding ?dysphagia . ?dysphagia a ?dysphagiaType
> . ?dysphagiaType rdfs:subClassOf* nci:Dysphagia_Score . }
>   OPTIONAL { ?pat ec:Has_Finding ?hypert . ?hypert a ?hypertType .
> ?hypertType rdfs:subClassOf* nci:Hypertension . }
> }
>
> Of course there are more than just 3 different finding types. When
> executing these queries, I noticed that the more of these rows I am adding,
> the longer the execution time gets. Which is expected, but from a certain
> point it seems to increase by a factor of 2 or more.
>
> While asking for 1 or 2 types of findings takes 5 seconds. Adding a 3rd
> takes 10 seconds. After 5 we are up to 40 seconds. At 10 we are at 2-3
> minutes and around 15 it takes so long that it is not feasible anymore. I
> am currently only testing on a fraction of my data (about 1000 patients).
> This query is a little simplified since there are other properties that I
> use, e.g. Has_Id, Has_Date_Observed. But these are all datatype properties
> pointing to literals so I left them out for sake of simplicity.
>
> My questions are:
> 1. Is that a common/expected query?
> 2. Is there a different way of achieving this without making the query
> analyze these rdfs:subClassOf* triples.
> 3. Should I just use the first, general query and then do some
> post-processing in my code to split it up into columns before passing it to
> the user?
>
> Thanks for any help!
> Wolfgang

Converting result rows to columns in Sparql

Posted by hu...@aol.com.
Hi,

I am running into some performance issues and was wondering if I was approaching the problem from the correct angle or if there is something more efficient.

I have a property "Has_Finding" which is used to assert things about patients. There are different kinds of findings, e.g. Dyspnea, Dysphagia, Death, Hypertension ... So my data looks like this:

patient1 ec:Has_Finding Dyspnea1 . Dyspnea1 a nci:Dyspnea_Score_2 .
patient1 ec:Has_Finding Dysphagia1 . Dysphagia1 a nci:Dysphagia_Score_1.
patient1 ec:Has_Finding Dyspnea2 . Dyspnea2 a nci:Dyspnea_Score_3.
patient2 ec:Has_Finding Dyspnea3. Dyspnea3 a nci:Dyspnea_Score_2.
patient2 ec:Has_Finding Hypertension1 . Hypertension1 a nci:Hypertension . 
etc.

My users want to know about findings. I am offering a GUI-based query tool and am generating the Sparql queries automatically. The users are unaware of Sparql or anything like that. 

So I can easily get all the findings with the simple sparql query:

SELECT ?pat ?findingType
WHERE { ?pat ec:Has_Finding ?finding . ?finding a ?findingType. } 

The problem is that the user has to figure out what finding a particular result row is talking about: by looking at the value of ?findingType and using string comparison (or something similarly unreliable), they have to work out that it actually represents a Dyspnea Score, or a Dysphagia Score, etc., before they can work with the value itself. But the next row could already represent something else. This way of analyzing the result requires the user to have some semantic knowledge about the data.

So I would like to make the query return the different types of findings as "columns" instead of "rows".

SELECT ?pat ?dyspneaType ?dysphagiaType ?hypertType
WHERE {
  ?pat a ec:Patient .
  OPTIONAL { ?pat ec:Has_Finding ?dyspnea . ?dyspnea a ?dyspneaType . ?dyspneaType rdfs:subClassOf* nci:Dyspnea_Score . }
  OPTIONAL { ?pat ec:Has_Finding ?dysphagia . ?dysphagia a ?dysphagiaType . ?dysphagiaType rdfs:subClassOf* nci:Dysphagia_Score . } 
  OPTIONAL { ?pat ec:Has_Finding ?hypert . ?hypert a ?hypertType . ?hypertType rdfs:subClassOf* nci:Hypertension . }
}

Of course there are more than just 3 different finding types. When executing these queries, I noticed that the more of these rows I am adding, the longer the execution time gets. Which is expected, but from a certain point it seems to increase by a factor of 2 or more.

Asking for 1 or 2 types of findings takes 5 seconds; adding a 3rd takes 10 seconds. After 5 we are up to 40 seconds. At 10 we are at 2-3 minutes, and around 15 it takes so long that it is not feasible anymore. I am currently only testing on a fraction of my data (about 1000 patients). This query is a little simplified since there are other properties that I use, e.g. Has_Id, Has_Date_Observed. But these are all datatype properties pointing to literals so I left them out for the sake of simplicity.

My questions are:
1. Is that a common/expected query?
2. Is there a different way of achieving this without making the query analyze these rdfs:subClassOf* triples.
3. Should I just use the first, general query and then do some post-processing in my code to split it up into columns before passing it to the user?

Thanks for any help!
Wolfgang


Re: Jena 2.10.2 with jena-txt

Posted by Andy Seaborne <an...@apache.org>.
On 22/07/13 13:43, hueyl16@aol.com wrote:
> Hi,
>
> I read here
> http://jena.apache.org/documentation/query/text-query.html that a new
> free-text querying module was released with version 2.10.2.
> Unfortunately I can only find version 2.10.1 on the download page
> (http://www.apache.org/dist/jena/binaries/). What is the schedule for
> 2.10.2?
>
> I want to enable and speed up queries that use something like
> "starts-with" or "contains" on string literals (mostly in annotation
> properties). Am I right assuming that I either need jena-text or LARQ
> (if 2.10.2 is not available yet) for that? I am experiencing slow
> performance when using Sparql and regex functions on a triple store
> with about 90.000 classes.
>
> Regards, Wolfgang
>

Jena 2.10.2 has not been released - the documentation escaped from the 
staging website a bit early.

You can use a SNAPSHOT and it's included in the Fuseki snapshot build.

https://repository.apache.org/content/groups/snapshots/org/apache/jena/jena-text/1.0.0-SNAPSHOT/

At the moment, SNAPSHOTs are pretty stable - we operate a "trunk is 
releasable" policy unless we have to make some kind of exception (which 
hasn't happened at Apache).
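
Once jena-text is configured with a Lucene index over the property you 
search on, the query side looks roughly like this (a sketch; rdfs:label 
and the search string are just examples):

   PREFIX text: <http://jena.apache.org/text#>
   PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

   SELECT ?s ?label
   WHERE {
     ?s text:query (rdfs:label 'card*') .   # Lucene query syntax, e.g. a prefix match
     ?s rdfs:label ?label .
   }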

	Andy

Jena 2.10.2 with jena-txt

Posted by hu...@aol.com.
Hi,

I read here http://jena.apache.org/documentation/query/text-query.html that a new free-text querying module was released with version 2.10.2. Unfortunately I can only find version 2.10.1 on the download page (http://www.apache.org/dist/jena/binaries/). What is the schedule for 2.10.2?

I want to enable and speed up queries that use something like "starts-with" or "contains" on string literals (mostly in annotation properties). Am I right assuming that I either need jena-text or LARQ (if 2.10.2 is not available yet) for that? I am experiencing slow performance when using Sparql and regex functions on a triple store with about 90.000 classes. 

Regards,
Wolfgang

RE: ontology

Posted by David Jordan <Da...@sas.com>.
Not to criticize you or anything...
Specifying two things A and B as being the same versus saying A is a subclass of B is an extremely basic modeling notion that can be found in many data modeling contexts (object-oriented, among others). You need to have a good understanding of these very basic modeling concepts before you attempt to define an ontology. This is not something that is strictly in the heads of experts on knowledge bases.

-----Original Message-----
From: Nick Khamis [mailto:symack@gmail.com] 
Sent: Friday, July 19, 2013 11:35 AM
To: users@jena.apache.org
Cc: deepa nagaraj
Subject: Re: ontology

We can't all be knowledge base experts :)



Re: ontology

Posted by Nick Khamis <sy...@gmail.com>.
We can't all be knowledge base experts :)

Re: ontology

Posted by Dave Reynolds <da...@gmail.com>.
On 19/07/13 09:15, deepa nagaraj wrote:
>
> Hi everyone,
>
> my requirement is below:
>
> We have automobile rdf file,  need to generate the triple by using this.
>
> please find the attachment of rdf file.
>
> when i query it with owl:sameAs property i'm unable to retrieve the
> whole hierarchy.
>
> for ex: Sedans is same as both car as well as vehicle (should get
> GMCrossovers, GMSportConvertible etc by using inference ).

This is a very strange ontology. I think you mean all of these relations 
to be rdfs:subClassOf.

ex:Sedan is a subclass of ex:Car; it should not be "same as".
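
In other words, something like this (a sketch, using an illustrative ex: 
namespace):

   ex:Sedan rdfs:subClassOf ex:Car .
   ex:Car   rdfs:subClassOf ex:Vehicle .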

> but i'm getting only car .

Perhaps you are not using inference. Since you haven't shown (a 
complete, minimal, example of) your code it is hard to tell.

Assuming you are using the OntAPI then make sure you are using 
OntModelSpec.OWL_MEM_MINI_RULE_INF or OntModelSpec.OWL_MEM_RULE_INF

If you fix your ontology and so look at sub class hierarchies then you 
won't need this. You can use the OntAPI to traverse the hierarchy.

If you do need inference I would normally recommend 
OWL_MEM_MICRO_RULE_INF rather than the full rule set unless you truly 
want sameAs reasoning.
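
A minimal sketch of that setup (the file name and class URI below are made 
up for illustration):

   import com.hp.hpl.jena.ontology.OntClass;
   import com.hp.hpl.jena.ontology.OntModel;
   import com.hp.hpl.jena.ontology.OntModelSpec;
   import com.hp.hpl.jena.rdf.model.ModelFactory;
   import com.hp.hpl.jena.util.iterator.ExtendedIterator;

   public class Hierarchy {
       public static void main(String[] args) {
           // Attach one of the OWL rule reasoners mentioned above
           OntModel m = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MINI_RULE_INF);
           m.read("file:automobile.rdf");

           // With rdfs:subClassOf modelling, the OntAPI walks the hierarchy directly
           OntClass sedan = m.getOntClass("http://example.org/auto#Sedan");
           for (ExtendedIterator<OntClass> it = sedan.listSuperClasses(); it.hasNext(); ) {
               System.out.println(it.next());
           }
       }
   }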

Dave


Re: ontology

Posted by Dave Reynolds <da...@gmail.com>.
On 19/07/13 09:15, deepa nagaraj wrote:
>
> Hi everyone,

This is the list for people developing Jena. Your question is about 
using it and should be directed to the users list. I'll reply there.

Dave

>
> my requirement is below:
>
> We have automobile rdf file,  need to generate the triple by using this.
>
> please find the attachment of rdf file.
>
> when i query it with owl:sameAs property i'm unable to retrieve the
> whole hierarchy.
>
> for ex: Sedans is same as both car as well as vehicle (should get
> GMCrossovers, GMSportConvertible etc by using inference ).
>
> but i'm getting only car .
>
>
> My question is: why am I not getting the result?
>
> but when i run the same file with allegrograph, i'm getting the result.
>
>
> --
> Regards,
>
> Deepa