Posted to users@jena.apache.org by Brad Moran <bm...@pinnacle21.net> on 2013/09/01 01:00:15 UTC

jena text query optimization

Hi,
I am having trouble getting the results I expect from my text queries. I
attached one example of the RDF I start with. I load it with tdbloader and
then successfully build an index using jena.textindexer with this
assembler file:

@prefix :        <http://localhost/jena_example/#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix mms:     <http://rdf.cdisc.org/mms#> .
@prefix sdtms:   <http://rdf.cdisc.org/sdtm-1-2/schema#> .
@prefix sdtmigs: <http://rdf.cdisc.org/sdtmig-3-1-2/schema#> .
@prefix sends: <http://rdf.cdisc.org/send/schema#> .
@prefix sendigs: <http://rdf.cdisc.org/send-3.0/schema#> .
@prefix cts: <http://rdf.cdisc.org/ct/schema#> .

## Example of a TDB dataset and text index
## Initialize TDB
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

## Initialize text query
[] ja:loadClass       "org.apache.jena.query.text.TextQuery" .
# A TextDataset is a regular dataset with a text index.
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
# Lucene index
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .

## ---------------------------------------------------------------
## This URI must be fixed - it's used to assemble the text dataset.

:text_dataset rdf:type     text:TextDataset ;
    text:dataset   <#dataset> ;
    text:index     <#indexLucene> ;
    .

# A TDB dataset used for RDF storage
<#dataset> rdf:type      tdb:DatasetTDB ;
    tdb:location "tdb" ;
    # if from command line use: "NetBeansProjects/mdr-older/trunk/tdb"
    .

# Text index description
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:luceneIndexes> ;
    text:entityMap <#entMap> ;
    .

# Mapping in the index
# URI stored in field "uri"
# Every predicate listed in text:map is mapped to field "text"
<#entMap> a text:EntityMap ;
    text:entityField      "uri" ;
    text:defaultField     "text" ;
    text:map (
         [ text:field "text" ; text:predicate mms:dataElementName ]
         [ text:field "text" ; text:predicate mms:dataElementDescription ]
         [ text:field "text" ; text:predicate mms:dataElementLabel ]
         [ text:field "text" ; text:predicate mms:dataElementType ]
         [ text:field "text" ; text:predicate mms:ordinal ]
         [ text:field "text" ; text:predicate mms:broader ]
         [ text:field "text" ; text:predicate mms:Dataset ]
         [ text:field "text" ; text:predicate mms:contextName ]
         [ text:field "text" ; text:predicate mms:contextLabel ]
         [ text:field "text" ; text:predicate mms:contextDescription ]
         [ text:field "text" ; text:predicate sdtms:dataElementType ]
         [ text:field "text" ; text:predicate sdtms:dataElementRole ]
         [ text:field "text" ; text:predicate sdtms:dataElementCompliance ]
         [ text:field "text" ; text:predicate sdtms:supportedBySDTMIG ]
         [ text:field "text" ; text:predicate sdtms:supportedBySEND ]
         [ text:field "text" ; text:predicate sdtmigs:references ]
         [ text:field "text" ; text:predicate sdtmigs:domainStructure ]
         [ text:field "text" ; text:predicate sdtmigs:domainCode ]
         [ text:field "text" ; text:predicate sdtmigs:controlledTermsOrFormat ]
         [ text:field "text" ; text:predicate sends:dataElementCompliance ]
         [ text:field "text" ; text:predicate sends:dataElementRole ]
         [ text:field "text" ; text:predicate sendigs:domainStructure ]
         [ text:field "text" ; text:predicate sendigs:domainCode ]
         [ text:field "text" ; text:predicate sendigs:controlledTermsOrFormat ]
         [ text:field "text" ; text:predicate cts:cdiscDefinition ]
         [ text:field "text" ; text:predicate cts:nciPreferredTerm ]
         [ text:field "text" ; text:predicate cts:nciCode ]
         [ text:field "text" ; text:predicate cts:cdiscSynonyms ]
         [ text:field "text" ; text:predicate cts:cdiscSubmissionValue ]
         [ text:field "text" ; text:predicate cts:codelistName ]
         [ text:field "text" ; text:predicate cts:isExtensibleCodelist ]
         ) .


I then run queries against this dataset. For example, if I search for "AE",
I expect every dataElement within the AE domain to be returned. However, I
cannot get that result. If I search:

PREFIX : <http://localhost/jena_example/#>
PREFIX text: <http://jena.apache.org/text#>
PREFIX mms: <http://rdf.cdisc.org/mms#>
SELECT * {?s text:query (mms:dataElementName 'AE')}

I get:

<http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.DOMAIN>
<http://rdf.cdisc.org/sdtmig-3-1-2/std#Table.AE>
<http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.FA.FACAT>

when I would expect to get:

<http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AERELNST>
<http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AEENDY>
<http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AEMODIFY>
<http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AETOXGR>
<http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AEREFID>
<http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AESCAT>
<http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AESEQ>
<http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AESMIE>
(and the rest of the .AE dataElements; I have listed only a few here)

I also tried many variations of this query but could not get the desired
result. For example, I tried the other form of query as well:

PREFIX : <http://localhost/jena_example/#>
PREFIX text: <http://jena.apache.org/text#>
PREFIX mms: <http://rdf.cdisc.org/mms#>
SELECT * {?subject mms:contextName ?o . ?s text:query (mms:contextName 'SE')}


I am not sure whether the problem is that my query is formed incorrectly or
that my assembler file builds the index incorrectly (is there a better or
more complete way to create an index for this RDF model?). Any suggestions
would help. As I mentioned at the beginning, one of the RDF files from TDB
is attached. Thanks.

Re: jena text query optimization

Posted by Brad Moran <bm...@pinnacle21.net>.
Ok, this is a sample of several large rdf files I am working with:

<mms:DataElement rdf:ID="Column.AE.AERELNST">
    <mms:dataElementType rdf:datatype="
http://www.w3.org/2001/XMLSchema#QName"
    >xsd:string</mms:dataElementType>
    <mms:dataElementName rdf:datatype="
http://www.w3.org/2001/XMLSchema#string"
    >AERELNST</mms:dataElementName>
    <sdtms:dataElementType rdf:resource="
http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.Character"/>
    <mms:context>
      <mms:Dataset rdf:ID="Table.AE">
        <sdtmigs:domainStructure rdf:datatype="
http://www.w3.org/2001/XMLSchema#string"
        >One record per adverse event per subject</sdtmigs:domainStructure>
        <mms:contextName rdf:datatype="
http://www.w3.org/2001/XMLSchema#string"
        >AE</mms:contextName>
        <mms:contextLabel rdf:parseType="Literal">Adverse
Events</mms:contextLabel>
        <mms:ordinal rdf:datatype="
http://www.w3.org/2001/XMLSchema#positiveInteger"
        >8</mms:ordinal>
        <sdtmigs:domainCode rdf:datatype="
http://www.w3.org/2001/XMLSchema#string"
        >AE</sdtmigs:domainCode>
        <mms:context rdf:resource="#EventsObservationClass"/>
      </mms:Dataset>
    </mms:context>
    <sdtms:dataElementCompliance rdf:resource="
http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.PermissibleVariable"/>
    <mms:ordinal rdf:datatype="
http://www.w3.org/2001/XMLSchema#positiveInteger"
    >21</mms:ordinal>
    <sdtmigs:references rdf:datatype="
http://www.w3.org/2001/XMLSchema#string"
    >SDTM 2.2.2</sdtmigs:references>
    <sdtms:dataElementRole rdf:resource="
http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.RecordQualifier"/>
    <mms:dataElementLabel rdf:datatype="
http://www.w3.org/2001/XMLSchema#string"
    >Relationship to Non-Study Treatment</mms:dataElementLabel>
    <mms:dataElementDescription rdf:datatype="
http://www.w3.org/2001/XMLSchema#string"
    >Records the investigator's opinion as to whether the event may have
been due to a treatment other than study drug. May be reported as free
text. Example: "MORE LIKELY RELATED TO ASPIRIN
USE.".</mms:dataElementDescription>
    <mms:broader rdf:resource="
http://rdf.cdisc.org/sdtm-1-2/std#DE.Event.--RELNST"/>
  </mms:DataElement>
  <mms:DataElement rdf:ID="Column.SU.SUMODIFY">
    <mms:dataElementLabel rdf:datatype="
http://www.w3.org/2001/XMLSchema#string"
    >Modified Substance Name</mms:dataElementLabel>
    <sdtms:dataElementType rdf:resource="
http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.Character"/>
    <mms:dataElementType rdf:datatype="
http://www.w3.org/2001/XMLSchema#QName"
    >xsd:string</mms:dataElementType>
    <sdtmigs:references rdf:datatype="
http://www.w3.org/2001/XMLSchema#string"
    >SDTM 2.2.1, SDTMIG 4.1.3.6</sdtmigs:references>
    <mms:dataElementDescription rdf:datatype="
http://www.w3.org/2001/XMLSchema#string"
    >If SUTRT is modified, then the modified text is placed
here.</mms:dataElementDescription>
    <mms:ordinal rdf:datatype="
http://www.w3.org/2001/XMLSchema#positiveInteger"
    >8</mms:ordinal>
    <sdtms:dataElementCompliance rdf:resource="
http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.PermissibleVariable"/>
    <mms:dataElementName rdf:datatype="
http://www.w3.org/2001/XMLSchema#string"
    >SUMODIFY</mms:dataElementName>
    <mms:context rdf:resource="#Table.SU"/>
    <mms:broader rdf:resource="
http://rdf.cdisc.org/sdtm-1-2/std#DE.Intervention.--MODIFY"/>
    <sdtms:dataElementRole rdf:resource="
http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.SynonymQualifier"/>
  </mms:DataElement>
  <mms:DataElement rdf:ID="Column.CO.IDVAR">
    <sdtms:dataElementRole rdf:resource="
http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.RecordQualifier"/>
    <mms:dataElementName rdf:datatype="
http://www.w3.org/2001/XMLSchema#string"
    >IDVAR</mms:dataElementName>
    <sdtms:dataElementCompliance rdf:resource="
http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.PermissibleVariable"/>
    <sdtmigs:controlledTermsOrFormat rdf:datatype="
http://www.w3.org/2001/XMLSchema#string"
    >*</sdtmigs:controlledTermsOrFormat>
    <mms:dataElementLabel rdf:datatype="
http://www.w3.org/2001/XMLSchema#string"
    >Identifying Variable</mms:dataElementLabel>
    <mms:dataElementDescription rdf:datatype="
http://www.w3.org/2001/XMLSchema#string"
    >Identifying variable in the parent dataset that identifies the
record(s) to which the comment applies. Examples AESEQ or CMGRPID. Used
only when individual comments are related to domain records. Null for
comments collected on separate CRFs.</mms:dataElementDescription>
    <mms:context>
      <mms:Dataset rdf:ID="Table.CO">
        <sdtmigs:domainCode rdf:datatype="
http://www.w3.org/2001/XMLSchema#string"
        >CO</sdtmigs:domainCode>
        <mms:contextName rdf:datatype="
http://www.w3.org/2001/XMLSchema#string"
        >CO</mms:contextName>
        <sdtmigs:domainStructure rdf:datatype="
http://www.w3.org/2001/XMLSchema#string"
        >One record per comment per subject</sdtmigs:domainStructure>
        <mms:contextLabel
rdf:parseType="Literal">Comments</mms:contextLabel>
        <mms:ordinal rdf:datatype="
http://www.w3.org/2001/XMLSchema#positiveInteger"
        >2</mms:ordinal>
        <mms:context rdf:resource="#SpecialPurposeDomain"/>
      </mms:Dataset>
    </mms:context>
    <mms:dataElementType rdf:datatype="
http://www.w3.org/2001/XMLSchema#QName"
    >xsd:string</mms:dataElementType>
    <mms:ordinal rdf:datatype="
http://www.w3.org/2001/XMLSchema#positiveInteger"
    >6</mms:ordinal>
    <sdtms:dataElementType rdf:resource="
http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.Character"/>
  </mms:DataElement>


I then index this data with jena.textindexer, using this assembler file:

@prefix :        <http://localhost/jena_example/#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix mms:     <http://rdf.cdisc.org/mms#> .
@prefix sdtms:   <http://rdf.cdisc.org/sdtm-1-2/schema#> .
@prefix sdtmigs: <http://rdf.cdisc.org/sdtmig-3-1-2/schema#> .

## Example of a TDB dataset and text index
## Initialize TDB
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

## Initialize text query
[] ja:loadClass       "org.apache.jena.query.text.TextQuery" .
# A TextDataset is a regular dataset with a text index.
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
# Lucene index
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .

## ---------------------------------------------------------------
## This URI must be fixed - it's used to assemble the text dataset.

:text_dataset rdf:type     text:TextDataset ;
    text:dataset   <#dataset> ;
    text:index     <#indexLucene> ;
    .

# A TDB dataset used for RDF storage
<#dataset> rdf:type      tdb:DatasetTDB ;
    tdb:location "tdb" ;
    # if from command line use: "NetBeansProjects/mdr-older/trunk/tdb"
    .

# Text index description
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:luceneIndexes> ;
    text:entityMap <#entMap> ;
    .

# Mapping in the index
# URI stored in field "uri"
# Every predicate listed in text:map is mapped to field "text"
<#entMap> a text:EntityMap ;
    text:entityField      "uri" ;
    text:defaultField     "text" ;
    text:map (
         [ text:field "text" ; text:predicate mms:dataElementName ]
         [ text:field "text" ; text:predicate mms:dataElementDescription ]
         [ text:field "text" ; text:predicate mms:dataElementLabel ]
         [ text:field "text" ; text:predicate mms:dataElementType ]
         [ text:field "text" ; text:predicate mms:ordinal ]
         [ text:field "text" ; text:predicate mms:broader ]
         [ text:field "text" ; text:predicate mms:Dataset ]
         [ text:field "text" ; text:predicate mms:contextName ]
         [ text:field "text" ; text:predicate mms:contextLabel ]
         [ text:field "text" ; text:predicate mms:contextDescription ]
         [ text:field "text" ; text:predicate sdtms:dataElementType ]
         [ text:field "text" ; text:predicate sdtms:dataElementRole ]
         [ text:field "text" ; text:predicate sdtms:dataElementCompliance ]
         [ text:field "text" ; text:predicate sdtms:supportedBySDTMIG ]
         [ text:field "text" ; text:predicate sdtms:supportedBySEND ]
         [ text:field "text" ; text:predicate sdtmigs:references ]
         [ text:field "text" ; text:predicate sdtmigs:domainStructure ]
         [ text:field "text" ; text:predicate sdtmigs:domainCode ]
         [ text:field "text" ; text:predicate sdtmigs:controlledTermsOrFormat ]
         ) .

Finally I try to run a query on the dataset with the index:

PREFIX : <http://localhost/jena_example/#>
PREFIX text: <http://jena.apache.org/text#>
PREFIX mms: <http://rdf.cdisc.org/mms#>
SELECT * {?s text:query (mms:dataElementName 'AE')}

I would expect to get the first dataElement: AERELNST. I am unsure as to
whether my problem is in the format of my query or in the format of my
assembler file. Any thoughts?



On Sun, Sep 1, 2013 at 7:43 AM, Andy Seaborne <an...@apache.org> wrote:

> On 01/09/13 00:02, Brad Moran wrote:
>
>> sorry the file type should be saved as .owl
>>
>
> I see no data.  If you had an attachment, then they don't get through to
> the mailing list.
>
> Would it be possible to create a complete, minimal example of your setup?
>  A small amount of data that shows the situation.
> This description is quite long - is it all needed or can you see the same
> issues in a smaller configuration?
>
>         Andy
>
>

Re: jena text query optimization

Posted by Andy Seaborne <an...@apache.org>.
Brad,

'AE' does not match the data at all.  'AE*' does.  Is that the issue?

Try this query to see what's going on:

PREFIX : <http://localhost/jena_example/#>
PREFIX text: <http://jena.apache.org/text#>
PREFIX mms: <http://rdf.cdisc.org/mms#>

SELECT *
   { {?s text:query (mms:dataElementName 'AE*')}
     UNION
     {?s ?p ?t .
      FILTER regex(str(?t), 'AE', 'i')
     }
}
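
As a rough illustration of why 'AE' and 'AE*' behave differently, the
following Python sketch models a text index as a map from whole lowercase
tokens to subjects. It is not Jena or Lucene code: the tokenizer, the
function names (tokenize, build_index, search) and the three sample
literals (taken from the data in this thread) are simplifying assumptions.
Under that model a plain term only matches an entire token, so 'AE' finds
the literal "AE" but not "AERELNST", while a trailing * turns the term into
a prefix match.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a rough stand-in for an analyzer)."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def build_index(literals):
    """Map each token to the set of subjects whose literal contains it."""
    index = defaultdict(set)
    for subject, text in literals.items():
        for token in tokenize(text):
            index[token].add(subject)
    return index


def search(index, query):
    """Whole-token lookup, or prefix lookup when the query ends with '*'."""
    q = query.lower()
    if q.endswith("*"):
        prefix = q[:-1]
        hits = set()
        for token, subjects in index.items():
            if token.startswith(prefix):
                hits |= subjects
        return hits
    return set(index.get(q, set()))


# Hypothetical miniature of the data: subject -> one indexed literal.
literals = {
    "Column.AE.AERELNST": "AERELNST",
    "Table.AE": "AE",
    "Column.SU.SUMODIFY": "SUMODIFY",
}
index = build_index(literals)

print(sorted(search(index, "AE")))   # whole-token match only
print(sorted(search(index, "AE*")))  # prefix match
```

Under these assumptions the plain query returns only Table.AE, and the
prefix query also returns Column.AE.AERELNST.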

- - - - - - -

It took me quite some while to reconstruct this setup: the RDF is
broken, the assembler description is broken, the text index construction
is not minimal, you have not said what you see with the reduced data, and I
do not know how you are running the queries.  I don't always have this
amount of time for each question.

	Andy

On 12/09/13 20:08, Brad Moran wrote:
> Hopefully this is better. Here is a smaller sample of several large rdf
> files I am working with:
>
> <?xml version="1.0"?>
> <rdf:RDF
>      xmlns:mms="http://rdf.cdisc.org/mms#"
>      xmlns:sdtm="http://rdf.cdisc.org/sdtm-1-2/std#"
>      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>      xmlns:skos="http://www.w3.org/2004/02/skos/core#"
>      xmlns:sdtmigs="http://rdf.cdisc.org/sdtmig-3-1-2/schema#"
>      xmlns:owl="http://www.w3.org/2002/07/owl#"
>      xmlns:dc="http://purl.org/dc/elements/1.1/"
>      xmlns="http://rdf.cdisc.org/sdtmig-3-1-2/std#"
>      xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
>      xmlns:sdtms="http://rdf.cdisc.org/sdtm-1-2/schema#"
>      xmlns:sdtmct="http://rdf.cdisc.org/sdtm/ct#"
>      xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>    xml:base="http://rdf.cdisc.org/sdtmig-3-1-2/std">
> <mms:DataElement rdf:ID="Column.AE.AERELNST">
>      <mms:dataElementType rdf:datatype="
> http://www.w3.org/2001/XMLSchema#QName"
>      >xsd:string</mms:dataElementType>
>      <mms:dataElementName rdf:datatype="
> http://www.w3.org/2001/XMLSchema#string"
>      >AERELNST</mms:dataElementName>
>      <sdtms:dataElementType rdf:resource="
> http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.Character"/>
>      <sdtms:dataElementCompliance rdf:resource="
> http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.PermissibleVariable"/>
>      <mms:ordinal rdf:datatype="
> http://www.w3.org/2001/XMLSchema#positiveInteger"
>      >21</mms:ordinal>
>      <sdtmigs:references rdf:datatype="
> http://www.w3.org/2001/XMLSchema#string"
>      >SDTM 2.2.2</sdtmigs:references>
>      <sdtms:dataElementRole rdf:resource="
> http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.RecordQualifier"/>
>      <mms:dataElementLabel rdf:datatype="
> http://www.w3.org/2001/XMLSchema#string"
>      >Relationship to Non-Study Treatment</mms:dataElementLabel>
>      <mms:dataElementDescription rdf:datatype="
> http://www.w3.org/2001/XMLSchema#string"
>      >Records the investigator's opinion as to whether the event may have
> been due to a treatment other than study drug. May be reported as free
> text. Example: "MORE LIKELY RELATED TO ASPIRIN
> USE.".</mms:dataElementDescription>
>      <mms:broader rdf:resource="
> http://rdf.cdisc.org/sdtm-1-2/std#DE.Event.--RELNST"/>
>    </mms:DataElement>
>
>
>
> I then index this data using this assembler file using jena.textindexer:
>
> @prefix :        <http://localhost/jena_example/#> .
> @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
> @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
> @prefix text:    <http://jena.apache.org/text#> .
> @prefix mms:     <http://rdf.cdisc.org/mms#> .
> @prefix sdtms:   <http://rdf.cdisc.org/sdtm-1-2/schema#> .
> @prefix sdtmigs: <http://rdf.cdisc.org/sdtmig-3-1-2/schema#> .
>
>           ) .
>
> Finally I try to run a query on the dataset with the index:
>
> PREFIX : <http://localhost/jena_example/#>
> PREFIX text: <http://jena.apache.org/text#>
> PREFIX mms: <http://rdf.cdisc.org/mms#>
> SELECT * {?s text:query (mms:dataElementName 'AE')}
>
> I would expect to get the first dataElement: AERELNST. I am unsure as to
> whether my problem is in the format of my query or in the format of my
> assembler file. Any thoughts?
>
>
> On Wed, Sep 11, 2013 at 6:09 PM, Andy Seaborne <an...@apache.org> wrote:
>
>> Brad,
>>
>> That didn't make it through (and it's truncated).  Do you have a smaller
>> data sample?  Does the data have to be that long to show the issue?  Can
>> you put it up somewhere I can pull it from?
>>
>>          Andy
>>
>>
>> On 11/09/13 05:57, Brad Moran wrote:
>>
>>> Ok, this is a sample of several large rdf files I am working with:
>>>
>>> <?xml version="1.0"?>
>>> <rdf:RDF
>>>       xmlns:mms="http://rdf.cdisc.**org/mms# <http://rdf.cdisc.org/mms#>"
>>>       xmlns:sdtm="http://rdf.cdisc.**org/sdtm-1-2/std#<http://rdf.cdisc.org/sdtm-1-2/std#>
>>> "
>>>       xmlns:rdf="http://www.w3.org/**1999/02/22-rdf-syntax-ns#<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>>> "
>>>       xmlns:skos="http://www.w3.org/**2004/02/skos/core#<http://www.w3.org/2004/02/skos/core#>
>>> "
>>>       xmlns:sdtmigs="http://rdf.**cdisc.org/sdtmig-3-1-2/schema#<http://rdf.cdisc.org/sdtmig-3-1-2/schema#>
>>> **"
>>>       xmlns:owl="http://www.w3.org/**2002/07/owl#<http://www.w3.org/2002/07/owl#>
>>> "
>>>       xmlns:dc="http://purl.org/dc/**elements/1.1/<http://purl.org/dc/elements/1.1/>
>>> "
>>>       xmlns="http://rdf.cdisc.org/**sdtmig-3-1-2/std#<http://rdf.cdisc.org/sdtmig-3-1-2/std#>
>>> "
>>>       xmlns:xsd="http://www.w3.org/**2001/XMLSchema#<http://www.w3.org/2001/XMLSchema#>
>>> "
>>>       xmlns:sdtms="http://rdf.cdisc.**org/sdtm-1-2/schema#<http://rdf.cdisc.org/sdtm-1-2/schema#>
>>> "
>>>       xmlns:sdtmct="http://rdf.**cdisc.org/sdtm/ct#<http://rdf.cdisc.org/sdtm/ct#>
>>> "
>>>       xmlns:rdfs="http://www.w3.org/**2000/01/rdf-schema#<http://www.w3.org/2000/01/rdf-schema#>
>>> "
>>>     xml:base="http://rdf.cdisc.**org/sdtmig-3-1-2/std<http://rdf.cdisc.org/sdtmig-3-1-2/std>
>>> ">
>>> <mms:DataElement rdf:ID="Column.AE.AERELNST">
>>>       <mms:dataElementType rdf:datatype="http://www.w3.org/2001/XMLSchema#QName"
>>>       >xsd:string</mms:dataElementType>
>>>       <mms:dataElementName rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>>       >AERELNST</mms:dataElementName>
>>>       <sdtms:dataElementType rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.Character"/>
>>>       <mms:context>
>>>         <mms:Dataset rdf:ID="Table.AE">
>>>           <sdtmigs:domainStructure rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>>           >One record per adverse event per subject</sdtmigs:domainStructure>
>>>           <mms:contextName rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>>           >AE</mms:contextName>
>>>           <mms:contextLabel rdf:parseType="Literal">Adverse Events</mms:contextLabel>
>>>           <mms:ordinal rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger"
>>>           >8</mms:ordinal>
>>>           <sdtmigs:domainCode rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>>           >AE</sdtmigs:domainCode>
>>>           <mms:context rdf:resource="#EventsObservationClass"/>
>>>         </mms:Dataset>
>>>       </mms:context>
>>>       <sdtms:dataElementCompliance rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.PermissibleVariable"/>
>>>       <mms:ordinal rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger"
>>>       >21</mms:ordinal>
>>>       <sdtmigs:references rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>>       >SDTM 2.2.2</sdtmigs:references>
>>>       <sdtms:dataElementRole rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.RecordQualifier"/>
>>>       <mms:dataElementLabel rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>>       >Relationship to Non-Study Treatment</mms:dataElementLabel>
>>>       <mms:dataElementDescription rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>>       >Records the investigator's opinion as to whether the event may have
>>> been due to a treatment other than study drug. May be reported as free
>>> text. Example: "MORE LIKELY RELATED TO ASPIRIN
>>> USE.".</mms:dataElementDescription>
>>>       <mms:broader rdf:resource="http://rdf.cdisc.org/sdtm-1-2/std#DE.Event.--RELNST"/>
>>>     </mms:DataElement>
>>>     <mms:DataElement rdf:ID="Column.SU.SUMODIFY">
>>>       <mms:dataElementLabel rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>>       >Modified Substance Name</mms:dataElementLabel>
>>>       <sdtms:dataElementType rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.Character"/>
>>>       <mms:dataElementType rdf:datatype="http://www.w3.org/2001/XMLSchema#QName"
>>>       >xsd:string</mms:dataElementType>
>>>       <sdtmigs:references rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>>       >SDTM 2.2.1, SDTMIG 4.1.3.6</sdtmigs:references>
>>>       <mms:dataElementDescription rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>>       >If SUTRT is modified, then the modified text is placed
>>> here.</mms:dataElementDescription>
>>>       <mms:ordinal rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger"
>>>       >8</mms:ordinal>
>>>       <sdtms:dataElementCompliance rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.PermissibleVariable"/>
>>>       <mms:dataElementName rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>>       >SUMODIFY</mms:dataElementName>
>>>       <mms:context rdf:resource="#Table.SU"/>
>>>       <mms:broader rdf:resource="http://rdf.cdisc.org/sdtm-1-2/std#DE.Intervention.--MODIFY"/>
>>>       <sdtms:dataElementRole rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.SynonymQualifier"/>
>>>     </mms:DataElement>
>>>     <mms:DataElement rdf:ID="Column.CO.IDVAR">
>>>       <sdtms:dataElementRole rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.RecordQualifier"/>
>>>       <mms:dataElementName rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>>       >IDVAR</mms:dataElementName>
>>>       <sdtms:dataElementCompliance rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.PermissibleVariable"/>
>>>       <sdtmigs:controlledTermsOrFormat rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>>       >*</sdtmigs:controlledTermsOrFormat>
>>>       <mms:dataElementLabel rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>>       >Identifying Variable</mms:dataElementLabel>
>>>       <mms:dataElementDescription rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>>       >Identifying variable in the parent dataset that identifies the
>>> record(s) to which the comment applies. Examples AESEQ or CMGRPID. Used
>>> only when individual comments are related to domain records. Null for
>>> comments collected on separate CRFs.</mms:dataElementDescription>
>>>       <mms:context>
>>>         <mms:Dataset rdf:ID="Table.CO">
>>>           <sdtmigs:domainCode rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>>           >CO</sdtmigs:domainCode>
>>>           <mms:contextName rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>>           >CO</mms:contextName>
>>>           <sdtmigs:domainStructure rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>>           >One record per comment per subject</sdtmigs:domainStructure>
>>>           <mms:contextLabel rdf:parseType="Literal">Comments</mms:contextLabel>
>>>           <mms:ordinal rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger"
>>>           >2</mms:ordinal>
>>>           <mms:context rdf:resource="#SpecialPurposeDomain"/>
>>>         </mms:Dataset>
>>>       </mms:context>
>>>       <mms:dataElementType rdf:datatype="http://www.w3.org/2001/XMLSchema#QName"
>>>       >xsd:string</mms:dataElementType>
>>>       <mms:ordinal rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger"
>>>       >6</mms:ordinal>
>>>       <sdtms:dataElementType rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.Character"/>
>>>     </mms:DataElement>
>>>
>>>
>>> I then index this data using this assembler file using jena.textindexer:
>>>
>>> @prefix :        <http://localhost/jena_example/#> .
>>> @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>> @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
>>> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
>>> @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>> @prefix text:    <http://jena.apache.org/text#> .
>>> @prefix mms:     <http://rdf.cdisc.org/mms#> .
>>> @prefix sdtms:   <http://rdf.cdisc.org/sdtm-1-2/schema#> .
>>> @prefix sdtmigs: <http://rdf.cdisc.org/sdtmig-3-1-2/schema#> .
>>>
>>>            ) .
>>>
>>> Finally I try to run a query on the dataset with the index:
>>>
>>> PREFIX : <http://localhost/jena_example/#>
>>> PREFIX text: <http://jena.apache.org/text#>
>>> PREFIX mms: <http://rdf.cdisc.org/mms#>
>>> SELECT * {?s text:query (mms:dataElementName 'AE')}
>>>
>>> I would expect to get the first dataElement: AERELNST. I am unsure as to
>>> whether my problem is in the format of my query or in the format of my
>>> assembler file. Any thoughts?
>>>
>>>
>>> On Sun, Sep 1, 2013 at 7:43 AM, Andy Seaborne <an...@apache.org> wrote:
>>>
>>>   On 01/09/13 00:02, Brad Moran wrote:
>>>>
>>>>   sorry the file type should be saved as .owl
>>>>>
>>>>>
>>>> I see no data.  If you had an attachment, then they don't get through to
>>>> the mailing list.
>>>>
>>>> Would it be possible to create a complete, minimal example of your setup?
>>>>    A small amount of data that shows the situation.
>>>> This description is quite long - is it all needed or can you see the same
>>>> issues in a smaller configuration?
>>>>
>>>>           Andy
>>>>
>>>>
>>>>
>>>>> On Sat, Aug 31, 2013 at 7:00 PM, Brad Moran <bm...@pinnacle21.net> wrote:
>>>>>
>>>>>       Hi,
>>>>>       I am currently having a problem getting the exact results I want
>>>>>       from my text queries. I attached one example of my rdf that I begin
>>>>>       with. Then I run tdbloader and successfully create an index using
>>>>>       this assembler file with jena.textindexer:
>>>>>
>>>>>       @prefix :        <http://localhost/jena_example/#> .
>>>>>       @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>>>>       @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
>>>>>       @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
>>>>>       @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>>>>       @prefix text:    <http://jena.apache.org/text#> .
>>>>>       @prefix mms:     <http://rdf.cdisc.org/mms#> .
>>>>>       @prefix sdtms:   <http://rdf.cdisc.org/sdtm-1-2/schema#> .
>>>>>       @prefix sdtmigs: <http://rdf.cdisc.org/sdtmig-3-1-2/schema#> .
>>>>>       @prefix sends: <http://rdf.cdisc.org/send/schema#> .
>>>>>       @prefix sendigs: <http://rdf.cdisc.org/send-3.0/schema#> .
>>>>>       @prefix cts: <http://rdf.cdisc.org/ct/schema#> .
>>>>>
>>>>>       ## Example of a TDB dataset and text index
>>>>>       ## Initialize TDB
>>>>>       [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
>>>>>       tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
>>>>>       tdb:GraphTDB    rdfs:subClassOf  ja:Model .
>>>>>
>>>>>       ## Initialize text query
>>>>>       [] ja:loadClass       "org.apache.jena.query.text.TextQuery" .
>>>>>
>>>>>       # A TextDataset is a regular dataset with a text index.
>>>>>       text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
>>>>>       # Lucene index
>>>>>       text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
>>>>>
>>>>>       ## ---------------------------------------------------------------
>>>>>       ## This URI must be fixed - it's used to assemble the text dataset.
>>>>>
>>>>>       :text_dataset rdf:type     text:TextDataset ;
>>>>>            text:dataset   <#dataset> ;
>>>>>            text:index     <#indexLucene> ;
>>>>>            .
>>>>>
>>>>>       # A TDB dataset used for RDF storage
>>>>>       <#dataset> rdf:type      tdb:DatasetTDB ;
>>>>>            tdb:location "tdb" ;
>>>>>            # if from command line use: "NetBeansProjects/mdr-older/trunk/tdb"
>>>>>            .
>>>>>
>>>>>       # Text index description
>>>>>       <#indexLucene> a text:TextIndexLucene ;
>>>>>            text:directory <file:luceneIndexes> ;
>>>>>            text:entityMap <#entMap> ;
>>>>>            .
>>>>>
>>>>>       # Mapping in the index
>>>>>       # URI stored in field "uri"
>>>>>       # rdfs:label is mapped to field "text"
>>>>>       <#entMap> a text:EntityMap ;
>>>>>            text:entityField      "uri" ;
>>>>>            text:defaultField     "text" ;
>>>>>            text:map (
>>>>>                 [ text:field "text" ; text:predicate mms:dataElementName ]
>>>>>                 [ text:field "text" ; text:predicate mms:dataElementDescription ]
>>>>>                 [ text:field "text" ; text:predicate mms:dataElementLabel ]
>>>>>                 [ text:field "text" ; text:predicate mms:dataElementType ]
>>>>>                 [ text:field "text" ; text:predicate mms:ordinal ]
>>>>>                 [ text:field "text" ; text:predicate mms:broader ]
>>>>>                 [ text:field "text" ; text:predicate mms:Dataset ]
>>>>>                 [ text:field "text" ; text:predicate mms:contextName ]
>>>>>                 [ text:field "text" ; text:predicate mms:contextLabel ]
>>>>>                 [ text:field "text" ; text:predicate mms:contextDescription ]
>>>>>                 [ text:field "text" ; text:predicate sdtms:dataElementType ]
>>>>>                 [ text:field "text" ; text:predicate sdtms:dataElementRole ]
>>>>>                 [ text:field "text" ; text:predicate sdtms:dataElementCompliance ]
>>>>>                 [ text:field "text" ; text:predicate sdtms:supportedBySDTMIG ]
>>>>>                 [ text:field "text" ; text:predicate sdtms:supportedBySEND ]
>>>>>                 [ text:field "text" ; text:predicate sdtmigs:references ]
>>>>>                 [ text:field "text" ; text:predicate sdtmigs:domainStructure ]
>>>>>                 [ text:field "text" ; text:predicate sdtmigs:domainCode ]
>>>>>                 [ text:field "text" ; text:predicate sdtmigs:controlledTermsOrFormat ]
>>>>>
>>>>>                 [ text:field "text" ; text:predicate sends:dataElementCompliance ]
>>>>>                 [ text:field "text" ; text:predicate sends:dataElementRole ]
>>>>>                 [ text:field "text" ; text:predicate sendigs:domainStructure ]
>>>>>                 [ text:field "text" ; text:predicate sendigs:domainCode ]
>>>>>                 [ text:field "text" ; text:predicate sendigs:controlledTermsOrFormat ]
>>>>>
>>>>>                 [ text:field "text" ; text:predicate cts:cdiscDefinition ]
>>>>>                 [ text:field "text" ; text:predicate cts:nciPreferredTerm ]
>>>>>                 [ text:field "text" ; text:predicate cts:nciCode ]
>>>>>                 [ text:field "text" ; text:predicate cts:cdiscSynonyms ]
>>>>>                 [ text:field "text" ; text:predicate cts:cdiscSubmissionValue ]
>>>>>                 [ text:field "text" ; text:predicate cts:codelistName ]
>>>>>                 [ text:field "text" ; text:predicate cts:isExtensibleCodelist ]
>>>>>                 ) .
>>>>>
>>>>>
>>>>>       I then try to run queries against this dataset, as an example say I
>>>>>       want to search "AE" then I would expect every dataElement within the
>>>>>       AE domain to be returned. However, I cannot get the desired result.
>>>>>       If I search:
>>>>>
>>>>>       PREFIX : <http://localhost/jena_example/#>
>>>>>       PREFIX text: <http://jena.apache.org/text#>
>>>>>       PREFIX mms: <http://rdf.cdisc.org/mms#>
>>>>>       SELECT * {?s text:query (mms:dataElementName 'AE')}
>>>>>
>>>>>       I get:
>>>>>
>>>>>       <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.DOMAIN>
>>>>>       <http://rdf.cdisc.org/sdtmig-3-1-2/std#Table.AE>
>>>>>       <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.FA.FACAT>
>>>>>
>>>>>       when I would expect to get:
>>>>>
>>>>>       <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AERELNST>
>>>>>       <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AEENDY>
>>>>>       <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AEMODIFY>
>>>>>       <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AETOXGR>
>>>>>       <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AEREFID>
>>>>>       <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AESCAT>
>>>>>       <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AESEQ>
>>>>>       <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AESMIE>
>>>>>       (And the rest of the .AE dataElements just listed a few here)
>>>>>
>>>>>       I also tried playing with this query a lot, but could not get the
>>>>>       desired result for example I tried the other form of query as well:
>>>>>
>>>>>       PREFIX : <http://localhost/jena_example/#>
>>>>>       PREFIX text: <http://jena.apache.org/text#>
>>>>>       PREFIX mms: <http://rdf.cdisc.org/mms#>
>>>>>       SELECT * {?subject mms:contextName ?o .
>>>>>       ?s text:query (mms:contextName 'SE')}
>>>>>
>>>>>
>>>>>       I am not sure whether the problem is a result of my query being
>>>>>       formed incorrectly, or whether the problem could be in my assembler
>>>>>       file that creates the index (is there a better/more complete way to
>>>>>       create an index for this rdf model?). Any suggestions would help,
>>>>>       like I mentioned in the beginning one of the rdf files from tdb is
>>>>>       attached. Thanks.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


Re: jena text query optimization

Posted by Brad Moran <bm...@pinnacle21.net>.
Hopefully this is better. Here is a smaller sample taken from several large
RDF files I am working with:

<?xml version="1.0"?>
<rdf:RDF
    xmlns:mms="http://rdf.cdisc.org/mms#"
    xmlns:sdtm="http://rdf.cdisc.org/sdtm-1-2/std#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#"
    xmlns:sdtmigs="http://rdf.cdisc.org/sdtmig-3-1-2/schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://rdf.cdisc.org/sdtmig-3-1-2/std#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:sdtms="http://rdf.cdisc.org/sdtm-1-2/schema#"
    xmlns:sdtmct="http://rdf.cdisc.org/sdtm/ct#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xml:base="http://rdf.cdisc.org/sdtmig-3-1-2/std">
<mms:DataElement rdf:ID="Column.AE.AERELNST">
    <mms:dataElementType rdf:datatype="http://www.w3.org/2001/XMLSchema#QName"
    >xsd:string</mms:dataElementType>
    <mms:dataElementName rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >AERELNST</mms:dataElementName>
    <sdtms:dataElementType rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.Character"/>
    <sdtms:dataElementCompliance rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.PermissibleVariable"/>
    <mms:ordinal rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger"
    >21</mms:ordinal>
    <sdtmigs:references rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >SDTM 2.2.2</sdtmigs:references>
    <sdtms:dataElementRole rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.RecordQualifier"/>
    <mms:dataElementLabel rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >Relationship to Non-Study Treatment</mms:dataElementLabel>
    <mms:dataElementDescription rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >Records the investigator's opinion as to whether the event may have
been due to a treatment other than study drug. May be reported as free
text. Example: "MORE LIKELY RELATED TO ASPIRIN
USE.".</mms:dataElementDescription>
    <mms:broader rdf:resource="http://rdf.cdisc.org/sdtm-1-2/std#DE.Event.--RELNST"/>
  </mms:DataElement>
</rdf:RDF>



I then index this data with jena.textindexer, using this assembler file:

@prefix :        <http://localhost/jena_example/#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix mms:     <http://rdf.cdisc.org/mms#> .
@prefix sdtms:   <http://rdf.cdisc.org/sdtm-1-2/schema#> .
@prefix sdtmigs: <http://rdf.cdisc.org/sdtmig-3-1-2/schema#> .

         ) .

Finally I try to run a query on the dataset with the index:

PREFIX : <http://localhost/jena_example/#>
PREFIX text: <http://jena.apache.org/text#>
PREFIX mms: <http://rdf.cdisc.org/mms#>
SELECT * {?s text:query (mms:dataElementName 'AE')}

I would expect this to return the first dataElement, AERELNST. I am unsure
whether my problem lies in the format of my query or in the format of my
assembler file. Any thoughts?
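
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the Lucene index splits each literal into whole-word tokens, the term 'AE' matches only a token that is exactly "AE" (such as the contextName of Table.AE) and never the longer token "AERELNST". The Python sketch below is a toy model of that behaviour, not Lucene or jena-text code. The helpers tokenize, build_index and search are hypothetical names, and the documents are literals copied from the sample data in this thread.

```python
import re

# Literals copied from the sample data in this thread. In the assembler file
# every predicate is mapped to the same field ("text"), so each subject is
# modelled here as one bag of strings.
DOCS = {
    "Column.AE.AERELNST": ["AERELNST", "Relationship to Non-Study Treatment"],
    "Column.SU.SUMODIFY": ["SUMODIFY", "Modified Substance Name"],
    "Table.AE": ["AE", "Adverse Events"],
}


def tokenize(text):
    """Split on non-alphanumeric characters and lowercase each token.

    This only approximates what a word-based analyzer does (assumption).
    """
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def build_index(docs):
    """Map each subject to the set of tokens found in its literals."""
    return {
        uri: {tok for value in values for tok in tokenize(value)}
        for uri, values in docs.items()
    }


def search(index, query):
    """Whole-token match, or prefix match when the query ends with '*'."""
    q = query.lower()
    if q.endswith("*"):
        prefix = q[:-1]
        return sorted(
            uri for uri, toks in index.items()
            if any(tok.startswith(prefix) for tok in toks)
        )
    return sorted(uri for uri, toks in index.items() if q in toks)


index = build_index(DOCS)
print(search(index, "AE"))   # whole-token query: only Table.AE matches
print(search(index, "AE*"))  # prefix query: AERELNST is matched as well
```

If that assumption holds, a trailing wildcard in the query string, for example 'AE*' in place of 'AE', may be worth trying, since Lucene's query syntax supports prefix matching on terms.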


On Wed, Sep 11, 2013 at 6:09 PM, Andy Seaborne <an...@apache.org> wrote:

> Brad,
>
> That didn't make it through (and it's truncated).  Do you have a smaller
> data sample?  Does the data have to be that long to show the issue?  Can
> you put it up somewhere I can pull it from?
>
>         Andy
>
>
> On 11/09/13 05:57, Brad Moran wrote:
>
>> Ok, this is a sample of several large rdf files I am working with:
>>
>> <?xml version="1.0"?>
>> <rdf:RDF
>>      xmlns:mms="http://rdf.cdisc.org/mms#"
>>      xmlns:sdtm="http://rdf.cdisc.org/sdtm-1-2/std#"
>>      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>      xmlns:skos="http://www.w3.org/2004/02/skos/core#"
>>      xmlns:sdtmigs="http://rdf.cdisc.org/sdtmig-3-1-2/schema#"
>>      xmlns:owl="http://www.w3.org/2002/07/owl#"
>>      xmlns:dc="http://purl.org/dc/elements/1.1/"
>>      xmlns="http://rdf.cdisc.org/sdtmig-3-1-2/std#"
>>      xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
>>      xmlns:sdtms="http://rdf.cdisc.org/sdtm-1-2/schema#"
>>      xmlns:sdtmct="http://rdf.cdisc.org/sdtm/ct#"
>>      xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>>    xml:base="http://rdf.cdisc.org/sdtmig-3-1-2/std">
>> <mms:DataElement rdf:ID="Column.AE.AERELNST">
>>      <mms:dataElementType rdf:datatype="http://www.w3.org/2001/XMLSchema#QName"
>>      >xsd:string</mms:dataElementType>
>>      <mms:dataElementName rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>      >AERELNST</mms:dataElementName>
>>      <sdtms:dataElementType rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.Character"/>
>>      <mms:context>
>>        <mms:Dataset rdf:ID="Table.AE">
>>          <sdtmigs:domainStructure rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>          >One record per adverse event per subject</sdtmigs:domainStructure>
>>          <mms:contextName rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>          >AE</mms:contextName>
>>          <mms:contextLabel rdf:parseType="Literal">Adverse Events</mms:contextLabel>
>>          <mms:ordinal rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger"
>>          >8</mms:ordinal>
>>          <sdtmigs:domainCode rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>          >AE</sdtmigs:domainCode>
>>          <mms:context rdf:resource="#EventsObservationClass"/>
>>        </mms:Dataset>
>>      </mms:context>
>>      <sdtms:dataElementCompliance rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.PermissibleVariable"/>
>>      <mms:ordinal rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger"
>>      >21</mms:ordinal>
>>      <sdtmigs:references rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>      >SDTM 2.2.2</sdtmigs:references>
>>      <sdtms:dataElementRole rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.RecordQualifier"/>
>>      <mms:dataElementLabel rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>      >Relationship to Non-Study Treatment</mms:dataElementLabel>
>>      <mms:dataElementDescription rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>      >Records the investigator's opinion as to whether the event may have
>> been due to a treatment other than study drug. May be reported as free
>> text. Example: "MORE LIKELY RELATED TO ASPIRIN
>> USE.".</mms:dataElementDescription>
>>      <mms:broader rdf:resource="http://rdf.cdisc.org/sdtm-1-2/std#DE.Event.--RELNST"/>
>>    </mms:DataElement>
>>    <mms:DataElement rdf:ID="Column.SU.SUMODIFY">
>>      <mms:dataElementLabel rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>      >Modified Substance Name</mms:dataElementLabel>
>>      <sdtms:dataElementType rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.Character"/>
>>      <mms:dataElementType rdf:datatype="http://www.w3.org/2001/XMLSchema#QName"
>>      >xsd:string</mms:dataElementType>
>>      <sdtmigs:references rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>      >SDTM 2.2.1, SDTMIG 4.1.3.6</sdtmigs:references>
>>      <mms:dataElementDescription rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>      >If SUTRT is modified, then the modified text is placed
>> here.</mms:dataElementDescription>
>>      <mms:ordinal rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger"
>>      >8</mms:ordinal>
>>      <sdtms:dataElementCompliance rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.PermissibleVariable"/>
>>      <mms:dataElementName rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>      >SUMODIFY</mms:dataElementName>
>>      <mms:context rdf:resource="#Table.SU"/>
>>      <mms:broader rdf:resource="http://rdf.cdisc.org/sdtm-1-2/std#DE.Intervention.--MODIFY"/>
>>      <sdtms:dataElementRole rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.SynonymQualifier"/>
>>    </mms:DataElement>
>>    <mms:DataElement rdf:ID="Column.CO.IDVAR">
>>      <sdtms:dataElementRole rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.RecordQualifier"/>
>>      <mms:dataElementName rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>      >IDVAR</mms:dataElementName>
>>      <sdtms:dataElementCompliance rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.PermissibleVariable"/>
>>      <sdtmigs:controlledTermsOrFormat rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>      >*</sdtmigs:controlledTermsOrFormat>
>>      <mms:dataElementLabel rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>      >Identifying Variable</mms:dataElementLabel>
>>      <mms:dataElementDescription rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>      >Identifying variable in the parent dataset that identifies the
>> record(s) to which the comment applies. Examples AESEQ or CMGRPID. Used
>> only when individual comments are related to domain records. Null for
>> comments collected on separate CRFs.</mms:dataElementDescription>
>>      <mms:context>
>>        <mms:Dataset rdf:ID="Table.CO">
>>          <sdtmigs:domainCode rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>          >CO</sdtmigs:domainCode>
>>          <mms:contextName rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>          >CO</mms:contextName>
>>          <sdtmigs:domainStructure rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>>          >One record per comment per subject</sdtmigs:domainStructure>
>>          <mms:contextLabel rdf:parseType="Literal">Comments</mms:contextLabel>
>>          <mms:ordinal rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger"
>>          >2</mms:ordinal>
>>          <mms:context rdf:resource="#SpecialPurposeDomain"/>
>>        </mms:Dataset>
>>      </mms:context>
>>      <mms:dataElementType rdf:datatype="http://www.w3.org/2001/XMLSchema#QName"
>>      >xsd:string</mms:dataElementType>
>>      <mms:ordinal rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger"
>>      >6</mms:ordinal>
>>      <sdtms:dataElementType rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.Character"/>
>>    </mms:DataElement>
>>
>>
>> I then index this data using this assembler file using jena.textindexer:
>>
>> @prefix :        <http://localhost/jena_example/#> .
>> @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>> @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
>> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
>> @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
>> @prefix text:    <http://jena.apache.org/text#> .
>> @prefix mms:     <http://rdf.cdisc.org/mms#> .
>> @prefix sdtms:   <http://rdf.cdisc.org/sdtm-1-2/schema#> .
>> @prefix sdtmigs: <http://rdf.cdisc.org/sdtmig-3-1-2/schema#> .
>>
>>           ) .
>>
>> Finally I try to run a query on the dataset with the index:
>>
>> PREFIX : <http://localhost/jena_example/#> PREFIX text: <
>> http://jena.apache.org/text#> PREFIX mms: <http://rdf.cdisc.org/mms#>
>> SELECT * {?s text:query (mms:dataElementName 'AE')}
>>
>> I would expect to get the first dataElement: AERELNST. I am unsure as to
>> whether my problem is in the format of my query or in the format of my
>> assembler file. Any thoughts?
>>
>>

Re: jena text query optimization

Posted by Andy Seaborne <an...@apache.org>.
Brad,

That didn't make it through (and it's truncated).  Do you have a smaller 
data sample?  Does the data have to be that long to show the issue?  Can 
you put it up somewhere I can pull it from?

	Andy

On 11/09/13 05:57, Brad Moran wrote:
> Ok, this is a sample of several large rdf files I am working with:
>
> <?xml version="1.0"?>
> <rdf:RDF
>      xmlns:mms="http://rdf.cdisc.org/mms#"
>      xmlns:sdtm="http://rdf.cdisc.org/sdtm-1-2/std#"
>      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>      xmlns:skos="http://www.w3.org/2004/02/skos/core#"
>      xmlns:sdtmigs="http://rdf.cdisc.org/sdtmig-3-1-2/schema#"
>      xmlns:owl="http://www.w3.org/2002/07/owl#"
>      xmlns:dc="http://purl.org/dc/elements/1.1/"
>      xmlns="http://rdf.cdisc.org/sdtmig-3-1-2/std#"
>      xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
>      xmlns:sdtms="http://rdf.cdisc.org/sdtm-1-2/schema#"
>      xmlns:sdtmct="http://rdf.cdisc.org/sdtm/ct#"
>      xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>    xml:base="http://rdf.cdisc.org/sdtmig-3-1-2/std">
> <mms:DataElement rdf:ID="Column.AE.AERELNST">
>      <mms:dataElementType rdf:datatype="
> http://www.w3.org/2001/XMLSchema#QName"
>      >xsd:string</mms:dataElementType>
>      <mms:dataElementName rdf:datatype="
> http://www.w3.org/2001/XMLSchema#string"
>      >AERELNST</mms:dataElementName>
>      <sdtms:dataElementType rdf:resource="
> http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.Character"/>
>      <mms:context>
>        <mms:Dataset rdf:ID="Table.AE">
>          <sdtmigs:domainStructure rdf:datatype="
> http://www.w3.org/2001/XMLSchema#string"
>          >One record per adverse event per subject</sdtmigs:domainStructure>
>          <mms:contextName rdf:datatype="
> http://www.w3.org/2001/XMLSchema#string"
>          >AE</mms:contextName>
>          <mms:contextLabel rdf:parseType="Literal">Adverse
> Events</mms:contextLabel>
>          <mms:ordinal rdf:datatype="
> http://www.w3.org/2001/XMLSchema#positiveInteger"
>          >8</mms:ordinal>
>          <sdtmigs:domainCode rdf:datatype="
> http://www.w3.org/2001/XMLSchema#string"
>          >AE</sdtmigs:domainCode>
>          <mms:context rdf:resource="#EventsObservationClass"/>
>        </mms:Dataset>
>      </mms:context>
>      <sdtms:dataElementCompliance rdf:resource="
> http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.PermissibleVariable"/>
>      <mms:ordinal rdf:datatype="
> http://www.w3.org/2001/XMLSchema#positiveInteger"
>      >21</mms:ordinal>
>      <sdtmigs:references rdf:datatype="
> http://www.w3.org/2001/XMLSchema#string"
>      >SDTM 2.2.2</sdtmigs:references>
>      <sdtms:dataElementRole rdf:resource="
> http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.RecordQualifier"/>
>      <mms:dataElementLabel rdf:datatype="
> http://www.w3.org/2001/XMLSchema#string"
>      >Relationship to Non-Study Treatment</mms:dataElementLabel>
>      <mms:dataElementDescription rdf:datatype="
> http://www.w3.org/2001/XMLSchema#string"
>      >Records the investigator's opinion as to whether the event may have
> been due to a treatment other than study drug. May be reported as free
> text. Example: "MORE LIKELY RELATED TO ASPIRIN
> USE.".</mms:dataElementDescription>
>      <mms:broader rdf:resource="
> http://rdf.cdisc.org/sdtm-1-2/std#DE.Event.--RELNST"/>
>    </mms:DataElement>
>    <mms:DataElement rdf:ID="Column.SU.SUMODIFY">
>      <mms:dataElementLabel rdf:datatype="
> http://www.w3.org/2001/XMLSchema#string"
>      >Modified Substance Name</mms:dataElementLabel>
>      <sdtms:dataElementType rdf:resource="
> http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.Character"/>
>      <mms:dataElementType rdf:datatype="
> http://www.w3.org/2001/XMLSchema#QName"
>      >xsd:string</mms:dataElementType>
>      <sdtmigs:references rdf:datatype="
> http://www.w3.org/2001/XMLSchema#string"
>      >SDTM 2.2.1, SDTMIG 4.1.3.6</sdtmigs:references>
>      <mms:dataElementDescription rdf:datatype="
> http://www.w3.org/2001/XMLSchema#string"
>      >If SUTRT is modified, then the modified text is placed
> here.</mms:dataElementDescription>
>      <mms:ordinal rdf:datatype="
> http://www.w3.org/2001/XMLSchema#positiveInteger"
>      >8</mms:ordinal>
>      <sdtms:dataElementCompliance rdf:resource="
> http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.PermissibleVariable"/>
>      <mms:dataElementName rdf:datatype="
> http://www.w3.org/2001/XMLSchema#string"
>      >SUMODIFY</mms:dataElementName>
>      <mms:context rdf:resource="#Table.SU"/>
>      <mms:broader rdf:resource="
> http://rdf.cdisc.org/sdtm-1-2/std#DE.Intervention.--MODIFY"/>
>      <sdtms:dataElementRole rdf:resource="
> http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.SynonymQualifier"/>
>    </mms:DataElement>
>    <mms:DataElement rdf:ID="Column.CO.IDVAR">
>      <sdtms:dataElementRole rdf:resource="
> http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.RecordQualifier"/>
>      <mms:dataElementName rdf:datatype="
> http://www.w3.org/2001/XMLSchema#string"
>      >IDVAR</mms:dataElementName>
>      <sdtms:dataElementCompliance rdf:resource="
> http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.PermissibleVariable"/>
>      <sdtmigs:controlledTermsOrFormat rdf:datatype="
> http://www.w3.org/2001/XMLSchema#string"
>      >*</sdtmigs:controlledTermsOrFormat>
>      <mms:dataElementLabel rdf:datatype="
> http://www.w3.org/2001/XMLSchema#string"
>      >Identifying Variable</mms:dataElementLabel>
>      <mms:dataElementDescription rdf:datatype="
> http://www.w3.org/2001/XMLSchema#string"
>      >Identifying variable in the parent dataset that identifies the
> record(s) to which the comment applies. Examples AESEQ or CMGRPID. Used
> only when individual comments are related to domain records. Null for
> comments collected on separate CRFs.</mms:dataElementDescription>
>      <mms:context>
>        <mms:Dataset rdf:ID="Table.CO">
>          <sdtmigs:domainCode rdf:datatype="
> http://www.w3.org/2001/XMLSchema#string"
>          >CO</sdtmigs:domainCode>
>          <mms:contextName rdf:datatype="
> http://www.w3.org/2001/XMLSchema#string"
>          >CO</mms:contextName>
>          <sdtmigs:domainStructure rdf:datatype="
> http://www.w3.org/2001/XMLSchema#string"
>          >One record per comment per subject</sdtmigs:domainStructure>
>          <mms:contextLabel
> rdf:parseType="Literal">Comments</mms:contextLabel>
>          <mms:ordinal rdf:datatype="
> http://www.w3.org/2001/XMLSchema#positiveInteger"
>          >2</mms:ordinal>
>          <mms:context rdf:resource="#SpecialPurposeDomain"/>
>        </mms:Dataset>
>      </mms:context>
>      <mms:dataElementType rdf:datatype="
> http://www.w3.org/2001/XMLSchema#QName"
>      >xsd:string</mms:dataElementType>
>      <mms:ordinal rdf:datatype="
> http://www.w3.org/2001/XMLSchema#positiveInteger"
>      >6</mms:ordinal>
>      <sdtms:dataElementType rdf:resource="
> http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.Character"/>
>    </mms:DataElement>
>
>
> I then index this data using this assembler file using jena.textindexer:
>
> @prefix :        <http://localhost/jena_example/#> .
> @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
> @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
> @prefix text:    <http://jena.apache.org/text#> .
> @prefix mms:     <http://rdf.cdisc.org/mms#> .
> @prefix sdtms:   <http://rdf.cdisc.org/sdtm-1-2/schema#> .
> @prefix sdtmigs: <http://rdf.cdisc.org/sdtmig-3-1-2/schema#> .
>
>           ) .
>
> Finally I try to run a query on the dataset with the index:
>
> PREFIX : <http://localhost/jena_example/#> PREFIX text: <
> http://jena.apache.org/text#> PREFIX mms: <http://rdf.cdisc.org/mms#>
> SELECT * {?s text:query (mms:dataElementName 'AE')}
>
> I would expect to get the first dataElement: AERELNST. I am unsure as to
> whether my problem is in the format of my query or in the format of my
> assembler file. Any thoughts?
>
>
> On Sun, Sep 1, 2013 at 7:43 AM, Andy Seaborne <an...@apache.org> wrote:
>
>> On 01/09/13 00:02, Brad Moran wrote:
>>
>>> sorry the file type should be saved as .owl
>>>
>>
>> I see no data.  If you had an attachment, then they don't get through to
>> the mailing list.
>>
>> Would it be possible to create a complete, minimal example of your setup?
>>   A small amount of data that shows the situation.
>> This description is quite long - is it all needed or can you see the same
>> issues in a smaller configuration?
>>
>>          Andy
>>
>>
>>>
>>> On Sat, Aug 31, 2013 at 7:00 PM, Brad Moran <bmoran@pinnacle21.net
>>> <ma...@pinnacle21.net>**> wrote:
>>>
>>>      Hi,
>>>      I am currently having a problem getting the exact results I want
>>>      from my text queries. I attached one example of my rdf that I begin
>>>      with. Then I run tdbloader and successfully create an index using
>>>      this assembler file with jena.textindexer:
>>>
>>>      @prefix :        <http://localhost/jena_**example/#<http://localhost/jena_example/#>>
>>> .
>>>      @prefix rdf:     <http://www.w3.org/1999/02/22-**rdf-syntax-ns#<http://www.w3.org/1999/02/22-rdf-syntax-ns#>>
>>> .
>>>      @prefix rdfs:    <http://www.w3.org/2000/01/**rdf-schema#<http://www.w3.org/2000/01/rdf-schema#>>
>>> .
>>>      @prefix tdb:     <http://jena.hpl.hp.com/2008/**tdb#<http://jena.hpl.hp.com/2008/tdb#>>
>>> .
>>>      @prefix ja:      <http://jena.hpl.hp.com/2005/**11/Assembler#<http://jena.hpl.hp.com/2005/11/Assembler#>>
>>> .
>>>      @prefix text:    <http://jena.apache.org/text#> .
>>>      @prefix mms:     <http://rdf.cdisc.org/mms#> .
>>>      @prefix sdtms:   <http://rdf.cdisc.org/sdtm-1-**2/schema#<http://rdf.cdisc.org/sdtm-1-2/schema#>>
>>> .
>>>      @prefix sdtmigs: <http://rdf.cdisc.org/sdtmig-**3-1-2/schema#<http://rdf.cdisc.org/sdtmig-3-1-2/schema#>>
>>> .
>>>      @prefix sends: <http://rdf.cdisc.org/send/**schema#<http://rdf.cdisc.org/send/schema#>>
>>> .
>>>      @prefix sendigs: <http://rdf.cdisc.org/send-3.**0/schema#<http://rdf.cdisc.org/send-3.0/schema#>>
>>> .
>>>      @prefix cts: <http://rdf.cdisc.org/ct/**schema#<http://rdf.cdisc.org/ct/schema#>>
>>> .
>>>
>>>      ## Example of a TDB dataset and text index
>>>      ## Initialize TDB
>>>      [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
>>>      tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
>>>      tdb:GraphTDB    rdfs:subClassOf  ja:Model .
>>>
>>>      ## Initialize text query
>>>      [] ja:loadClass       "org.apache.jena.query.text.**TextQuery" .
>>>      # A TextDataset is a regular dataset with a text index.
>>>      text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
>>>      # Lucene index
>>>      text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
>>>
>>>      ## ------------------------------**------------------------------**
>>> ---
>>>      ## This URI must be fixed - it's used to assemble the text dataset.
>>>
>>>      :text_dataset rdf:type     text:TextDataset ;
>>>           text:dataset   <#dataset> ;
>>>           text:index     <#indexLucene> ;
>>>           .
>>>
>>>      # A TDB dataset used for RDF storage
>>>      <#dataset> rdf:type      tdb:DatasetTDB ;
>>>           tdb:location "tdb" ;
>>>           # if from command line use: "NetBeansProjects/mdr-older/**
>>> trunk/tdb"
>>>           .
>>>
>>>      # Text index description
>>>      <#indexLucene> a text:TextIndexLucene ;
>>>           text:directory <file:luceneIndexes> ;
>>>           text:entityMap <#entMap> ;
>>>           .
>>>
>>>      # Mapping in the index
>>>      # URI stored in field "uri"
>>>      # rdfs:label is mapped to field "text"
>>>      <#entMap> a text:EntityMap ;
>>>           text:entityField      "uri" ;
>>>           text:defaultField     "text" ;
>>>           text:map (
>>>                [ text:field "text" ; text:predicate mms:dataElementName ]
>>>                [ text:field "text" ; text:predicate
>>>      mms:dataElementDescription ]
>>>        [ text:field "text" ; text:predicate mms:dataElementLabel ]
>>>        [ text:field "text" ; text:predicate mms:dataElementType ]
>>>        [ text:field "text" ; text:predicate mms:ordinal ]
>>>        [ text:field "text" ; text:predicate mms:broader ]
>>>                [ text:field "text" ; text:predicate mms:Dataset ]
>>>                [ text:field "text" ; text:predicate mms:contextName ]
>>>                [ text:field "text" ; text:predicate mms:contextLabel ]
>>>                [ text:field "text" ; text:predicate mms:contextDescription
>>> ]
>>>      [ text:field "text" ; text:predicate sdtms:dataElementType ]
>>>      [ text:field "text" ; text:predicate sdtms:dataElementRole ]
>>>      [ text:field "text" ; text:predicate sdtms:dataElementCompliance ]
>>>                [ text:field "text" ; text:predicate
>>> sdtms:supportedBySDTMIG ]
>>>                [ text:field "text" ; text:predicate sdtms:supportedBySEND ]
>>>      [ text:field "text" ; text:predicate sdtmigs:references ]
>>>                [ text:field "text" ; text:predicate
>>> sdtmigs:domainStructure ]
>>>                [ text:field "text" ; text:predicate sdtmigs:domainCode ]
>>>                [ text:field "text" ; text:predicate
>>>      sdtmigs:**controlledTermsOrFormat ]
>>>                [ text:field "text" ; text:predicate
>>>      sends:dataElementCompliance ]
>>>                [ text:field "text" ; text:predicate sends:dataElementRole ]
>>>                [ text:field "text" ; text:predicate
>>> sendigs:domainStructure ]
>>>                [ text:field "text" ; text:predicate sendigs:domainCode ]
>>>                [ text:field "text" ; text:predicate
>>>      sendigs:**controlledTermsOrFormat ]
>>>                [ text:field "text" ; text:predicate cts:cdiscDefinition]
>>>                [ text:field "text" ; text:predicate cts:nciPreferredTerm]
>>>                [ text:field "text" ; text:predicate cts:nciCode]
>>>                [ text:field "text" ; text:predicate cts:cdiscSynonyms]
>>>                [ text:field "text" ; text:predicate
>>> cts:cdiscSubmissionValue]
>>>                [ text:field "text" ; text:predicate cts:codelistName]
>>>                [ text:field "text" ; text:predicate
>>> cts:isExtensibleCodelist]
>>>                ) .
>>>
>>>
>>>      I then try to run queries against this dataset, as an example say I
>>>      want to search "AE" then I would expect every dataElement within the
>>>      AE domain to be returned. However, I cannot get the desired result.
>>>      If I search:
>>>
>>>      PREFIX : <http://localhost/jena_**example/#<http://localhost/jena_example/#>>
>>> PREFIX text:
>>>      <http://jena.apache.org/text#> PREFIX mms:
>>>      <http://rdf.cdisc.org/mms#> SELECT * {?s text:query
>>>      (mms:dataElementName 'AE')}
>>>
>>>      I get:
>>>
>>>      <http://rdf.cdisc.org/sdtmig-**3-1-2/std#Column.AE.DOMAIN<http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.DOMAIN>
>>>>
>>>      <http://rdf.cdisc.org/sdtmig-**3-1-2/std#Table.AE<http://rdf.cdisc.org/sdtmig-3-1-2/std#Table.AE>
>>>>
>>>      <http://rdf.cdisc.org/sdtmig-**3-1-2/std#Column.FA.FACAT<http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.FA.FACAT>
>>>>
>>>
>>>      when I would expect to get:
>>>
>>>      <http://rdf.cdisc.org/sdtmig-**3-1-2/std#Column.AE.AERELNST<http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AERELNST>
>>>>
>>>      <http://rdf.cdisc.org/sdtmig-**3-1-2/std#Column.AE.AEENDY<http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AEENDY>
>>>>
>>>      <http://rdf.cdisc.org/sdtmig-**3-1-2/std#Column.AE.AEMODIFY<http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AEMODIFY>
>>>>
>>>      <http://rdf.cdisc.org/sdtmig-**3-1-2/std#Column.AE.AETOXGR<http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AETOXGR>
>>>>
>>>      <http://rdf.cdisc.org/sdtmig-**3-1-2/std#Column.AE.AEREFID<http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AEREFID>
>>>>
>>>      <http://rdf.cdisc.org/sdtmig-**3-1-2/std#Column.AE.AESCAT<http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AESCAT>
>>>>
>>>      <http://rdf.cdisc.org/sdtmig-**3-1-2/std#Column.AE.AESEQ<http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AESEQ>
>>>>
>>>      <http://rdf.cdisc.org/sdtmig-**3-1-2/std#Column.AE.AESMIE<http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AESMIE>
>>>>
>>>      (And the rest of the .AE dataElements just listed a few here)
>>>
>>>      I also tried playing with this query a lot, but could not get the
>>>      desired result for example I tried the other form of query as well:
>>>
>>>      PREFIX : <http://localhost/jena_example/#>
>>>      PREFIX text: <http://jena.apache.org/text#>
>>>      PREFIX mms: <http://rdf.cdisc.org/mms#>
>>>      SELECT * {?subject mms:contextName ?o .
>>>      ?s text:query (mms:contextName 'SE')}
>>>
>>>
>>>      I am not sure whether the problem is a result of my query being
>>>      formed incorrectly, or whether the problem could be in my assembler
>>>      file that creates the index (is there a better/more complete way to
>>>      create an index for this rdf model?). Any suggestions would help,
>>>      like I mentioned in the beginning one of the rdf files from tdb is
>>>      attached. Thanks.
>>>
>>>
>>>
>>
>


Re: jena text query optimization

Posted by Brad Moran <bm...@pinnacle21.net>.
Ok, this is a sample of several large rdf files I am working with:

<?xml version="1.0"?>
<rdf:RDF
    xmlns:mms="http://rdf.cdisc.org/mms#"
    xmlns:sdtm="http://rdf.cdisc.org/sdtm-1-2/std#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#"
    xmlns:sdtmigs="http://rdf.cdisc.org/sdtmig-3-1-2/schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://rdf.cdisc.org/sdtmig-3-1-2/std#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:sdtms="http://rdf.cdisc.org/sdtm-1-2/schema#"
    xmlns:sdtmct="http://rdf.cdisc.org/sdtm/ct#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xml:base="http://rdf.cdisc.org/sdtmig-3-1-2/std">
  <mms:DataElement rdf:ID="Column.AE.AERELNST">
    <mms:dataElementType rdf:datatype="http://www.w3.org/2001/XMLSchema#QName"
    >xsd:string</mms:dataElementType>
    <mms:dataElementName rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >AERELNST</mms:dataElementName>
    <sdtms:dataElementType rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.Character"/>
    <mms:context>
      <mms:Dataset rdf:ID="Table.AE">
        <sdtmigs:domainStructure rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
        >One record per adverse event per subject</sdtmigs:domainStructure>
        <mms:contextName rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
        >AE</mms:contextName>
        <mms:contextLabel rdf:parseType="Literal">Adverse Events</mms:contextLabel>
        <mms:ordinal rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger"
        >8</mms:ordinal>
        <sdtmigs:domainCode rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
        >AE</sdtmigs:domainCode>
        <mms:context rdf:resource="#EventsObservationClass"/>
      </mms:Dataset>
    </mms:context>
    <sdtms:dataElementCompliance rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.PermissibleVariable"/>
    <mms:ordinal rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger"
    >21</mms:ordinal>
    <sdtmigs:references rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >SDTM 2.2.2</sdtmigs:references>
    <sdtms:dataElementRole rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.RecordQualifier"/>
    <mms:dataElementLabel rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >Relationship to Non-Study Treatment</mms:dataElementLabel>
    <mms:dataElementDescription rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >Records the investigator's opinion as to whether the event may have been due to a treatment other than study drug. May be reported as free text. Example: "MORE LIKELY RELATED TO ASPIRIN USE.".</mms:dataElementDescription>
    <mms:broader rdf:resource="http://rdf.cdisc.org/sdtm-1-2/std#DE.Event.--RELNST"/>
  </mms:DataElement>
  <mms:DataElement rdf:ID="Column.SU.SUMODIFY">
    <mms:dataElementLabel rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >Modified Substance Name</mms:dataElementLabel>
    <sdtms:dataElementType rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.Character"/>
    <mms:dataElementType rdf:datatype="http://www.w3.org/2001/XMLSchema#QName"
    >xsd:string</mms:dataElementType>
    <sdtmigs:references rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >SDTM 2.2.1, SDTMIG 4.1.3.6</sdtmigs:references>
    <mms:dataElementDescription rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >If SUTRT is modified, then the modified text is placed here.</mms:dataElementDescription>
    <mms:ordinal rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger"
    >8</mms:ordinal>
    <sdtms:dataElementCompliance rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.PermissibleVariable"/>
    <mms:dataElementName rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >SUMODIFY</mms:dataElementName>
    <mms:context rdf:resource="#Table.SU"/>
    <mms:broader rdf:resource="http://rdf.cdisc.org/sdtm-1-2/std#DE.Intervention.--MODIFY"/>
    <sdtms:dataElementRole rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.SynonymQualifier"/>
  </mms:DataElement>
  <mms:DataElement rdf:ID="Column.CO.IDVAR">
    <sdtms:dataElementRole rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.RecordQualifier"/>
    <mms:dataElementName rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >IDVAR</mms:dataElementName>
    <sdtms:dataElementCompliance rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.PermissibleVariable"/>
    <sdtmigs:controlledTermsOrFormat rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >*</sdtmigs:controlledTermsOrFormat>
    <mms:dataElementLabel rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >Identifying Variable</mms:dataElementLabel>
    <mms:dataElementDescription rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >Identifying variable in the parent dataset that identifies the record(s) to which the comment applies. Examples AESEQ or CMGRPID. Used only when individual comments are related to domain records. Null for comments collected on separate CRFs.</mms:dataElementDescription>
    <mms:context>
      <mms:Dataset rdf:ID="Table.CO">
        <sdtmigs:domainCode rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
        >CO</sdtmigs:domainCode>
        <mms:contextName rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
        >CO</mms:contextName>
        <sdtmigs:domainStructure rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
        >One record per comment per subject</sdtmigs:domainStructure>
        <mms:contextLabel rdf:parseType="Literal">Comments</mms:contextLabel>
        <mms:ordinal rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger"
        >2</mms:ordinal>
        <mms:context rdf:resource="#SpecialPurposeDomain"/>
      </mms:Dataset>
    </mms:context>
    <mms:dataElementType rdf:datatype="http://www.w3.org/2001/XMLSchema#QName"
    >xsd:string</mms:dataElementType>
    <mms:ordinal rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger"
    >6</mms:ordinal>
    <sdtms:dataElementType rdf:resource="http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.Character"/>
  </mms:DataElement>


I then index this data with jena.textindexer, using this assembler file:

@prefix :        <http://localhost/jena_example/#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix mms:     <http://rdf.cdisc.org/mms#> .
@prefix sdtms:   <http://rdf.cdisc.org/sdtm-1-2/schema#> .
@prefix sdtmigs: <http://rdf.cdisc.org/sdtmig-3-1-2/schema#> .

         ) .

Finally I try to run a query on the dataset with the index:

PREFIX : <http://localhost/jena_example/#>
PREFIX text: <http://jena.apache.org/text#>
PREFIX mms: <http://rdf.cdisc.org/mms#>
SELECT * {?s text:query (mms:dataElementName 'AE')}

I would expect to get the first dataElement: AERELNST. I am unsure as to
whether my problem is in the format of my query or in the format of my
assembler file. Any thoughts?
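
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the index was built with Lucene's standard analyzer, each literal is split into whole words, so 'AE' only matches a value containing the separate word "AE" and not a longer name such as AERELNST. Also, because every predicate in the entity map writes to the same "text" field, the property given in text:query may not narrow the search to mms:dataElementName alone. A sketch of a query that uses a Lucene prefix wildcard and then re-checks the name against the stored triples could look like this (the STRSTARTS filter is my own addition, not something from the thread):

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX mms:  <http://rdf.cdisc.org/mms#>

SELECT ?s ?name
WHERE {
  # 'AE*' is Lucene wildcard syntax: any indexed term that starts with "ae".
  ?s text:query (mms:dataElementName 'AE*') .
  # The shared "text" field also holds descriptions and labels, so confirm
  # the hit against the actual mms:dataElementName value in the dataset.
  ?s mms:dataElementName ?name .
  FILTER ( STRSTARTS(STR(?name), "AE") )
}
```

If the goal is every data element in the AE domain rather than every name starting with "AE", it may work better to match the dataset on mms:contextName and then follow mms:context from each data element to that dataset, using the same variable for the text match and the graph pattern.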


On Sun, Sep 1, 2013 at 7:43 AM, Andy Seaborne <an...@apache.org> wrote:

> On 01/09/13 00:02, Brad Moran wrote:
>
>> sorry the file type should be saved as .owl
>>
>
> I see no data.  If you had an attachment, then they don't get through to
> the mailing list.
>
> Would it be possible to create a complete, minimal example of your setup?
>  A small amount of data that shows the situation.
> This description is quite long - is it all needed or can you see the same
> issues in a smaller configuration?
>
>         Andy
>
>
>

Re: jena text query optimization

Posted by Andy Seaborne <an...@apache.org>.
On 01/09/13 00:02, Brad Moran wrote:
> sorry the file type should be saved as .owl

I see no data.  If you had an attachment, then they don't get through to 
the mailing list.

Would it be possible to create a complete, minimal example of your 
setup?  A small amount of data that shows the situation.
This description is quite long - is it all needed or can you see the 
same issues in a smaller configuration?

	Andy

>
>
> On Sat, Aug 31, 2013 at 7:00 PM, Brad Moran <bm...@pinnacle21.net> wrote:
>
>     Hi,
>     I am currently having a problem getting the exact results I want
>     from my text queries. I attached one example of my rdf that I begin
>     with. Then I run tdbloader and successfully create an index using
>     this assembler file with jena.textindexer:
>
>     @prefix :        <http://localhost/jena_example/#> .
>     @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>     @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
>     @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
>     @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
>     @prefix text:    <http://jena.apache.org/text#> .
>     @prefix mms:     <http://rdf.cdisc.org/mms#> .
>     @prefix sdtms:   <http://rdf.cdisc.org/sdtm-1-2/schema#> .
>     @prefix sdtmigs: <http://rdf.cdisc.org/sdtmig-3-1-2/schema#> .
>     @prefix sends: <http://rdf.cdisc.org/send/schema#> .
>     @prefix sendigs: <http://rdf.cdisc.org/send-3.0/schema#> .
>     @prefix cts: <http://rdf.cdisc.org/ct/schema#> .
>
>     ## Example of a TDB dataset and text index
>     ## Initialize TDB
>     [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
>     tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
>     tdb:GraphTDB    rdfs:subClassOf  ja:Model .
>
>     ## Initialize text query
>     [] ja:loadClass       "org.apache.jena.query.text.TextQuery" .
>     # A TextDataset is a regular dataset with a text index.
>     text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
>     # Lucene index
>     text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
>
>     ## ---------------------------------------------------------------
>     ## This URI must be fixed - it's used to assemble the text dataset.
>
>     :text_dataset rdf:type     text:TextDataset ;
>          text:dataset   <#dataset> ;
>          text:index     <#indexLucene> ;
>          .
>
>     # A TDB dataset used for RDF storage
>     <#dataset> rdf:type      tdb:DatasetTDB ;
>          tdb:location "tdb" ;
>          # if from command line use: "NetBeansProjects/mdr-older/trunk/tdb"
>          .
>
>     # Text index description
>     <#indexLucene> a text:TextIndexLucene ;
>          text:directory <file:luceneIndexes> ;
>          text:entityMap <#entMap> ;
>          .
>
>     # Mapping in the index
>     # URI stored in field "uri"
>     # rdfs:label is mapped to field "text"
>     <#entMap> a text:EntityMap ;
>          text:entityField      "uri" ;
>          text:defaultField     "text" ;
>          text:map (
>               [ text:field "text" ; text:predicate mms:dataElementName ]
>               [ text:field "text" ; text:predicate
>     mms:dataElementDescription ]
>       [ text:field "text" ; text:predicate mms:dataElementLabel ]
>       [ text:field "text" ; text:predicate mms:dataElementType ]
>       [ text:field "text" ; text:predicate mms:ordinal ]
>       [ text:field "text" ; text:predicate mms:broader ]
>               [ text:field "text" ; text:predicate mms:Dataset ]
>               [ text:field "text" ; text:predicate mms:contextName ]
>               [ text:field "text" ; text:predicate mms:contextLabel ]
>               [ text:field "text" ; text:predicate mms:contextDescription ]
>     [ text:field "text" ; text:predicate sdtms:dataElementType ]
>     [ text:field "text" ; text:predicate sdtms:dataElementRole ]
>     [ text:field "text" ; text:predicate sdtms:dataElementCompliance ]
>               [ text:field "text" ; text:predicate sdtms:supportedBySDTMIG ]
>               [ text:field "text" ; text:predicate sdtms:supportedBySEND ]
>     [ text:field "text" ; text:predicate sdtmigs:references ]
>               [ text:field "text" ; text:predicate sdtmigs:domainStructure ]
>               [ text:field "text" ; text:predicate sdtmigs:domainCode ]
>               [ text:field "text" ; text:predicate
>     sdtmigs:controlledTermsOrFormat ]
>               [ text:field "text" ; text:predicate
>     sends:dataElementCompliance ]
>               [ text:field "text" ; text:predicate sends:dataElementRole ]
>               [ text:field "text" ; text:predicate sendigs:domainStructure ]
>               [ text:field "text" ; text:predicate sendigs:domainCode ]
>               [ text:field "text" ; text:predicate
>     sendigs:controlledTermsOrFormat ]
>               [ text:field "text" ; text:predicate cts:cdiscDefinition]
>               [ text:field "text" ; text:predicate cts:nciPreferredTerm]
>               [ text:field "text" ; text:predicate cts:nciCode]
>               [ text:field "text" ; text:predicate cts:cdiscSynonyms]
>               [ text:field "text" ; text:predicate cts:cdiscSubmissionValue]
>               [ text:field "text" ; text:predicate cts:codelistName]
>               [ text:field "text" ; text:predicate cts:isExtensibleCodelist]
>               ) .
>
>
>     I then try to run queries against this dataset, as an example say I
>     want to search "AE" then I would expect every dataElement within the
>     AE domain to be returned. However, I cannot get the desired result.
>     If I search:
>
>     PREFIX : <http://localhost/jena_example/#> PREFIX text:
>     <http://jena.apache.org/text#> PREFIX mms:
>     <http://rdf.cdisc.org/mms#> SELECT * {?s text:query
>     (mms:dataElementName 'AE')}
>
>     I get:
>
>     <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.DOMAIN>
>     <http://rdf.cdisc.org/sdtmig-3-1-2/std#Table.AE>
>     <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.FA.FACAT>
>
>     when I would expect to get:
>
>     <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AERELNST>
>     <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AEENDY>
>     <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AEMODIFY>
>     <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AETOXGR>
>     <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AEREFID>
>     <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AESCAT>
>     <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AESEQ>
>     <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AESMIE>
>     (And the rest of the .AE dataElements just listed a few here)
>
>     I also tried playing with this query a lot, but could not get the
>     desired result for example I tried the other form of query as well:
>
>     PREFIX : <http://localhost/jena_example/#> PREFIX text:
>     <http://jena.apache.org/text#> PREFIX mms:
>     <http://rdf.cdisc.org/mms#> SELECT * {?subject mms:contextName ?o .
>     ?s text:query (mms:contextName 'SE')}
>
>
>     I am not sure whether the problem is a result of my query being
>     formed incorrectly, or whether the problem could be in my assembler
>     file that creates the index (is there a better/more complete way to
>     create an index for this rdf model?). Any suggestions would help,
>     like I mentioned in the beginning one of the rdf files from tdb is
>     attached. Thanks.
>
>


Re: jena text query optimization

Posted by Brad Moran <bm...@pinnacle21.net>.
Sorry, the file type should be saved as .owl


On Sat, Aug 31, 2013 at 7:00 PM, Brad Moran <bm...@pinnacle21.net> wrote:

> Hi,
> I am currently having a problem getting the exact results I want from my
> text queries. I attached one example of my rdf that I begin with. Then I
> run tdbloader and successfully create an index using this assembler file
> with jena.textindexer:
>
> @prefix :        <http://localhost/jena_example/#> .
> @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
> @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
> @prefix text:    <http://jena.apache.org/text#> .
> @prefix mms:     <http://rdf.cdisc.org/mms#> .
> @prefix sdtms:   <http://rdf.cdisc.org/sdtm-1-2/schema#> .
> @prefix sdtmigs: <http://rdf.cdisc.org/sdtmig-3-1-2/schema#> .
> @prefix sends: <http://rdf.cdisc.org/send/schema#> .
> @prefix sendigs: <http://rdf.cdisc.org/send-3.0/schema#> .
> @prefix cts: <http://rdf.cdisc.org/ct/schema#> .
>
> ## Example of a TDB dataset and text index
> ## Initialize TDB
> [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
> tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
> tdb:GraphTDB    rdfs:subClassOf  ja:Model .
>
> ## Initialize text query
> [] ja:loadClass       "org.apache.jena.query.text.TextQuery" .
> # A TextDataset is a regular dataset with a text index.
> text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
> # Lucene index
> text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
>
> ## ---------------------------------------------------------------
> ## This URI must be fixed - it's used to assemble the text dataset.
>
> :text_dataset rdf:type     text:TextDataset ;
>     text:dataset   <#dataset> ;
>     text:index     <#indexLucene> ;
>     .
>
> # A TDB dataset used for RDF storage
> <#dataset> rdf:type      tdb:DatasetTDB ;
>     tdb:location "tdb" ;
>     # if from command line use: "NetBeansProjects/mdr-older/trunk/tdb"
>     .
>
> # Text index description
> <#indexLucene> a text:TextIndexLucene ;
>     text:directory <file:luceneIndexes> ;
>     text:entityMap <#entMap> ;
>     .
>
> # Mapping in the index
> # URI stored in field "uri"
> # rdfs:label is mapped to field "text"
> <#entMap> a text:EntityMap ;
>     text:entityField      "uri" ;
>     text:defaultField     "text" ;
>     text:map (
>          [ text:field "text" ; text:predicate mms:dataElementName ]
>          [ text:field "text" ; text:predicate mms:dataElementDescription ]
>   [ text:field "text" ; text:predicate mms:dataElementLabel ]
>  [ text:field "text" ; text:predicate mms:dataElementType ]
>   [ text:field "text" ; text:predicate mms:ordinal ]
>  [ text:field "text" ; text:predicate mms:broader ]
>          [ text:field "text" ; text:predicate mms:Dataset ]
>          [ text:field "text" ; text:predicate mms:contextName ]
>          [ text:field "text" ; text:predicate mms:contextLabel ]
>          [ text:field "text" ; text:predicate mms:contextDescription ]
> [ text:field "text" ; text:predicate sdtms:dataElementType ]
>   [ text:field "text" ; text:predicate sdtms:dataElementRole ]
> [ text:field "text" ; text:predicate sdtms:dataElementCompliance ]
>          [ text:field "text" ; text:predicate sdtms:supportedBySDTMIG ]
>          [ text:field "text" ; text:predicate sdtms:supportedBySEND ]
> [ text:field "text" ; text:predicate sdtmigs:references ]
>          [ text:field "text" ; text:predicate sdtmigs:domainStructure ]
>          [ text:field "text" ; text:predicate sdtmigs:domainCode ]
>          [ text:field "text" ; text:predicate
> sdtmigs:controlledTermsOrFormat ]
>          [ text:field "text" ; text:predicate sends:dataElementCompliance ]
>          [ text:field "text" ; text:predicate sends:dataElementRole ]
>          [ text:field "text" ; text:predicate sendigs:domainStructure ]
>          [ text:field "text" ; text:predicate sendigs:domainCode ]
>          [ text:field "text" ; text:predicate
> sendigs:controlledTermsOrFormat ]
>          [ text:field "text" ; text:predicate cts:cdiscDefinition]
>          [ text:field "text" ; text:predicate cts:nciPreferredTerm]
>          [ text:field "text" ; text:predicate cts:nciCode]
>          [ text:field "text" ; text:predicate cts:cdiscSynonyms]
>          [ text:field "text" ; text:predicate cts:cdiscSubmissionValue]
>          [ text:field "text" ; text:predicate cts:codelistName]
>          [ text:field "text" ; text:predicate cts:isExtensibleCodelist]
>          ) .
>
>
> I then try to run queries against this dataset, as an example say I want
> to search "AE" then I would expect every dataElement within the AE domain
> to be returned. However, I cannot get the desired result. If I search:
>
> PREFIX : <http://localhost/jena_example/#> PREFIX text: <
> http://jena.apache.org/text#> PREFIX mms: <http://rdf.cdisc.org/mms#>
> SELECT * {?s text:query (mms:dataElementName 'AE')}
>
> I get:
>
> <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.DOMAIN>
> <http://rdf.cdisc.org/sdtmig-3-1-2/std#Table.AE>
> <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.FA.FACAT>
>
> when I would expect to get:
>
> <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AERELNST>
> <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AEENDY>
> <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AEMODIFY>
> <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AETOXGR>
> <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AEREFID>
> <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AESCAT>
> <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AESEQ>
> <http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AESMIE>
> (And the rest of the .AE dataElements just listed a few here)
>
> I also tried playing with this query a lot, but could not get the desired
> result for example I tried the other form of query as well:
>
> PREFIX : <http://localhost/jena_example/#> PREFIX text: <
> http://jena.apache.org/text#> PREFIX mms: <http://rdf.cdisc.org/mms#>
> SELECT * {?subject mms:contextName ?o . ?s text:query (mms:contextName
> 'SE')}
>
>
> I am not sure whether the problem is a result of my query being formed
> incorrectly, or whether the problem could be in my assembler file that
> creates the index (is there a better/more complete way to create an index
> for this rdf model?). Any suggestions would help, like I mentioned in the
> beginning one of the rdf files from tdb is attached. Thanks.
>