Posted to dev@jena.apache.org by Andy Seaborne <an...@epimorphics.com> on 2011/03/03 12:33:52 UTC

LARQ indexes

Pondering on what the LARQ assembler might look like ...

1/ Is the index associated with a graph or a dataset? or are both 
possible? (I think my pref is dataset)

2/ Here's an assembler for a dataset with a default graph and named graphs.

<#ds1>   rdf:type ja:RDFDataset ;
     ja:defaultGraph    <#model1> ;
     rdfs:label "Dataset 1" ;
     ja:namedGraph
         [ ja:graphName      <http://example.org/name1> ;
           ja:graph          <#model1> ] ;
     ja:namedGraph
         [ ja:graphName      <http://example.org/name2> ;
           ja:graph          <#model2>
         ] ;
     .

<#model1>  rdf:type ja:MemoryModel ;
     rdfs:label "Model(plain)" ;
     ja:content [ ja:externalContent <file:FILE-1.ttl> ] ;
     ja:content [ ja:externalContent <file:FILE-2.ttl> ] ;
     .


LARQ adds:

<#ds1>   rdf:type ja:RDFDataset ;
     larq:textIndex "Location of Lucene data" ;
...


to add it to a dataset.

This *attaches* it, it does not *build* it.

	Andy


Re: LARQ indexes and assemblers

Posted by Paolo Castagna <ca...@googlemail.com>.
Chris Dollin wrote:
> On Tuesday, March 29, 2011 07:29:22 pm Andy Seaborne wrote:
> 
>> Chris - if a call to assemble the same resource is made does the 
>> assembler code return exactly the same (==) object?
> 
> Short answer: no.
> 
> Longer answer: It depends.
> 
> The code that actually does the assembling could cache if it wanted
> to. In general it probably shouldn't. And it should be straightforward
> to write an assembler wrapper which wrapped a (general) assembler and
> cached some (not necessarily all) of the assemblies done.
> 
> Chris 

Hi Chris,
thanks for your answer.

This means that the LARQ assembler would need to be added to the current RDF
Dataset assemblers. For example, for TDB, see DatasetAssemblerTDB:
https://jena.svn.sourceforge.net/svnroot/jena/TDB/trunk/src/main/java/com/hp/hpl/jena/tdb/assembler/DatasetAssemblerTDB.java

We should check whether the root resource has a larq:textIndex property and
whether the LARQ jar is on the classpath. If so, create the necessary
LARQ-specific objects and attach them to the DatasetGraphTDB.
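
A minimal sketch of that check in plain Java, not the actual
DatasetAssemblerTDB change. A Map stands in for the properties of the root
resource, the class LarqAttachCheck and its methods are hypothetical, and the
marker class name is an assumption inferred from the AssemblerLARQ source path
given elsewhere in this thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class LarqAttachCheck {
    // Property name as written in the thread; its namespace is not given there.
    static final String TEXT_INDEX = "larq:textIndex";

    /** True if the named class can be loaded (without initialising it). */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, LarqAttachCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /**
     * Returns the index location to attach, or empty when the description has
     * no larq:textIndex property or the marker class cannot be loaded.
     */
    static Optional<String> indexLocationToAttach(Map<String, String> rootProperties,
                                                  String markerClass) {
        String location = rootProperties.get(TEXT_INDEX);
        if (location == null || !isOnClasspath(markerClass)) {
            return Optional.empty();
        }
        return Optional.of(location);
    }

    public static void main(String[] args) {
        Map<String, String> root = new HashMap<>();
        root.put("rdf:type", "ja:RDFDataset");
        root.put(TEXT_INDEX, "lucene-index-dir");   // hypothetical location

        // Assumed class name, derived from the AssemblerLARQ source path.
        String marker = "org.apache.jena.larq.assembler.AssemblerLARQ";
        System.out.println(indexLocationToAttach(root, marker).isPresent());
    }
}
```

Loading the class by name keeps the dataset assembler free of a compile-time
dependency on LARQ, which is the point of checking the classpath first.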

I'll try that.

Thanks,
Paolo




Re: LARQ indexes and assemblers

Posted by Chris Dollin <ch...@epimorphics.com>.
On Tuesday, March 29, 2011 07:29:22 pm Andy Seaborne wrote:

> Chris - if a call to assemble the same resource is made does the 
> assembler code return exactly the same (==) object?

Short answer: no.

Longer answer: It depends.

The code that actually does the assembling could cache if it wanted
to. In general it probably shouldn't. And it should be straightforward
to write an assembler wrapper which wrapped a (general) assembler and
cached some (not necessarily all) of the assemblies done.
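
A minimal sketch of such a wrapper, in plain Java. SimpleAssembler is a
hypothetical one-method stand-in, not Jena's Assembler interface, and the
predicate that picks which resources to cache is likewise an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical stand-in for an assembler: builds an object for a resource URI. */
interface SimpleAssembler {
    Object open(String resourceUri);
}

/** Wraps any SimpleAssembler and caches results for selected resources only. */
final class CachingAssembler implements SimpleAssembler {
    private final SimpleAssembler delegate;
    private final Predicate<String> shouldCache;
    private final Map<String, Object> cache = new HashMap<>();

    CachingAssembler(SimpleAssembler delegate, Predicate<String> shouldCache) {
        this.delegate = delegate;
        this.shouldCache = shouldCache;
    }

    @Override
    public Object open(String resourceUri) {
        if (!shouldCache.test(resourceUri)) {
            // Not cached: the delegate builds a fresh object on every call.
            return delegate.open(resourceUri);
        }
        // Cached: the first result is remembered and returned on later calls.
        return cache.computeIfAbsent(resourceUri, delegate::open);
    }
}

class CachingAssemblerDemo {
    public static void main(String[] args) {
        SimpleAssembler plain = uri -> new Object();
        SimpleAssembler cached = new CachingAssembler(plain, uri -> uri.endsWith("#ds1"));

        System.out.println(plain.open("#ds1") == plain.open("#ds1"));        // false
        System.out.println(cached.open("#ds1") == cached.open("#ds1"));      // true
        System.out.println(cached.open("#model1") == cached.open("#model1")); // false
    }
}
```

The sketch is not thread-safe, and it never evicts entries; both would need
attention before relying on identity (==) across assemble calls.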

Chris 

Re: LARQ indexes and assemblers

Posted by Andy Seaborne <an...@epimorphics.com>.

On 28/03/11 09:03, Paolo Castagna wrote:
> Andy Seaborne wrote:
>> Pondering on what the LARQ assembler might look like ...
>>
>> 1/ Is the index associated with a graph or a dataset? or are both
>> possible? (I think my pref is dataset)
>>
>> 2/ Here's an assembler for a dataset with a default graph and named graphs.
>>
>> <#ds1>   rdf:type ja:RDFDataset ;
>>     ja:defaultGraph    <#model1> ;
>>     rdfs:label "Dataset 1" ;
>>     ja:namedGraph
>>         [ ja:graphName      <http://example.org/name1> ;
>>           ja:graph          <#model1> ] ;
>>     ja:namedGraph
>>         [ ja:graphName      <http://example.org/name2> ;
>>           ja:graph          <#model2>
>>         ] ;
>>     .
>>
>> <#model1>  rdf:type ja:MemoryModel ;
>>     rdfs:label "Model(plain)" ;
>>     ja:content [ ja:externalContent <file:FILE-1.ttl> ] ;
>>     ja:content [ ja:externalContent <file:FILE-2.ttl> ] ;
>>     .
>>
>>
>> LARQ adds:
>>
>> <#ds1>   rdf:type ja:RDFDataset ;
>>     larq:textIndex "Location of Lucene data" ;
>> ...
>>
>>
>> to add it to a dataset.
>
> Hi,
> I've never extended the configuration using the Jena assembler mechanism.
>
> So, I had a look at it; however, I am not sure I understand how to get a
> reference to the RDFDataset the larq:textIndex has been used with.
>
> LARQ already had an AssemblerLARQ class, which I have diligently stolen:
> https://jena.svn.sourceforge.net/svnroot/jena/LARQ/trunk/src/main/java/org/apache/jena/larq/assembler/AssemblerLARQ.java
>
> AssemblerLARQ builds an IndexLARQ; however, I would like to *attach* it to
> the RDF Dataset so that it listens for statements added or removed and keeps
> the Lucene index in sync.
>
> How do I get a reference to the RDF Dataset?

I think the dataset assembler code has to help.  i.e. the dataset says:

<#ds1> rdf:type ja:RDFDataset ;
    larq:index <#larq1> ;
    .

<#larq1> rdf:type :LARQindex ;
    ...
    .

Unless --

Chris - if a call to assemble the same resource is made does the 
assembler code return exactly the same (==) object?

	Andy

Re: LARQ indexes

Posted by Paolo Castagna <ca...@googlemail.com>.
Andy Seaborne wrote:
> Pondering on what the LARQ assembler might look like ...
> 
> 1/ Is the index associated with a graph or a dataset? or are both 
> possible? (I think my pref is dataset)
> 
> 2/ Here's an assembler for a dataset with a default graph and named graphs.
> 
> <#ds1>   rdf:type ja:RDFDataset ;
>     ja:defaultGraph    <#model1> ;
>     rdfs:label "Dataset 1" ;
>     ja:namedGraph
>         [ ja:graphName      <http://example.org/name1> ;
>           ja:graph          <#model1> ] ;
>     ja:namedGraph
>         [ ja:graphName      <http://example.org/name2> ;
>           ja:graph          <#model2>
>         ] ;
>     .
> 
> <#model1>  rdf:type ja:MemoryModel ;
>     rdfs:label "Model(plain)" ;
>     ja:content [ ja:externalContent <file:FILE-1.ttl> ] ;
>     ja:content [ ja:externalContent <file:FILE-2.ttl> ] ;
>     .
> 
> 
> LARQ adds:
> 
> <#ds1>   rdf:type ja:RDFDataset ;
>     larq:textIndex "Location of Lucene data" ;
> ...
> 
> 
> to add it to a dataset.

Hi,
I've never extended the configuration using the Jena assembler mechanism.

So, I had a look at it; however, I am not sure I understand how to get a reference
to the RDFDataset the larq:textIndex has been used with.

LARQ already had an AssemblerLARQ class, which I have diligently stolen:
https://jena.svn.sourceforge.net/svnroot/jena/LARQ/trunk/src/main/java/org/apache/jena/larq/assembler/AssemblerLARQ.java

AssemblerLARQ builds an IndexLARQ; however, I would like to *attach* it to the
RDF Dataset so that it listens for statements added or removed and keeps
the Lucene index in sync.

How do I get a reference to the RDF Dataset?

There is clearly something obvious I am missing here, but I am stuck.

If there is an example which does something similar, please, point me at it.

Thank you,
Paolo

> This *attaches* it, it does not *build* it.
> 
>     Andy
> 


Re: LARQ indexes

Posted by Paolo Castagna <ca...@googlemail.com>.
Andy Seaborne wrote:
> Pondering on what the LARQ assembler might look like ...
> 
> 1/ Is the index associated with a graph or a dataset? or are both 
> possible? (I think my pref is dataset)

I like the idea of having LARQ supporting RDF datasets rather than
being limited to a single Jena Model or RDF graph.

However, IndexBuilderModel [1] extends StatementListener to update a
Lucene index. Does an RDF dataset have a listener, or would we need to
add listeners to all the graphs/models in an RDF dataset?

From a search point of view, would a search work only within a
specific graph or across all the graphs in an RDF dataset? This is
not completely clear to me.

Do we need to add a graph URI to each Lucene document in
the index?

  [1] https://jena.svn.sourceforge.net/svnroot/jena/LARQ/trunk/src/main/java/org/apache/jena/larq/IndexBuilderModel.java
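
To make the questions above concrete, here is a minimal sketch of one possible
answer: register a listener on every graph and store the graph URI next to
each indexed text. It is plain Java with in-memory stand-ins; ChangeListener,
ObservableGraph and DatasetTextIndex are hypothetical names, not Jena or LARQ
classes, and no Lucene is involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical listener with add/remove callbacks. */
interface ChangeListener {
    void added(String statement);
    void removed(String statement);
}

/** Hypothetical in-memory graph that notifies its listeners of changes. */
final class ObservableGraph {
    private final List<String> statements = new ArrayList<>();
    private final List<ChangeListener> listeners = new ArrayList<>();

    void register(ChangeListener listener) { listeners.add(listener); }

    void add(String statement) {
        statements.add(statement);
        for (ChangeListener l : listeners) l.added(statement);
    }

    void remove(String statement) {
        if (statements.remove(statement)) {
            for (ChangeListener l : listeners) l.removed(statement);
        }
    }
}

/** Stand-in for a text index that remembers which graph each text came from. */
final class DatasetTextIndex {
    // graph URI -> indexed texts; the key plays the role of a per-document graph field
    private final Map<String, List<String>> textsByGraph = new LinkedHashMap<>();

    /** Registers one listener per graph; each listener is bound to its graph's URI. */
    void attach(Map<String, ObservableGraph> namedGraphs) {
        namedGraphs.forEach((graphUri, graph) -> {
            List<String> texts = textsByGraph.computeIfAbsent(graphUri, k -> new ArrayList<>());
            graph.register(new ChangeListener() {
                @Override public void added(String statement)   { texts.add(statement); }
                @Override public void removed(String statement) { texts.remove(statement); }
            });
        });
    }

    /** Searches all graphs when graphUri is null, otherwise only the given graph. */
    List<String> search(String term, String graphUri) {
        List<String> hits = new ArrayList<>();
        textsByGraph.forEach((uri, texts) -> {
            if (graphUri == null || graphUri.equals(uri)) {
                for (String t : texts) if (t.contains(term)) hits.add(t);
            }
        });
        return hits;
    }
}
```

With the graph URI kept per entry, the same index can serve both a search
restricted to one graph and a search across the whole dataset.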

> 
> 2/ Here's an assembler for a dataset with a default graph and named graphs.
> 
> <#ds1>   rdf:type ja:RDFDataset ;
>     ja:defaultGraph    <#model1> ;
>     rdfs:label "Dataset 1" ;
>     ja:namedGraph
>         [ ja:graphName      <http://example.org/name1> ;
>           ja:graph          <#model1> ] ;
>     ja:namedGraph
>         [ ja:graphName      <http://example.org/name2> ;
>           ja:graph          <#model2>
>         ] ;
>     .
> 
> <#model1>  rdf:type ja:MemoryModel ;
>     rdfs:label "Model(plain)" ;
>     ja:content [ ja:externalContent <file:FILE-1.ttl> ] ;
>     ja:content [ ja:externalContent <file:FILE-2.ttl> ] ;
>     .
> 
> 
> LARQ adds:
> 
> <#ds1>   rdf:type ja:RDFDataset ;
>     larq:textIndex "Location of Lucene data" ;
> ...
> 
> 
> to add it to a dataset.

+1

Is it possible, from an Assembler point of view, to attach an
ja:RDFDataset to a Model and to an RDF Dataset?
I expect the answer to be yes (but more work)... however, I
want to confirm that.

We could support both approaches (old and new one).

> This *attaches* it, it does not *build* it.

Yes.

I am interested in providing an easy way to solve the *build*
problem as well, and/or in allowing users to rebuild an index by pointing
to an existing TDB or SDB store. However, this is a separate
problem and it could be done with a command-line tool.
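
A minimal sketch of the difference, using plain Java collections as stand-ins
for the store and the index. AttachVsBuild and its two methods are
hypothetical and only illustrate the distinction; they are not LARQ code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

final class AttachVsBuild {
    /** Build: a one-off pass that indexes everything already in the store. */
    static Set<String> build(List<String> existingStatements) {
        return new LinkedHashSet<>(existingStatements);
    }

    /** Attach: returns a callback that indexes statements added from now on. */
    static Consumer<String> attach(Set<String> index) {
        return index::add;
    }

    public static void main(String[] args) {
        // A store that already holds data before any index is attached.
        List<String> store = new ArrayList<>(Arrays.asList("old-1", "old-2"));

        Set<String> attachedIndex = new LinkedHashSet<>();
        Consumer<String> onAdd = attach(attachedIndex);

        // A later update reaches both the store and the attached index.
        store.add("new-1");
        onAdd.accept("new-1");

        System.out.println(attachedIndex);  // only what arrived after attaching
        System.out.println(build(store));   // everything currently in the store
    }
}
```

An attached index sees only later changes, which is why rebuilding from an
existing store needs its own batch step.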

Let's do *attach* first and discuss *build* after.

Paolo

> 
>     Andy
>