Posted to users@jena.apache.org by "Carlos S. Zamudio" <cs...@yahoo.com> on 2013/12/07 16:34:50 UTC

Re: Specifying An EntityDefinition when Building a Jena TDB index

In retrospect, I can see why this was probably obvious.

In case anyone stumbles onto this thread, the trick to indexing an 
existing TDB-backed model is to build the index yourself with the 
TextIndex methods (startIndexing, addEntity, finishIndexing) while 
iterating over the model. The textindexer.java source referenced in the 
Indexing documentation shows how to do this.
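
For anyone who just wants the shape of that loop, here is a minimal, 
library-free sketch. It is an illustration only, under these assumptions: 
Triple, SketchIndex and ReindexSketch are hypothetical stand-ins, not 
jena-text classes. Only the three method names come from Andy's message 
quoted below, and the real addEntity takes an entity object rather than 
the two strings used here. For the real thing, read textindexer.java.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for one stored statement (not a Jena class).
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

// Hypothetical stand-in for a text index. It only mirrors the
// startIndexing / addEntity / finishIndexing call pattern.
final class SketchIndex {
    private final Map<String, List<String>> pending = new LinkedHashMap<>();
    private final Map<String, List<String>> committed = new LinkedHashMap<>();
    private boolean open = false;

    void startIndexing() {
        pending.clear();
        open = true;
    }

    void addEntity(String entityUri, String text) {
        if (!open) {
            throw new IllegalStateException("call startIndexing() first");
        }
        pending.computeIfAbsent(entityUri, k -> new ArrayList<>()).add(text);
    }

    // Nothing is visible to lookups until this is called.
    void finishIndexing() {
        for (Map.Entry<String, List<String>> e : pending.entrySet()) {
            committed.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                     .addAll(e.getValue());
        }
        pending.clear();
        open = false;
    }

    List<String> lookup(String entityUri) {
        List<String> hits = committed.get(entityUri);
        return hits == null ? Collections.<String>emptyList() : hits;
    }
}

final class ReindexSketch {
    // Walk statements that are already stored and index only those whose
    // predicate is one of the indexed properties.
    static SketchIndex indexExisting(List<Triple> stored,
                                     Set<String> indexedPredicates) {
        SketchIndex index = new SketchIndex();
        index.startIndexing();
        for (Triple t : stored) {
            if (indexedPredicates.contains(t.predicate)) {
                index.addEntity(t.subject, t.object);
            }
        }
        index.finishIndexing();
        return index;
    }
}
```

The point of the sketch is the ordering: the data is already in the store, 
so nothing reaches the index unless you walk the statements and add them 
yourself, and the index is not complete until the finish call.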

On 11/27/2013 6:22 AM, Carlos S. Zamudio wrote:
> Yes, thank you. This makes sense when building an index on a model from 
> a source RDF/TTL file.  But can you index an existing TDB-backed 
> model? As best I can determine, none of the examples do this in two 
> steps: add a model to TDB, and then index it. I can see why this might 
> not be the appropriate workflow, but is it possible? Thanks.
>
> On 11/27/2013 4:20 AM, Andy Seaborne wrote:
>> Carlos,
>>
>> I'm having trouble reading your code - could you put in plain text 
>> next time please?
>>
>> You seem to be loading the RDF directly into the base TDB storage, 
>> not via the text index dataset that adds the functionality to the TDB 
>> dataset.
>>
>> You could use
>>
>> RDFDataMgr.read(indexedDataset, modelUrl)
>>
>> to send it via the text dataset.  Syncing this afterward is good style.
>>
>> You can add additional fields with EntityDefinition.set
>>
>>
>> Also, take a look at the code for arq.textindexer
>>
>> https://svn.apache.org/repos/asf/jena/trunk/jena-text/src/main/java/jena/textindexer.java 
>>
>>
>>
>> which uses
>>
>> textIndex.startIndexing() ;
>> ...
>> textIndex.addEntity(entity) ;
>> ...
>> textIndex.finishIndexing() ;
>>
>> if you want a lot of control.
>>
>>     Andy
>>
>> On 26/11/13 20:25, Carlos S. Zamudio wrote:
>>> Below is a bit more complete example of my problem.  What happens when
>>> I run this is that the TDB model is created in the specified directory,
>>> but the index directory contains only a couple of segments files, not a
>>> complete index.  (If I run the example supplied with the jena-text
>>> release I can get an index created.) So I'm guessing that the way I am
>>> creating an index from an existing TDB source is not persisting the
>>> index for some reason.
>>>
>>>     String modelUrl = "file:///E:/skos/AAA.xml";
>>>     String modelDirectory = "E:/tdb/AAA";
>>>     File indexPath = new File("E:/tdbindex/AAA");
>>>     Directory directory = FSDirectory.open(indexPath);
>>>     //
>>>     Dataset modelDataset = null;
>>>     Dataset indexedDataset = null;
>>>     try {
>>>         modelDataset = TDBFactory.createDataset(modelDirectory);
>>>         //
>>>         Model modelBase = modelDataset.getDefaultModel();
>>>         modelBase.read(modelUrl);
>>>         //
>>>         Model defaultModel = modelDataset.getDefaultModel();
>>>         StmtIterator si = defaultModel.listStatements();
>>>         System.out.println("Number of model statements: " + si.toList().size());
>>>         //
>>>         EntityDefinition entDef = new EntityDefinition(PREF_LABEL_PROPERTY,
>>>                 "prefLabel", RDFS.label.asNode());
>>>         //
>>>         indexedDataset = TextDatasetFactory.createLucene(modelDataset,
>>>                 directory, entDef);
>>>
>>>         defaultModel = indexedDataset.getDefaultModel();
>>>         si = defaultModel.listStatements();
>>>         System.out.println("Number of model statements: " + si.toList().size());
>>>     }
>>>     catch (Exception e) {
>>>         throw e;
>>>     }
>>>     finally {
>>>         if (modelDataset != null) { modelDataset.close(); }
>>>         try { if (indexedDataset != null) { indexedDataset.close(); } }
>>>         catch (Exception e) {}
>>>     }
>>>
>>> Thanks for any suggestions.
>>>
>>>
>>>
>>> On 11/26/2013 3:23 AM, Andy Seaborne wrote:
>>>> Carlos,
>>>>
>>>> Do you have a complete, minimal example?  Your description looks OK
>>>> but the details matter.  What is the code to setup the index?
>>>>
>>>>     Andy
>>>>
>>>>
>>>> On 26/11/13 00:47, Carlos S. Zamudio wrote:
>>>>> Hi,
>>>>>
>>>>> I'm having a bit of trouble deciphering the specification of the
>>>>> EntityDefinition when constructing a Jena TDB index using the
>>>>> jena-text module in 2.11.0. (I've been successfully using the
>>>>> previous LARQ module for indexing RDF data sets).
>>>>>
>>>>> I am attempting to index a data set that represents a SKOS
>>>>> vocabulary. Below is an example entry in the model:
>>>>>
>>>>>     <http://purl.obolibrary.org/obo/ID_62354>
>>>>>         skos:broader      <http://purl.obolibrary.org/id/ID_35317> ;
>>>>>         skos:prefLabel    "The preferred label for the entity" ;
>>>>>         skos:hiddenLabel  "The hidden label for the entity" ;
>>>>>         skos:altLabel     "An alternative label for the entity" ;
>>>>>         rdf:type          skos:Concept
>>>>>
>>>>> The skos:prefLabel, skos:hiddenLabel and skos:altLabel properties
>>>>> are sub-properties of rdfs:label.
>>>>>
>>>>> I would like to index the prefLabel, hiddenLabel and altLabel values
>>>>> for all of the entries.
>>>>>
>>>>> The EntityDefinition is defined in the documentation as follows:
>>>>>
>>>>>     public EntityDefinition(String entityField,
>>>>>                      String primaryField,
>>>>>                      com.hp.hpl.jena.rdf.model.Resource primaryPredicate)
>>>>>
>>>>> From what I can gather the entityField is the field name in the
>>>>> index. The primary field should be the skos:prefLabel property for
>>>>> example. And the primaryPredicate should be specified as the
>>>>> RDFS.label.asNode() resource.
>>>>>
>>>>> It seems I can also add additional fields by calling the .set()
>>>>> method.
>>>>>
>>>>> I can't seem to generate an index file when I use:
>>>>>
>>>>> TextDatasetFactory.createLucene(dataset, directory, entityDefinition)
>>>>>
>>>>> I've verified that my dataset is valid, and that the directory is
>>>>> also valid.
>>>>>
>>>>> Do I have the right idea for specifying an EntityDefinition?
>>>>>
>>>>> Any hints would be appreciated.
>>>>>