
Posted to users@jena.apache.org by Karen Menz <do...@yahoo.com> on 2014/05/08 08:21:08 UTC

Lucene & TDB Question

Hi,
I'm trying to get Lucene working with TDB, but no luck so far. I already have a TDB dataset saved in the "tdb_directory" folder, on the following platform:
    Jena 2.11.1
    Lucene 4.8.0 (Apache)
    Java 1.7.0, 64-bit

And I have the following code:
Dataset ds1 = TDBFactory.createDataset(tdb_directory);
EntityDefinition entDef = new EntityDefinition("uri", "text", RDFS.label.asNode());
File indexDir = new File(index_directory);
Directory dir = null;  // declared before the try block so it is in scope below
try {
    dir = FSDirectory.open(indexDir);
} catch (IOException e) {
    e.printStackTrace();
}
Dataset ds = TextDatasetFactory.createLucene(ds1, dir, entDef);
 
Then I execute the query as follows:

ds.begin(ReadWrite.READ);
Model model = ds.getDefaultModel();
Query q = QueryFactory.create(pre + "\n" + qs);
QueryExecution qexec = QueryExecutionFactory.create(q, ds);
QueryExecUtils.executeQuery(q, qexec);

ds.commit();
ds.end();
 
I get an empty table 
-------------
| s | label |
=============
 
and in index_directory only three files were created: segments.gen, segments_1 and write.lock, with sizes of 1 KB, 1 KB and 0 KB respectively.

I'm not sure what I'm missing here, and would really appreciate any help.
 
Thanks in advance.
 
Karen
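The thread never shows `pre` or `qs` (Andy asks about `qs` in his reply). As a minimal sketch, assuming the search is over `rdfs:label` as configured in the `EntityDefinition` above, a query that actually goes through the Lucene index calls the jena-text `text:query` property function. The class `TextQueryStrings`, its methods and the search term are hypothetical; only the `text:` namespace and the `(property 'term' limit)` argument form come from the jena-text documentation.

```java
// Hypothetical helper: builds "pre" (prefixes) and "qs" (query body) strings
// for a jena-text search over rdfs:label. Plain string building only.
class TextQueryStrings {

    // Prefix block; text: is the jena-text vocabulary namespace.
    static final String PRE =
          "PREFIX text: <http://jena.apache.org/text#>\n"
        + "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>";

    // Query body whose text:query pattern is answered by the Lucene index;
    // the rdfs:label pattern then fetches the label itself from TDB.
    static String qs(String searchTerm, int limit) {
        // Escape backslashes and single quotes so the term stays inside the SPARQL literal.
        String escaped = searchTerm.replace("\\", "\\\\").replace("'", "\\'");
        return "SELECT ?s ?label WHERE {\n"
             + "  ?s text:query (rdfs:label '" + escaped + "' " + limit + ") ;\n"
             + "     rdfs:label ?label .\n"
             + "}";
    }

    public static void main(String[] args) {
        // Same concatenation as in the original post: pre + "\n" + qs
        System.out.println(PRE + "\n" + qs("paris", 10));
    }
}
```

A query with no `text:query` pattern reads TDB directly and never touches the Lucene index, so it is worth checking which form `qs` takes before looking at the index itself.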

Re: Lucene & TDB Question

Posted by Harkishan Singh <ha...@gmail.com>.

Hi,
I used these commands to load and index the data.

Loading data into TDB:
java -cp %FUSEKI_HOME%/fuseki-server.jar tdb.tdbloader --tdb="path of assembler file" "path of file to be indexed"

Indexing the predicates:
java -cp %FUSEKI_HOME%/fuseki-server.jar jena.textindexer --desc="path of assembler file"
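The two commands above can also be launched from a Java program as child processes, which keeps the command-line tools (and their memory use) outside the application's JVM. This is a sketch under assumptions: the class name, the placeholder paths and the use of `ProcessBuilder` are illustrative and not from the thread; the arguments simply mirror the commands as given. Because `ProcessBuilder` passes each argument directly, the shell quotes around the paths are not needed.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher: runs tdbloader, then textindexer, as child JVM processes.
class TextIndexCommands {

    // Mirrors: java -cp <jar> tdb.tdbloader --tdb=<assembler> <data file>
    static List<String> loadCommand(String fusekiJar, String assembler, String dataFile) {
        return Arrays.asList("java", "-cp", fusekiJar, "tdb.tdbloader",
                "--tdb=" + assembler, dataFile);
    }

    // Mirrors: java -cp <jar> jena.textindexer --desc=<assembler>
    static List<String> indexCommand(String fusekiJar, String assembler) {
        return Arrays.asList("java", "-cp", fusekiJar, "jena.textindexer",
                "--desc=" + assembler);
    }

    // Runs one command, sharing this process's stdout/stderr; returns its exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).inheritIO().start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        String home = System.getenv("FUSEKI_HOME");
        if (home == null) {
            System.err.println("FUSEKI_HOME is not set");
            return;
        }
        String jar = home + "/fuseki-server.jar";
        String assembler = "text-config.ttl";   // hypothetical assembler file
        // Only build the index if loading succeeded.
        if (run(loadCommand(jar, assembler, "data.ttl")) == 0) {   // hypothetical data file
            run(indexCommand(jar, assembler));
        }
    }
}
```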

After indexing, use this code:

TextQuery.init();
Dataset dataset = null;
Dataset tdbDataset = TDBFactory.createDataset(tdbFilePath);
try {
    EntityDefinition entDef = new EntityDefinition("uri", "text", RDFS.label);
    Directory dir = FSDirectory.open(new File(luceneFilePath));
    dataset = TextDatasetFactory.createLucene(tdbDataset, dir, entDef);

    dataset.begin(ReadWrite.READ);
    Model model = dataset.getDefaultModel();
    // here qs is the query string (SPARQL query)
    Query q = QueryFactory.create(pre + "\n" + qs);
    QueryExecution qexec = QueryExecutionFactory.create(q, dataset);
    QueryExecUtils.executeQuery(q, qexec);

    dataset.commit();
    dataset.end();
} catch (Exception e) {
    System.out.println(e.toString());
}


Thanks



On Thu, May 15, 2014 at 7:00 PM, Andy Seaborne <an...@apache.org> wrote:

> Looks like the data isn't indexed. It does not happen automatically just
> by attaching an index to an existing, preloaded dataset.
>
> Either have the index attached to the dataset when you loaded the data or
> build the dataset in two steps:
>
> http://jena.apache.org/documentation/query/text-query.html#building-a-text-index
>
> There is a textindexer to run from the command line for indexing existing
> data.
>
>         Andy
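Andy's point can be illustrated with a toy model. This is a plain-Java analogy, not the jena-text API: writes made through a dataset with an index attached are indexed as they happen, while data loaded before the index was attached is only picked up by a separate rebuild pass (the role `textindexer` plays). The class, its methods and the word-splitting rule are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy analogy only, NOT the jena-text API: a triple list plus an optional
// word -> subjects index that is updated only while it is attached.
class ToyTextIndexedStore {
    static final String INDEXED_PREDICATE = "rdfs:label"; // like the EntityDefinition mapping

    private final List<String[]> triples = new ArrayList<>(); // {subject, predicate, literal}
    private Map<String, Set<String>> index;                  // null means "no index attached"

    void add(String s, String p, String o) {
        triples.add(new String[] {s, p, o});
        // Only writes made while an index is attached get indexed.
        if (index != null && INDEXED_PREDICATE.equals(p)) {
            indexLiteral(s, o);
        }
    }

    // Attaching an index does NOT scan data that is already stored.
    void attachIndex() {
        index = new HashMap<>();
    }

    // Analogue of textindexer: walk all existing triples and rebuild the index.
    void rebuildIndex() {
        index = new HashMap<>();
        for (String[] t : triples) {
            if (INDEXED_PREDICATE.equals(t[1])) {
                indexLiteral(t[0], t[2]);
            }
        }
    }

    // Subjects whose indexed literal contains the word (case-insensitive).
    Set<String> search(String word) {
        Set<String> hits = (index == null) ? null : index.get(word.toLowerCase());
        return (hits == null) ? new HashSet<String>() : new HashSet<String>(hits);
    }

    private void indexLiteral(String subject, String literal) {
        for (String w : literal.toLowerCase().split("\\W+")) {
            if (w.isEmpty()) continue;
            Set<String> subjects = index.get(w);
            if (subjects == null) {
                subjects = new HashSet<>();
                index.put(w, subjects);
            }
            subjects.add(subject);
        }
    }

    public static void main(String[] args) {
        ToyTextIndexedStore store = new ToyTextIndexedStore();
        store.add("ex:paris", "rdfs:label", "Paris");  // loaded before any index exists
        store.attachIndex();
        System.out.println(store.search("paris"));     // empty: existing data not indexed
        store.rebuildIndex();
        System.out.println(store.search("paris"));     // now finds ex:paris
    }
}
```

With the index attached only after loading, `search("paris")` comes back empty, which mirrors the near-empty Lucene directory (segments.gen, segments_1, write.lock) in the original question; after `rebuildIndex()` the existing triple is found, and later additions are indexed as they are written.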
Re: Lucene & TDB Question

Posted by Harkishan Singh <ha...@gmail.com>.

Hi,

If you want to index the data from a Java program, follow this code.

One thing to keep in mind: if you are dealing with a huge amount of RDF data, you might get an out-of-memory error, because it holds everything in memory and then indexes it. So for huge RDF data, use the command-line tools.

import org.apache.jena.query.text.EntityDefinition;
import org.apache.jena.query.text.TextDatasetFactory;
import org.apache.jena.query.text.TextQuery;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.sparql.util.QueryExecUtils;
import com.hp.hpl.jena.tdb.TDBFactory;
import com.hp.hpl.jena.vocabulary.RDFS;

// Package names as in Jena 2.11.x / jena-text 1.0.x
public class TextIndexExample {

    public static void main(String[] argv) {
        TextQuery.init();
        Dataset ds = createCode();
        // Alternative: Dataset ds = createAssembler(); (assembler-based setup, not shown)
        loadData(ds, "file path");
        queryData(ds);
    }

    public static Dataset createCode() {
        Dataset ds = null;
        System.out.println("Construct a TDB dataset with an in-memory Lucene index using code");
        // Build a text dataset by code:
        // TDB base data plus an in-memory Lucene index.

        // Base data (for an in-memory base instead: DatasetFactory.createMem())
        String directory = "path of TDB";
        Dataset ds1 = TDBFactory.createDataset(directory);

        try {
            // Define the index mapping
            EntityDefinition entDef = new EntityDefinition("uri", "property", RDFS.label);
            // Lucene, in memory
            Directory dir = new RAMDirectory();
            // Join together into a dataset
            ds = TextDatasetFactory.createLucene(ds1, dir, entDef);
        } catch (Exception e) {
            System.out.println(e.toString());
        }
        return ds;
    }

    public static void loadData(Dataset dataset, String file) {
        System.out.println("Start loading");
        long startTime = System.nanoTime();
        dataset.begin(ReadWrite.WRITE);
        try {
            // Triples read through the text dataset are indexed as they are loaded
            Model m = dataset.getDefaultModel();
            RDFDataMgr.read(m, file);
            dataset.commit();
        } finally {
            dataset.end();
        }
        long finishTime = System.nanoTime();
        double time = (finishTime - startTime) / 1.0e6;
        System.out.println(String.format("Finish loading - %.2fms", time));
    }

    public static void queryData(Dataset dataset) {
        System.out.println("START");
        long startTime = System.nanoTime();
        String queryString = "Sparql Query";  // replace with a real SPARQL query
        dataset.begin(ReadWrite.READ);
        try {
            Query q = QueryFactory.create(queryString);
            QueryExecution qexec = QueryExecutionFactory.create(q, dataset);
            QueryExecUtils.executeQuery(q, qexec);
        } finally {
            dataset.end();
        }
        long finishTime = System.nanoTime();
        double time = (finishTime - startTime) / 1.0e6;
        System.out.println(String.format("FINISH - %.2fms", time));
    }
}

Thanks


On Tue, May 20, 2014 at 1:49 PM, Andy Seaborne <an...@apache.org> wrote:

> On 17/05/14 14:27, Karen Menz wrote:
>
>> Thanks Harkishan & Andy for your help, it works fine now.
>>
>> However, I wonder if there's a way to build the index in java code,
>> instead of the command line.
>>
>
> You can call the command-line programme from Java (you can call any Java
> main() from Java) if you want to index an already loaded dataset, or just
> read data into a dataset with a text index attached.
>
>         Andy

Re: Lucene & TDB Question

Posted by Andy Seaborne <an...@apache.org>.

On 17/05/14 14:27, Karen Menz wrote:
> Thanks Harkishan & Andy for your help, it works fine now.
>
> However, I wonder if there's a way to build the index in java code, instead of the command line.

You can call the command-line programme from Java (you can call any Java
main() from Java) if you want to index an already loaded dataset, or just
read data into a dataset with a text index attached.

	Andy
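A sketch of Andy's suggestion, assuming the jena-text jar (which provides the `jena.textindexer` command class) is on the run-time classpath. Reflection is used only so the example compiles without that jar; with jena-text as an ordinary compile-time dependency, the direct call `jena.textindexer.main("--desc=" + assemblerPath)` should do the same. The class and method names here are hypothetical. Be aware that command-line entry points may call `System.exit` on failure, which would also end the calling JVM.

```java
import java.lang.reflect.Method;

// Sketch: run the jena-text command-line indexer from inside a Java program.
// Assumes jena-text (class jena.textindexer) is on the classpath at run time.
class RunTextIndexer {

    // Same argument as the command line: --desc=<assembler file>
    static String[] textIndexerArgs(String assemblerPath) {
        return new String[] {"--desc=" + assemblerPath};
    }

    // Looks up jena.textindexer.main(String[]) reflectively and invokes it.
    static void runTextIndexer(String assemblerPath) throws ReflectiveOperationException {
        Class<?> cmd = Class.forName("jena.textindexer");
        Method main = cmd.getMethod("main", String[].class);
        // Cast to Object so the array is passed as the single String[] argument,
        // not spread out as the varargs of invoke().
        main.invoke(null, (Object) textIndexerArgs(assemblerPath));
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // "text-config.ttl" is a hypothetical assembler file name.
        runTextIndexer(args.length > 0 ? args[0] : "text-config.ttl");
    }
}
```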



Re: Lucene & TDB Question

Posted by Karen Menz <do...@yahoo.com>.

Thanks, Harkishan and Andy, for your help; it works fine now.

However, I wonder if there's a way to build the index in Java code instead of from the command line.

Thanks,
Karen 


On Friday, May 16, 2014 8:46 PM, Andy Seaborne <an...@apache.org> wrote:
 


Looks like the data isn't indexed. It does not happen automatically
just by attaching an index to an existing, preloaded dataset.

Either have the index attached to the dataset when you loaded the data 
or build the dataset in two steps:

http://jena.apache.org/documentation/query/text-query.html#building-a-text-index

There is a textindexer to run from the command line for indexing 
existing data.

    Andy






Re: Lucene & TDB Question

Posted by Andy Seaborne <an...@apache.org>.

On 08/05/14 07:21, Karen Menz wrote:
> Hi,
>    I'm trying to get Lucene working with TDB,
> but no luck so far.
> I already
> have a TDB dataset saved in "tdb_directory" folder, with the following
> platform:
>      Jena-2.11.1
>      Lucene-4.8.0      (Apache)
>      Java-1.7.0,
> 64-Bit
>
> And I have
> the following code:
> Dataset ds1 = TDBFactory.createDataset(tdb_directory);
> EntityDefinition entDef = new
> EntityDefinition("uri", "text", RDFS.label.asNode());
> File indexDir = new File(index_directory);
> try{
>       dir
> = FSDirectory.open(indexDir);
> } catch(IOException e) {
>       e.printStackTrace();
> }
> Dataset ds = TextDatasetFactory.createLucene(ds1,
> dir, entDef);
>
> Then, when I execute the query as
> following:
>
> ds.begin(ReadWrite.READ);
> Model model = ds.getDefaultModel();
> Query q = QueryFactory.create(pre + "\n"
> + qs);

What's 'qs'?

> QueryExecution qexec =
> QueryExecutionFactory.create(q, ds);
> QueryExecUtils.executeQuery(q,
> qexec);
>
> ds.commit();
> ds.end();
>
> I get an empty table
> -------------
> | s | label |
> =============
>
> and in the index_directory, only 3
> file were created; segments.gen, segments_1, and write.lock, with sizes 1kb,
> 1kb, 0kb, respectively.
>
> I'm not sure what I'm missing here,
> and really appreciate any help.

Looks like the data isn't indexed. It does not happen automatically
just by attaching an index to an existing, preloaded dataset.

Either have the index attached to the dataset when you loaded the data 
or build the dataset in two steps:

http://jena.apache.org/documentation/query/text-query.html#building-a-text-index

There is a textindexer to run from the command line for indexing 
existing data.

	Andy




>
> Thanks in advance.
>
> Karen
>