Posted to users@jena.apache.org by Venkat Krishnamurthy <ni...@gmail.com> on 2012/04/11 19:55:31 UTC

LARQ and fuseki/tdb

Based on trawling the list, I understand that a batch indexing process is
possible with LARQ. My requirement is to have a 'live' text index that
works with a running Fuseki instance: when updates come in, they are indexed
as specified in the Fuseki configuration.

Some questions:

1) How do I set up/configure LARQ with Fuseki to enable live text indexing?
Can it be done via Fuseki configuration alone?

2) Now that TDB supports transactions, can/should text indexing be done
when the actual update happens on the underlying dataset within a
transaction so that the index stays in sync with the dataset? Any other
suggestions?

VK

Re: LARQ and fuseki/tdb

Posted by Paolo Castagna <ca...@googlemail.com>.
Hi Venkat

Venkat Krishnamurthy wrote:
> Thanks Rob, Paolo.
> 
> Actually the big use case I have is to support text indexing as part of
> SPARQL updates, hence the fuseki question. It looks like configuring Fuseki
> to read the index is supported based on Paolo's response to #1 below.

We've just got the third necessary +1 to release LARQ.
Once that is done, you will be able to use LARQ with Fuseki,
but the index will not follow changes made via SPARQL Update until
JENA-164 is fixed.
Do you want to help? ;-)

This is something I need to learn myself (i.e. how to intercept SPARQL Update
requests and be notified as triples/quads get added/removed so that the index
can be updated accordingly).
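
For illustration, the "be notified as triples get added/removed" idea can be
sketched with a plain listener pattern. This is a minimal sketch and does not
use the Jena, TDB, or LARQ APIs: ToyStore, ToyTextIndex, and TripleListener are
hypothetical names, the store is an in-memory set, and the index is a simple
token-to-subject map standing in for Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical callback fired by the store for every change. */
interface TripleListener {
    void added(String subject, String predicate, String literal);
    void removed(String subject, String predicate, String literal);
}

/** Toy in-memory store standing in for the dataset (assumption, not TDB). */
class ToyStore {
    private final Set<List<String>> triples = new HashSet<>();
    private final List<TripleListener> listeners = new ArrayList<>();

    void register(TripleListener listener) { listeners.add(listener); }

    void add(String s, String p, String o) {
        // Notify only when the triple was not already present.
        if (triples.add(Arrays.asList(s, p, o))) {
            for (TripleListener l : listeners) l.added(s, p, o);
        }
    }

    void remove(String s, String p, String o) {
        // Notify only when the triple actually existed.
        if (triples.remove(Arrays.asList(s, p, o))) {
            for (TripleListener l : listeners) l.removed(s, p, o);
        }
    }
}

/** Toy inverted index (assumption, stands in for the Lucene index). */
class ToyTextIndex implements TripleListener {
    // token -> (subject -> number of occurrences contributed by its literals)
    private final Map<String, Map<String, Integer>> postings = new HashMap<>();

    private static String[] tokens(String literal) {
        return literal.toLowerCase(Locale.ROOT).split("\\W+");
    }

    @Override
    public void added(String s, String p, String literal) {
        for (String t : tokens(literal)) {
            if (t.isEmpty()) continue;
            postings.computeIfAbsent(t, k -> new HashMap<>()).merge(s, 1, Integer::sum);
        }
    }

    @Override
    public void removed(String s, String p, String literal) {
        for (String t : tokens(literal)) {
            Map<String, Integer> subjects = postings.get(t);
            if (subjects == null) continue;
            // Decrement the count; returning null drops the subject entry.
            subjects.computeIfPresent(s, (k, n) -> n > 1 ? n - 1 : null);
            if (subjects.isEmpty()) postings.remove(t);
        }
    }

    Set<String> search(String token) {
        Map<String, Integer> subjects = postings.get(token.toLowerCase(Locale.ROOT));
        return subjects == null
                ? Collections.<String>emptySet()
                : new HashSet<>(subjects.keySet());
    }
}
```

The sketch only covers updates that go through ToyStore; any route that
bypasses the listener would leave the index stale, which is the difficulty
described above.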

> Paolo, I believe IndexWriters in Lucene are transactional - see
> https://issues.apache.org/jira/browse/LUCENE-3131, though they don't have
> general-purpose XA support. I'll explore further.

Please do, and let me know.

Paolo



Re: LARQ and fuseki/tdb

Posted by Venkat Krishnamurthy <ni...@gmail.com>.
Thanks Rob, Paolo.

Actually the big use case I have is to support text indexing as part of
SPARQL updates, hence the Fuseki question. It looks like configuring Fuseki
to read the index is supported based on Paolo's response to #1 below.

Paolo, I believe IndexWriters in Lucene are transactional - see
https://issues.apache.org/jira/browse/LUCENE-3131, though they don't have
general-purpose XA support. I'll explore further.

On Thu, Apr 12, 2012 at 8:47 AM, Paolo Castagna <
castagna.lists@googlemail.com> wrote:

> Hi Venkat,
> in addition to what Rob said.
>
> LARQ has not been released yet in Apache.
>
> We discussed the idea of having LARQ included in Fuseki:
> https://issues.apache.org/jira/browse/JENA-63
>
> When LARQ is released, as I said, "users who want to package/include LARQ
> with Fuseki will need to checkout Fuseki, add the LARQ dependency to the
> Fuseki pom.xml and (re)package Fuseki themselves (i.e. mvn package)".
>
> 1)
>
> Yes, LARQ can be configured via the Fuseki configuration (once you have
> LARQ on your classpath). Use:
>
> <#dataset> rdf:type tdb:DatasetTDB ;
>   tdb:location "/path/to/your/tdb/indexes/" ;
>   ja:textIndex "/path/to/lucene/index/" ;
>   .
>
> 2)
>
> It's true that TDB supports transactions. However, Lucene and other free
> text indexes such as Solr or ElasticSearch do not have transactions. Here,
> we have two systems with indexes, one transactional and one not. My
> suggestion is to consider TDB, which supports transactions, as the source
> of truth and do the best we can to keep the indexes in sync. However, the
> indexes might go out of sync, so users should be aware of that and consider
> rebuilding the free text indexes regularly (e.g. nightly), if possible.
>
> Do you have an idea or suggestion on how to keep two indexes in sync where
> one does not support transactions? Something like the two-phase commit
> (2PC) protocol might be a possibility, but not without modifying Lucene (or
> Solr or ElasticSearch), which is something I am not keen on.
>
> As Rob said, we also still have an open issue for SPARQL Update requests:
> https://issues.apache.org/jira/browse/JENA-164 ... apologies, I have had no
> time to look at this recently and I am still trying to find out what's the
> best way to catch all possible routes of updates: APIs, SPARQL Update &
> Graph Store HTTP protocol, bulkloading, ... others?
>
> Are your updates coming directly into Fuseki, or do you have a central
> place elsewhere which receives your updates?
>
> Thanks,
> Paolo

Re: LARQ and fuseki/tdb

Posted by Paolo Castagna <ca...@googlemail.com>.
Hi Venkat,
in addition to what Rob said.

LARQ has not been released yet in Apache.

We discussed the idea of having LARQ included in Fuseki:
https://issues.apache.org/jira/browse/JENA-63

When LARQ is released, as I said, "users who want to package/include LARQ with
Fuseki will need to checkout Fuseki, add the LARQ dependency to the Fuseki
pom.xml and (re)package Fuseki themselves (i.e. mvn package)".

1)

Yes, LARQ can be configured via the Fuseki configuration (once you have LARQ
on your classpath). Use:

<#dataset> rdf:type tdb:DatasetTDB ;
  tdb:location "/path/to/your/tdb/indexes/" ;
  ja:textIndex "/path/to/lucene/index/" ;
  .

2)

It's true that TDB supports transactions. However, Lucene and other free text
indexes such as Solr or ElasticSearch do not have transactions. Here, we have
two systems with indexes, one transactional and one not. My suggestion is to
consider TDB, which supports transactions, as the source of truth and do the
best we can to keep the indexes in sync. However, the indexes might go out of
sync, so users should be aware of that and consider rebuilding the free text
indexes regularly (e.g. nightly), if possible.

Do you have an idea or suggestion on how to keep two indexes in sync where one
does not support transactions? Something like the two-phase commit (2PC)
protocol might be a possibility, but not without modifying Lucene (or Solr or
ElasticSearch), which is something I am not keen on.
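
To make the 2PC idea concrete, here is a minimal sketch of a coordinator that
commits every participant or none. Participant, TwoPhaseCoordinator, and
RecordingParticipant are hypothetical names for illustration; nothing here is
a Jena, TDB, or Lucene type, and real participants would need durable prepare
and commit steps.

```java
import java.util.List;

/** Hypothetical participant in a two-phase commit. */
interface Participant {
    boolean prepare();   // phase 1: vote; true means "ready to commit"
    void commit();       // phase 2: only called if every vote was yes
    void rollback();     // discard pending or prepared work
}

/** Coordinator: commits all participants or rolls all of them back. */
class TwoPhaseCoordinator {
    static boolean run(List<? extends Participant> participants) {
        // Phase 1: collect votes; an exception counts as a "no" vote.
        for (Participant p : participants) {
            boolean ready;
            try {
                ready = p.prepare();
            } catch (RuntimeException e) {
                ready = false;
            }
            if (!ready) {
                // One "no" aborts the whole transaction for everyone.
                for (Participant q : participants) q.rollback();
                return false;
            }
        }
        // Phase 2: every participant voted yes, so commit them all.
        for (Participant p : participants) p.commit();
        return true;
    }
}

/** In-memory participant that records its final state, for demonstration. */
class RecordingParticipant implements Participant {
    private final boolean vote;
    String state = "pending";

    RecordingParticipant(boolean vote) { this.vote = vote; }

    @Override
    public boolean prepare() {
        if (vote) state = "prepared";
        return vote;
    }

    @Override
    public void commit() { state = "committed"; }

    @Override
    public void rollback() { state = "rolled back"; }
}
```

Note that this sketch has no recovery log: if the process dies between two
commit() calls in phase 2, the participants can still disagree, which is why a
periodic rebuild from the source of truth remains a useful safety net.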

As Rob said, we also still have an open issue for SPARQL Update requests:
https://issues.apache.org/jira/browse/JENA-164 ... apologies, I have had no
time to look at this recently and I am still trying to find out what's the
best way to catch all possible routes of updates: APIs, SPARQL Update & Graph
Store HTTP protocol, bulkloading, ... others?

Are your updates coming directly into Fuseki, or do you have a central place
elsewhere which receives your updates?

Thanks,
Paolo

Venkat Krishnamurthy wrote:
> Based on trawling the list, I understand that a batch indexing process is
> possible with LARQ. My requirement is to have a 'live' text index that
> works with a running Fuseki instance: when updates come in, they are indexed
> as specified in the Fuseki configuration.
> 
> Some questions:
> 
> 1) How do I set up/configure LARQ with fuseki to enable live text indexing?
> Can it be done purely via fuseki configuration alone?
> 
> 2) Now that TDB supports transactions, can/should text indexing be done
> when the actual update happens on the underlying dataset within a
> transaction so that the index stays in sync with the dataset? Any other
> suggestions?
> 
> VK
> 


Re: LARQ and fuseki/tdb

Posted by Robert Vesse <rv...@yarcdata.com>.
Hi Venkat

No, this is not currently possible; there is an open issue for this in JIRA: https://issues.apache.org/jira/browse/JENA-164

The current guidance is usually just to set up some kind of cron job that will periodically rebuild the index (e.g. overnight).
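
As a rough illustration of the periodic rebuild approach, the sketch below builds a fresh index from the source of truth and swaps it in atomically, so searches never see a half-built index. It is an in-process alternative to a cron job, not a LARQ feature: PeriodicReindexer is a hypothetical class, the Supplier stands in for reading literals out of the dataset, and the index is a plain token map rather than Lucene.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Rebuilds a toy text index from the source of truth on a schedule. */
class PeriodicReindexer {
    // subject -> literal text; hypothetical stand-in for reading the dataset
    private final Supplier<Map<String, String>> source;
    // token -> subjects; replaced wholesale on every rebuild
    private final AtomicReference<Map<String, Set<String>>> current =
            new AtomicReference<>(Collections.<String, Set<String>>emptyMap());

    PeriodicReindexer(Supplier<Map<String, String>> source) {
        this.source = source;
    }

    /** Build a fresh index off to the side, then publish it in one step. */
    void rebuild() {
        Map<String, Set<String>> fresh = new HashMap<>();
        for (Map.Entry<String, String> e : source.get().entrySet()) {
            String[] tokens = e.getValue().toLowerCase(Locale.ROOT).split("\\W+");
            for (String token : tokens) {
                if (token.isEmpty()) continue;
                fresh.computeIfAbsent(token, k -> new HashSet<>()).add(e.getKey());
            }
        }
        current.set(fresh);
    }

    /** Searches always read the most recently published index. */
    Set<String> search(String token) {
        Set<String> hits = current.get().get(token.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.<String>emptySet() : hits;
    }

    /** Run rebuild() at a fixed period; the caller shuts the executor down. */
    ScheduledExecutorService schedule(long period, TimeUnit unit) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(this::rebuild, 0, period, unit);
        return executor;
    }
}
```

Calling schedule(24, TimeUnit.HOURS) would approximate the overnight rebuild; between rebuilds the index can be stale, which is the trade-off of this approach.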

Sorry I can't be of more help.

Rob

On Apr 11, 2012, at 10:55 AM, Venkat Krishnamurthy wrote:

> Based on trawling the list, I understand that a batch indexing process is
> possible with LARQ. My requirement is to have a 'live' text index that
> works with a running Fuseki instance: when updates come in, they are indexed
> as specified in the Fuseki configuration.
> 
> Some questions:
> 
> 1) How do I set up/configure LARQ with fuseki to enable live text indexing?
> Can it be done purely via fuseki configuration alone?
> 
> 2) Now that TDB supports transactions, can/should text indexing be done
> when the actual update happens on the underlying dataset within a
> transaction so that the index stays in sync with the dataset? Any other
> suggestions?
> 
> VK