Posted to users@jena.apache.org by Paolo Castagna <ca...@googlemail.com> on 2012/05/22 10:20:19 UTC

Expanding LARQ... (Was: LARQ usage, dependencies)

Hi Tao,
if you can share your requirements, that would be great. Maybe others need the
same features/capabilities and we can do something about it together.

I am not sure exactly what you mean by 'expanding LARQ'. Are you changing
LARQ's code?

LARQ is an additional jar which extends ARQ, adding support for free text
searches in SPARQL queries. It can simply be added to the classpath of any Java
application using ARQ. Fuseki is one such application, and the way to do this
with Maven is to add a dependency on LARQ to Fuseki's pom.xml file, as shown in
the patch attached to JENA-63 [1] (for example). Would that work for you?

Is it the scores you are trying to change [2]? ;-)

Paolo

 [1] https://issues.apache.org/jira/browse/JENA-63
 [2] https://issues.apache.org/jira/browse/JENA-242

Tao wrote:
> Hi Paolo,
>
> One of our index guys is working on expanding LARQ to meet our own
> requirements. However, I'm not sure how to assemble it with Fuseki when he
> has finished. Can you suggest how?
>
> Thanks
> Tao


Re: Expanding LARQ... (Was: LARQ usage, dependencies)

Posted by Damian Steer <d....@bristol.ac.uk>.

On 23/05/12 10:55, Paolo Castagna wrote:
> Hi Tao
> 
> Tao wrote:
>> The sub query method is totally good to me. It's just our app
>> developer who doesn't like it.
> 
> Oh, well... you should/could argue a little bit more with him
> then. There is very little syntactic difference between:
> 
> { ?x pf:textMatch ( "bookxxx" 1 ex:authorOf ) .. }
> 
> and
> 
> { ?x ex:authorOf ?a . ?a pf:textMatch ("bookxxx" 1 ) . }

Or even
	
	?x ex:authorOf [ pf:textMatch ("bookxxx" 1) ]

(I think that should work?)

OTOH

?x pf:textMatch ( "bookxxx" 1 ex:authorOf )

could avoid a join across lucene and the rdf store, using a modified
indexing scheme.

Damian

Re: Expanding LARQ... (Was: LARQ usage, dependencies)

Posted by Paolo Castagna <ca...@googlemail.com>.
Hi Tao

Tao wrote:
> The sub query method is totally good to me. It's just our app developer who
> doesn't like it. 

Oh, well... you should/could argue a little bit more with him then.
There is very little syntactic difference between:

{
  ?x pf:textMatch ( "bookxxx" 1 ex:authorOf ) ..
}

and

{
  ?x ex:authorOf ?a .
  ?a pf:textMatch ("bookxxx" 1 ) .
}

The first requires you to change LARQ's code, which adds the cost of maintaining
your changes as LARQ evolves. The second is already available at no cost
(present or future) and provides exactly the same functionality.

Open source software is great because it gives people the ability to make small
changes and adapt the software to their specific requirements. However, every
time you make internal changes, it is as if you were forking the project for
internal use only. This is fine, but those changes carry a cost, so you should
think carefully and ask yourself whether they are all really necessary.

Paolo

RE: Expanding LARQ... (Was: LARQ usage, dependencies)

Posted by "Tao (陶信东)" <ta...@myhexin.com>.
Hi Paolo, 

Thanks for your advice. I'll try mvn install when our own LARQ is ready.

The sub-query method is totally fine with me. It's just our app developer who
doesn't like it.


Thanks
Tao



Re: Expanding LARQ... (Was: LARQ usage, dependencies)

Posted by Paolo Castagna <ca...@googlemail.com>.
Hi Tao,
thanks for sharing, useful.

Tao wrote:
> Thanks Paolo,
> 
> Yes we're planning to change LARQ's code. Below are the changes we want to
> make:
> 1. To replace the default tokenizer with our Chinese tokenizer.

Reasonable and in your case "mandatory"! ;-)

> 2. To replace the Lucene scores with some other similarity scores, such as
> "med(query, literal) divided by (query.length + literal.length)" (med
> means minimum edit distance), which ranges from 0 to 1.

We have discussed this already.

> 3. To add a property argument to the pf:textMatch function so that the pattern
> {?x pf:textMatch ("bookxxx" 1 ex:authorOf)} binds ?x to the first object of
> property ex:authorOf that matches "bookxxx", instead of the first object of
> all matches. (I suggested using SPARQL sub-queries, but our SPARQL users
> prefer the extended pf:textMatch.)

Can't you do it without sub queries? It's just a basic triple pattern, isn't it?

{
  ?x ex:authorOf ?a .
  ?a pf:textMatch ("bookxxx" 1 ) .
}

What's wrong with this approach?
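
One place where the two forms may differ is the hit limit. The following is a minimal Python sketch, not LARQ code, assuming the integer 1 caps the number of index hits before the join with the ex:authorOf pattern. The hit list, the triples, the ex:reviewed property and both function names are hypothetical and exist only for illustration.

```python
# Hypothetical ranked hits from a free-text index: (literal, score), best first.
ranked_hits = [("bookxxx review", 0.9), ("bookxxx", 0.8), ("bookxxx notes", 0.4)]

# Hypothetical triples (subject, predicate, object) from the RDF store.
triples = [
    ("ex:alice", "ex:reviewed", "bookxxx review"),
    ("ex:bob", "ex:authorOf", "bookxxx"),
]


def join_after_limit(hits, triples, predicate, limit):
    """Take the top `limit` hits over all literals, then keep the ones
    that are objects of `predicate` (limit first, join second)."""
    objects = {o for _, p, o in triples if p == predicate}
    return [literal for literal, _ in hits[:limit] if literal in objects]


def limit_after_filter(hits, triples, predicate, limit):
    """Keep only hits that are objects of `predicate`, then take the
    top `limit` of those (property filter first, limit second)."""
    objects = {o for _, p, o in triples if p == predicate}
    return [literal for literal, _ in hits if literal in objects][:limit]
```

With this toy data, the first function returns nothing for ex:authorOf with a limit of 1, because the single best hit is not an ex:authorOf object, while the second returns "bookxxx". If the limit is dropped or raised, the two orderings agree on which literals match.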

> As to the pom.xml patch, can we change it so that the LARQ dependency points
> to our local LARQ.jar, instead of the official Maven repository? (I'm not
> familiar with Maven)

You could run your own internal Maven repository, install your custom LARQ
version there, and point your custom Fuseki build at it.

There are plenty of good documents, for example:

 "Repository Management with Nexus"
  http://www.sonatype.com/books/nexus-book/reference/

Or, you can simply install your custom LARQ in your local Maven repository (via
mvn install) and after that, on the same machine, repackage Fuseki (via mvn
package).

But now we are going a little bit off-topic.

Paolo



RE: Expanding LARQ... (Was: LARQ usage, dependencies)

Posted by "Tao (陶信东)" <ta...@myhexin.com>.
Thanks Paolo,

Yes, we're planning to change LARQ's code. Below are the changes we want to
make:
1. To replace the default tokenizer with our Chinese tokenizer.
2. To replace the Lucene scores with some other similarity scores, such as
"med(query, literal) divided by (query.length + literal.length)" (med
means minimum edit distance), which ranges from 0 to 1.
3. To add a property argument to the pf:textMatch function so that the pattern
{?x pf:textMatch ("bookxxx" 1 ex:authorOf)} binds ?x to the first object of
property ex:authorOf that matches "bookxxx", instead of the first object of
all matches. (I suggested using SPARQL sub-queries, but our SPARQL users
prefer the extended pf:textMatch.)
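
The score in item 2 can be sketched as follows. This is a minimal illustration in Python rather than LARQ or Lucene code, and it assumes "minimum edit distance" means Levenshtein distance with unit costs; the function names are made up for the example. Note that this value is a distance: 0 means the two strings are identical, and 1 is reached only when exactly one of them is empty.

```python
def minimum_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def normalized_distance(query: str, literal: str) -> float:
    """med(query, literal) divided by (len(query) + len(literal))."""
    total_length = len(query) + len(literal)
    if total_length == 0:
        return 0.0  # two empty strings are treated as identical
    return minimum_edit_distance(query, literal) / total_length
```

For two strings of equal length with no characters in common the value is 0.5, so a ranking built on it would sort ascending, unlike a Lucene relevance score.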

As to the pom.xml patch, can we change it so that the LARQ dependency points
to our local LARQ.jar, instead of the official Maven repository? (I'm not
familiar with Maven)


Thanks
Tao
