Posted to dev@jena.apache.org by Rob Vesse <rv...@yarcdata.com> on 2012/11/20 19:46:52 UTC

User Defined Functions

Hi All

I have put in place some new functionality which I'm calling User Defined Functions. It is essentially a lightweight way for users to define new functions for use in their SPARQL queries without having to write the Java code for the function themselves.  This means it is less powerful than adding a full extension function, as you can't use arbitrary Java code, but it provides a simple way to encapsulate complex or large expressions into simple function calls.  In essence it is an expression aliasing mechanism.

For example we can define a "square" function like so:

List<Var> args = new ArrayList<Var>();
args.add(Var.alloc("x"));
UserDefinedFunctionFactory.getFactory().add("http://example/square", "?x * ?x", args);

Then we can go ahead and use this in queries:

SELECT (<http://example/square>(3) AS ?ThreeSquared) { }

Expressions can be defined either by providing a raw expression string which conforms to the SPARQL expression syntax or by programmatically building an Expr instance.

Bear in mind that this functionality only goes so far, and there are some provisos that I should point out.

1 – Dependencies between user defined functions

It is possible to define functions that depend on other user defined functions but this is risky because if the other function definition is changed/removed your function definition may change.  To avoid this the default behavior is not to preserve your dependencies but rather to expand out the function definitions.  So for example I could define a "cube" function as follows:

List<Var> args = new ArrayList<Var>();
args.add(Var.alloc("x"));
UserDefinedFunctionFactory.getFactory().add("http://example/cube", "<http://example/square>(?x) * ?x", args);

Internally that is the same as if I defined it as follows since the definitions will be fully expanded to include the definitions of the other user defined functions used:

List<Var> args = new ArrayList<Var>();
args.add(Var.alloc("x"));
UserDefinedFunctionFactory.getFactory().add("http://example/cube", "(?x * ?x) * ?x", args);

This protects users from changing definitions.  However, sometimes dependencies may be desired, in which case this behavior can be disabled; see UserDefinedFunctionFactory.getPreserveDependencies()

Expansion happens at definition time when you call add() so if you want to change the behavior you would need to redefine all functions that may be affected by it.
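To illustrate the point about expansion at definition time, here is a minimal, self-contained sketch. It is a toy model only: EagerExpansionSketch and its VarRef, Mul and Call types are hypothetical names made up for this example and are not the ARQ classes, and the real factory may well do this differently. It only shows why a function expanded when add() is called is unaffected by later changes to the functions it used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are the ARQ classes.
final class EagerExpansionSketch {

    interface Expr { }

    static final class VarRef implements Expr {
        final String name;
        VarRef(String name) { this.name = name; }
        @Override public String toString() { return "?" + name; }
    }

    static final class Mul implements Expr {
        final Expr left, right;
        Mul(Expr left, Expr right) { this.left = left; this.right = right; }
        @Override public String toString() { return "(" + left + " * " + right + ")"; }
    }

    static final class Call implements Expr {
        final String uri;
        final List<Expr> args;
        Call(String uri, Expr... args) { this.uri = uri; this.args = Arrays.asList(args); }
        @Override public String toString() { return "<" + uri + ">" + args; }
    }

    static final class Definition {
        final List<String> params;
        final Expr body;
        Definition(List<String> params, Expr body) { this.params = params; this.body = body; }
    }

    private final Map<String, Definition> registry = new HashMap<>();

    // Expansion happens here, once, when the function is defined.
    void add(String uri, List<String> params, Expr body) {
        registry.put(uri, new Definition(params, expand(body)));
    }

    Expr bodyOf(String uri) { return registry.get(uri).body; }

    // Replace calls to already-defined functions with their stored bodies.
    private Expr expand(Expr e) {
        if (e instanceof Mul) {
            Mul m = (Mul) e;
            return new Mul(expand(m.left), expand(m.right));
        }
        if (e instanceof Call) {
            Call c = (Call) e;
            List<Expr> args = new ArrayList<>();
            for (Expr a : c.args) args.add(expand(a));
            Definition d = registry.get(c.uri);
            if (d == null) return new Call(c.uri, args.toArray(new Expr[0])); // not defined yet: keep the call
            if (d.params.size() != args.size())
                throw new IllegalArgumentException("Wrong number of arguments for <" + c.uri + ">");
            Map<String, Expr> actuals = new HashMap<>();
            for (int i = 0; i < args.size(); i++) actuals.put(d.params.get(i), args.get(i));
            return substitute(d.body, actuals);
        }
        return e;
    }

    // Swap each parameter in a stored body for the actual argument expression.
    private static Expr substitute(Expr e, Map<String, Expr> actuals) {
        if (e instanceof VarRef) return actuals.getOrDefault(((VarRef) e).name, e);
        if (e instanceof Mul) {
            Mul m = (Mul) e;
            return new Mul(substitute(m.left, actuals), substitute(m.right, actuals));
        }
        Call c = (Call) e;
        List<Expr> args = new ArrayList<>();
        for (Expr a : c.args) args.add(substitute(a, actuals));
        return new Call(c.uri, args.toArray(new Expr[0]));
    }

    public static void main(String[] argv) {
        EagerExpansionSketch f = new EagerExpansionSketch();
        Expr x = new VarRef("x");
        f.add("http://example/square", Arrays.asList("x"), new Mul(x, x));
        f.add("http://example/cube", Arrays.asList("x"),
              new Mul(new Call("http://example/square", x), x));
        System.out.println(f.bodyOf("http://example/cube")); // prints ((?x * ?x) * ?x)
        f.add("http://example/square", Arrays.asList("x"), x); // redefine square
        System.out.println(f.bodyOf("http://example/cube")); // still ((?x * ?x) * ?x)
    }
}
```

In this toy version a call to a function that has not been defined yet is simply left in place, so the order in which functions are defined affects what ends up stored.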

2 – Function overloading

Currently there is no support for function overloading, so if you want to define a function that has varying numbers of arguments you have to define a different URI for each variant right now.  This is something I intend to add; I just haven't got round to it yet.

3 – Argument Lists

User defined functions treat all variables in the expression as arguments.  If a variable is used in the expression it must be in the argument list, or an error will be thrown when trying to define the function.  A variable may be in the argument list and not used in the function; this only results in a warning.

I may change the latter case to actually throw an error and move from using argument lists to ordered sets (LinkedHashSet) to prevent duplicate arguments.
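The checks described in this section can be sketched roughly as follows. This is a standalone illustration, not the factory's actual code: ArgumentCheckSketch and check() are hypothetical names, variables are found with a simple regular expression rather than a real SPARQL parser (so a "?" inside an IRI or a string literal would be misread), and rejecting duplicate arguments reflects the possible LinkedHashSet change rather than anything already in place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of ARQ.
final class ArgumentCheckSketch {

    // Simplification: every ?name or $name token counts as a variable.
    private static final Pattern VAR = Pattern.compile("[?$]([A-Za-z_][A-Za-z0-9_]*)");

    // Returns the warnings; throws if the definition would be rejected.
    static List<String> check(String expression, List<String> argList) {
        // An ordered set keeps the argument order and exposes duplicates.
        Set<String> args = new LinkedHashSet<>();
        for (String a : argList) {
            if (!args.add(a))
                throw new IllegalArgumentException("Duplicate argument ?" + a);
        }

        // Collect the variables that the expression mentions.
        Set<String> used = new LinkedHashSet<>();
        Matcher m = VAR.matcher(expression);
        while (m.find()) used.add(m.group(1));

        // A variable used in the expression but missing from the arguments is an error.
        for (String v : used) {
            if (!args.contains(v))
                throw new IllegalArgumentException(
                    "Variable ?" + v + " is used in the expression but is not in the argument list");
        }

        // An argument that is never used only produces a warning.
        List<String> warnings = new ArrayList<>();
        for (String a : args) {
            if (!used.contains(a)) warnings.add("Argument ?" + a + " is never used");
        }
        return warnings;
    }

    public static void main(String[] argv) {
        System.out.println(check("?x * ?x", Arrays.asList("x")));      // no warnings
        System.out.println(check("?x * ?x", Arrays.asList("x", "y"))); // one warning about ?y
        try {
            check("?x * ?y", Arrays.asList("x"));
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```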

4 – Overriding Function Libraries

It is possible right now to define a function that overrides an extension function provided by another function library e.g. the ARQ one.  I'm not sure whether this should be prevented or not, any thoughts?

Let me know what you think and any ideas for refinement beyond what I already listed here,

Rob

Re: User Defined Functions

Posted by Damian Steer <d....@bristol.ac.uk>.
On 24 Nov 2012, at 21:55, Andy Seaborne <an...@apache.org> wrote:

> On 20/11/12 18:46, Rob Vesse wrote:
>> Hi All
>> 
>> I have put in place some new functionality which I'm calling User
>> Defined Functions – it is essentially a lightweight way for users to
>> define new functions for use in their SPARQL queries without having
>> to write the Java code for the function themselves.  This means it is
>> less powerful than adding a full extension function as you can't use
>> arbitrary Java code but it provides a simple way to encapsulate
>> complex or large expressions into simple function calls, in essence
>> it is an expression aliasing mechanism.
> 
> Good idea.

+1

>> 
>> Let me know what you think and any ideas for refinement beyond what I already listed here,
> 
> - - - - - - -
> off the top of my head syntax but the big thing to do would be to add syntax (esp for SPARQL Update)
> 
> DEFUN my:function1(?x, ?y, ?z) =
> .... SPARQL expression ...
> ENDDEF

I tried something similar (i.e. defining functions without Java) using javax.script, within assemblers. It has the advantage of standard syntax and no Java.

Damian

Re: User Defined Functions

Posted by Andy Seaborne <an...@apache.org>.
On 20/11/12 18:46, Rob Vesse wrote:
> Hi All
>
> I have put in place some new functionality which I'm calling User
> Defined Functions – it is essentially a lightweight way for users to
> define new functions for use in their SPARQL queries without having
> to write the Java code for the function themselves.  This means it is
> less powerful than adding a full extension function as you can't use
> arbitrary Java code but it provides a simple way to encapsulate
> complex or large expressions into simple function calls, in essence
> it is an expression aliasing mechanism.

Good idea.

I tried:

// Define <http://example/foo>(?x) as ?x + 3
UserDefinedFunctionFactory.getFactory().add(
    "http://example/foo",
    //SSE.parseExpr("(+ ?x 3)"),
    ExprUtils.parse("?x + 3"),
    Arrays.asList(Var.alloc("x"))) ;

// Bind ?y to 5, then call the function on it
String qs = StrUtils.strjoinNL("SELECT * {",
                               "BIND ( 5 AS ?y )",
                               "BIND (<http://example/foo>(?y) AS ?z)",
                               " }") ;

// Run the query against an empty model and print the results
Query query = QueryFactory.create(qs) ;
QueryExecution qexec =
    QueryExecutionFactory.create(query,
                                 ModelFactory.createDefaultModel()) ;
QueryExecUtils.executeQuery(query, qexec) ;


>
> For example we can define a "square" function like so:
>
> List<Var> args = new ArrayList<Var>();
> args.add(Var.alloc("x"));
> UserDefinedFunctionFactory.getFactory().add("http://example/square", "?x * ?x", args);
>
> Then we can go ahead and use this in queries:
>
> SELECT (<http://example/square>(3) AS ?ThreeSquared) { }
>
> Expressions can be defined either by providing a raw expression
> string which conforms to the SPARQL expression syntax or by
> programmatically building an Expr instance.
>
> Bear in mind that this functionality only goes so far and there are
> some provisos to this functionality that I should point out.
>
> 1 – Dependencies between user defined functions
>
> It is possible to define functions that depend on other user defined
> functions but this is risky because if the other function definition
> is changed/removed your function definition may change.  To avoid
> this the default behavior is not to preserve your dependencies but
> rather to expand out the function definitions.  So for example I
> could define a "cube" function as follows:
>
> List<Var> args = new ArrayList<Var>();
> args.add(Var.alloc("x"));
> UserDefinedFunctionFactory.getFactory().add("http://example/cube", "<http://example/square>(?x) * ?x", args);

When you evaluate, do you protect the variables in the expression so 
that they do not clash with bindings of the same name?  I assume you do and 
this is a function, not a macro, because of the check for variables in 
arguments.

Where the goal is a library, functions are better than macros IMHO.

(earlier "expression aliasing mechanism" suggested macros with named 
arguments but access to current binding)

Actually, looking at the code, there is scoping of the variables inside 
the function because you substitute function variables from the 
arguments then call eval.
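A small toy sketch of that difference, assuming a much simplified integer-only expression model (FunctionScopingSketch and its Const, VarRef and Add types are hypothetical and are not the ARQ Expr classes; only single-argument functions are handled). With substitute-then-eval, a ?x already bound in the query cannot leak into the function body, whereas a macro-style expansion would pick it up.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: these are not the ARQ Expr classes.
final class FunctionScopingSketch {

    interface Expr {
        long eval(Map<String, Long> binding);
        Expr substitute(String var, Expr replacement);
    }

    static final class Const implements Expr {
        final long value;
        Const(long value) { this.value = value; }
        public long eval(Map<String, Long> binding) { return value; }
        public Expr substitute(String var, Expr replacement) { return this; }
    }

    static final class VarRef implements Expr {
        final String name;
        VarRef(String name) { this.name = name; }
        public long eval(Map<String, Long> binding) {
            Long v = binding.get(name);
            if (v == null) throw new IllegalStateException("Unbound variable ?" + name);
            return v;
        }
        public Expr substitute(String var, Expr replacement) {
            return name.equals(var) ? replacement : this;
        }
    }

    static final class Add implements Expr {
        final Expr left, right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
        public long eval(Map<String, Long> binding) {
            return left.eval(binding) + right.eval(binding);
        }
        public Expr substitute(String var, Expr replacement) {
            return new Add(left.substitute(var, replacement), right.substitute(var, replacement));
        }
    }

    // Function style: evaluate the argument in the caller's binding, substitute
    // it for the parameter, then eval. No variables are left in the body, so the
    // caller's own ?x cannot clash with the parameter.
    static long callAsFunction(String param, Expr body, Expr actual, Map<String, Long> binding) {
        Expr argValue = new Const(actual.eval(binding));
        return body.substitute(param, argValue).eval(binding);
    }

    // Macro style: the body is evaluated directly against the caller's binding.
    static long expandAsMacro(Expr body, Map<String, Long> binding) {
        return body.eval(binding);
    }

    public static void main(String[] argv) {
        Expr body = new Add(new VarRef("x"), new Const(3)); // foo(?x) = ?x + 3
        Map<String, Long> binding = new HashMap<>();
        binding.put("x", 100L); // the query happens to bind ?x as well
        binding.put("y", 5L);
        System.out.println(callAsFunction("x", body, new VarRef("y"), binding)); // prints 8
        System.out.println(expandAsMacro(body, binding));                        // prints 103
    }
}
```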

> Internally that is the same as if I defined it as follows since the
> definitions will be fully expanded to include the definitions of the
> other user defined functions used:
>
> List<Var> args = new ArrayList<Var>();
> args.add(Var.alloc("x"));
> UserDefinedFunctionFactory.getFactory().add("http://example/cube", "(?x * ?x) * ?x", args);
>
> This protects users from changing definitions, however sometimes
> dependencies may be desired in which case this behavior can be
> disabled - see UserDefinedFunctionFactory.getPreserveDependencies()
>
> Expansion happens at definition time when you call add() so if you
> want to change the behavior you would need to redefine all functions
> that may be affected by it.

and also the order of definition matters (unless you've been clever and 
delayed the expansion of <cube> if <square> isn't defined yet).

I'm not sure that undefining functions is that likely but materializing 
expressions early is prudent.

> 2 – Function overloading
>
> Currently there is no support for function overloading, so if you
> want  to define a function that has varying numbers of arguments you have to
> define a different URI for each variant right now. This is something I
> intend to add I just haven't got round to it yet.
>
> 3 – Argument Lists
>
> User defined functions treat all variables in the expression as
> arguments, if a variable is used in the expression it must be in the
> argument list or an error will be thrown when trying to define the
> function. A variable may be in the argument list and not used in the
> function and this only results in a warning.

I prefer this style - checking the expression against the arguments
(i.e. it's a strict function, not a macro).  Better for reusability 
across queries.

> I may change the latter case to actually throw an error and move
> from  using argument lists to ordered sets (LinkedHashSet) to prevent
> duplicate arguments.

Unless you are expecting huge argument lists, checking a list for 
uniqueness isn't too bad.

>
> 4 – Overriding Function Libraries
>
> It is possible right now to define a function that overrides an
> extension function provided by another function library e.g. the ARQ
> one.  I'm not sure whether this should be prevented or not, any
> thoughts?

It would be nice to block it happening (it destabilizes the system) but it is 
not something I myself would worry about.

>
> Let me know what you think and any ideas for refinement beyond what I already listed here,

Should
UserDefinedFunctionFactory

be in the API package c.h.h.j.query?

Documentation on the web site?
Examples in src-examples/?

- - - - - - -
off the top of my head syntax but the big thing to do would be to add 
syntax (esp for SPARQL Update)

DEFUN my:function1(?x, ?y, ?z) =
.... SPARQL expression ...
ENDDEF

(not sure ENDDEF is needed for an expression because anything else 
starts with a keyword)

and also:

LET TABLE table57 = { ... pattern ... }

which evaluates it there and then,

and use in query / update as :

TABLE(table57)

DEFTABLE delays evaluation until used (so it is a macro - refers to 
external variables)

	Andy