You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2016/06/01 18:28:59 UTC

[jira] [Commented] (CASSANDRA-10783) Allow literal value as parameter of UDF & UDA

    [ https://issues.apache.org/jira/browse/CASSANDRA-10783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310834#comment-15310834 ] 

Sylvain Lebresne commented on CASSANDRA-10783:
----------------------------------------------

Actually, I think we should deal with this before dealing with CASSANDRA-7396 since the latter basically needs terms in selections and that's what this introduce.

There is a few things that I think the patch doesn't deal properly with however:
* It sets a {{null}} name and type in the {{ColumnSpecification}} of markers.  That just doesn't work. The only reason unit test don't blow up is that they don't actually exercise serializing the resulting metadata but said serialization would NPE. Besides, clients need the type to know what to send.
* The way the grammar is written is imo too restrictive. It doesn't allow collection literals inside function calls for instance ({{f({ 'foo' : 'bar' })}} doesn't even parse (it parses if you add a cast before the literal, but that shouldn't be required)), which feels unecesssarily restrictive. Further, I'm generally not too fan of trying to eliminate too much through the parser as the error message ends up being confusing imo. For instance, while {{SELECT 1 FROM ...}} or {{SELECT f\(?\) FROM ...}} is allowed, {{SELECT ? FROM ...}} would throw a parsing exception. Don't get me wrong, {{SELECT ? FROM ...}} is something we have to reject, but I'd rather reject it with a clear message saying that we can't infer the type and thus require a cast.
* The fact we only prepare and bind the term in {{TermSelector.getOutput()}} is a bit silly, as it means we'll redo said preparation/binding work (which, with functions can involve computation) for every output row even if it's constant once bound. In general, I feel that the right place to prepare/bind the terms is in the {{Selector.Factory.newInstance()}} method (which should thus take the {{QueryOptions}}).

Now, fixing those points require quite a few changes and I took a stab at it in the patch attached below. A few points worth noting on that patch:
* It makes "json" and "distinct" invalid (user) function names. It was impossible for the parser to distinguish if {{ SELECT JSON (1, 2, 3) FROM ..}} was a {{SELECT JSON}} followed by a tuple-literal, or a call to a user-defined "json" function with 3 ints arguments (same for distinct). We could, alternatively, forbid literals for {{DISTINCT}} and {{JSON}}, thus forcing the parser the recognize the query above as function calls, but if anything, that feels like the inverse of what users would expect. This would also complicate the parser and that doesn't feel worth it. Allowing those as valid UDF names just feels like a mistake in the first place and I'd rather fix it now.
* It makes ColumnDefinition a Selectable (instead of ColumnIdentifier), which is both needed by the changes so we can easily get the type of a Selectable (when possible) but also make sense restrospectively. This does change a few error messages by having the check if a given column exists in only one place instead of scattered around. It also switch to ByteBuffer to handle UDT fields instead of ColumnIdentifier, mostly because the latter was not imo a good idea.
* We do have to special case {{COUNT(1)}} for now, but I've simplified that special casing a bit.
* It reuses the tests from Robert's patch, but with some modifications to make sure, for instance, that the metadata for bind markers is properly set.
* There is actually 2 commits: the first one allow things like {{SELECT 1 FROM ...}}, defaulting {{1}} to be of type {{varint}}, much like in Robert's patch.  Thinking about it though, I had a slight change of heart and removed that behavior, making the query throw an error (saying it can't infer the exact type). The reason is that I'm bothered with defaulting to {{varint}} since it's partly random and not always the most convenient. So I figured, shouldn't we take some more time to think about this and left it out initially. After all, it's not really a big deal, {{SELECT 1 FROM...}} is not a very useful query and you can always do {{SELECT (int)1 FROM ...}} if you really want. But if people feel confident enough that defaulting to {{varint}} is the only sensible choice and should be allowed now, then allowing it is as easy as ignoring the 2nd commit on the branch.

|| [trunk|https://github.com/pcmanus/cassandra/commits/10783] || [utests|http://cassci.datastax.com/job/pcmanus-10783-testall/] || [dtests|http://cassci.datastax.com/job/pcmanus-10783-dtest/] ||

> Allow literal value as parameter of UDF & UDA
> ---------------------------------------------
>
>                 Key: CASSANDRA-10783
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10783
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL
>            Reporter: DOAN DuyHai
>            Assignee: Robert Stupp
>            Priority: Minor
>              Labels: CQL3, UDF, client-impacting, doc-impacting
>             Fix For: 3.x
>
>
> I have defined the following UDF
> {code:sql}
> CREATE OR REPLACE FUNCTION  maxOf(current int, testValue int) RETURNS NULL ON NULL INPUT 
> RETURNS int 
> LANGUAGE java 
> AS  'return Math.max(current,testValue);'
> CREATE TABLE maxValue(id int primary key, val int);
> INSERT INTO maxValue(id, val) VALUES(1, 100);
> SELECT maxOf(val, 101) FROM maxValue WHERE id=1;
> {code}
> I got the following error message:
> {code}
> SyntaxException: <ErrorMessage code=2000 [Syntax error in CQL query] message="line 1:19 no viable alternative at input '101' (SELECT maxOf(val1, [101]...)">
> {code}
>  It would be nice to allow literal value as parameter of UDF and UDA too.
>  I was thinking about an use-case for an UDA groupBy() function where the end user can *inject* at runtime a literal value to select which aggregation he want to display, something similar to GROUP BY ... HAVING <filter clause>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)