You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Julian Hyde (Jira)" <ji...@apache.org> on 2023/04/12 18:04:00 UTC

[jira] [Commented] (CALCITE-5644) Implement BigQuery CONTAINS_SUBSTR

    [ https://issues.apache.org/jira/browse/CALCITE-5644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17711504#comment-17711504 ] 

Julian Hyde commented on CALCITE-5644:
--------------------------------------

The example makes it look a little more complicated than it really is - as if {{CONTAINS_SUBSTR}} can accept tables or rows as input. Imagine there were a {{CONTAINS(expr, seek)}} function that operates on two string arguments and expands to {{POSITION(seek IN expr) > 0}}. I think of {{CONTAINS_SUBSTR}} as a slight generalization of that function to accept any data type (including {{ARRAY}}, {{ROW}} and {{STRUCT}}).

So {{CONTAINS_SUBSTR}} is still a scalar function; much simpler to think about than a table function.

I'll note that the implementation of {{CONTAINS_SUBSTR}} may involve using full-text indexes, and therefore the query plan will have a very different shape than if it were not present. The function is certainly not evaluated for every row. But those implementation details are hidden from the user (except that the plan runs faster and takes fewer $ and kWh).

> Implement BigQuery CONTAINS_SUBSTR
> ----------------------------------
>
>                 Key: CALCITE-5644
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5644
>             Project: Calcite
>          Issue Type: Task
>            Reporter: Tanner Clary
>            Assignee: Tanner Clary
>            Priority: Major
>
> BigQuery offers the [CONTAINS_SUBSTR|https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#contains_substr] function that returns {{TRUE}} if a substring is present in an expression and {{FALSE}} if it is not. A basic example of this may be seen in [1]. 
> The expression can take many forms (more info in the linked doc) which makes its implementation more complex than other string functions that only accept arguments of type {{STRING}}. For instance, the expression to be searched can be a column or table reference. 
> The function also has an optional third argument called {{json_scope}} where the user can indicate the scope of JSON data (keys, values, or both) to be searched for the substring.
>  I am curious if anyone has thoughts on how the search of rows or tables could be implemented. I have a basic implementation (that supports expressions of type {{STRING}} and nothing else) that I will open a draft PR for as a starting point. To me, the challenge is implementing the additional features like the {{JSON_SCOPE}} argument (seen in [2]) and performing a cross field search as seen in [3]. 
> [1] {{SELECT CONTAINS_SUBSTR("hello", "he");}} would return {{TRUE}}. 
> [2] SELECT CONTAINS_SUBSTR(JSON '("lunch":"soup")', "lunch",
>          json_scope=>"JSON_VALUES") AS result;
> would return {{FALSE}}.
> [3] 
> {{SELECT *
> FROM Recipes
> WHERE CONTAINS_SUBSTR(
>   (SELECT AS STRUCT Recipes.* EXCEPT (Lunch, Dinner)),
>   'potato'
> );}} would return: 
> ||Breakfast||Lunch||Dinner||
> |Potato pancakes|Toasted cheese sandwich|Beef stroganoff|



--
This message was sent by Atlassian Jira
(v8.20.10#820010)