Posted to dev@calcite.apache.org by Cyril de Catheu <cy...@startree.ai.INVALID> on 2022/07/20 19:57:30 UTC

Understanding the MULTISET constructor

Hello Calcite community,

I'm trying to write a MULTISET literal directly in a query.
I could not find a way to do this in the documentation on value constructors
<https://calcite.apache.org/docs/reference.html>.
But I saw the row/map/array constructors (e.g. ARRAY[val1, val2, ...]) and
found that MULTISET[val1, ...] seems to work.

Q1 - Is this the correct way to create a MULTISET?

When creating a multiset this way, I see strange behavior with strings.
Q2 - It seems the strings are padded with trailing spaces to the length of
the longest string in the MULTISET. Is this the expected behavior?

E.g. MULTISET['a', 'abcd'] becomes MULTISET['a   ', 'abcd'] (notice the
spaces after a).

To reproduce in sqlline (the connection used is not important; this is just
a copy/paste of the tutorial):

cd calcite/example/csv
./sqlline
!connect jdbc:calcite:model=src/test/resources/model.json admin admin

select
MULTISET['dev', 'a'] MULTISET INTERSECT MULTISET['lol','dev'],
MULTISET['dev', 'ab'] MULTISET INTERSECT MULTISET['lol','dev'],
MULTISET['dev', 'abc'] MULTISET INTERSECT MULTISET['lol','dev'],
MULTISET['dev', 'abcd'] MULTISET INTERSECT MULTISET['lol','dev'],
MULTISET['dev', 'abcde'] MULTISET INTERSECT MULTISET['lol','dev']
;

+--------+--------+--------+--------+--------+
| EXPR$0 | EXPR$1 | EXPR$2 | EXPR$3 | EXPR$4 |
+--------+--------+--------+--------+--------+
| [dev]  | [dev]  | [dev]  | []     | []     |
+--------+--------+--------+--------+--------+

--> The intersection is broken by the padding: the last two columns should
also be [dev].
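
To make the suspected mechanism concrete, here is a minimal Python sketch
that models it outside of Calcite. It assumes that each MULTISET pads its
string elements with trailing spaces to the length of its longest element,
and that the intersection then compares the padded values exactly. The
helper names (pad_to_longest, multiset_intersect) are hypothetical and are
not Calcite APIs; this only illustrates the hypothesis, it is not how
Calcite is implemented.

```python
from collections import Counter


def pad_to_longest(values):
    """Pad every string with trailing spaces to the longest length.

    Assumption: this mimics the fixed-width padding seen in the plan.
    """
    width = max(len(v) for v in values)
    return [v.ljust(width) for v in values]


def multiset_intersect(left, right):
    """Multiset intersection over the padded elements, compared exactly."""
    common = Counter(pad_to_longest(left)) & Counter(pad_to_longest(right))
    return sorted(common.elements())


# Same five cases as the query above.
for other in ["a", "ab", "abc", "abcd", "abcde"]:
    print(other, multiset_intersect(["dev", other], ["lol", "dev"]))
```

Under these assumptions the first three cases give ['dev'] and the last two
give [], which matches the row returned by sqlline: as soon as the other
element is longer than 3 characters, 'dev' on the left side is padded and no
longer equals the unpadded 'dev' on the right side.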

Looking at debug logs I do see the padding:
org.apache.calcite.sql.parser - Reduced MULTISET['dev', 'abcde'] MULTISET INTERSECT ALL MULTISET['lol', 'dev']  *#dev is not padded yet*
org.apache.calcite.sql.parser - Reduced (MULTISET['dev', 'abcde'] MULTISET INTERSECT ALL MULTISET['lol', 'dev']) IS NOT EMPTY
org.apache.calcite.sql2rel - Plan after converting SqlNode to RelNode
LogicalProject(_ID=[$0])
  LogicalFilter(condition=[IS NOT EMPTY(MULTISET INTERSECT ALL($SLICE($3), $SLICE($4)))])
    LogicalJoin(condition=[true], joinType=[inner])
      LogicalJoin(condition=[true], joinType=[inner])
        LogicalTableScan(table=[[objects, objects]])
        Collect(field=[EXPR$0])
          LogicalValues(tuples=[[{ 'dev  ' }, { 'abcde' }]])  *#dev is padded with spaces here*
      Collect(field=[EXPR$0])
        LogicalValues(tuples=[[{ 'lol' }, { 'dev' }]])

Am I missing something?

Thanks for reading.
Cyril de Catheu
