You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/09/17 02:55:10 UTC

[GitHub] [druid] paul-rogers opened a new pull request, #13106: Query test case definitions

paul-rogers opened a new pull request, #13106:
URL: https://github.com/apache/druid/pull/13106

   As we move forward with the catalog and other projects, it has become clear we need a SQL query test structure which is a bit more flexible than the current "Calcite tests."
   
   ### Test Case File
   
   The revised framework, inspired by the one used by Apache Impala, uses "test case" file to define the test inputs and expected outputs. Each ".case" file contains one or more test cases, each of which has one or more sections. The file uses a somewhat unusual syntax to avoid the need to quote JSON or SQL. (It would seem to be more of a "Druid fit" to use JSON, but imagine the complexity of quoting SQL that contains JSON within a JSON-formatted file.)
   
   A typical test case looks like this:
   
   ```text
   ==============================================================
   Converted from testSelectConstantArrayExpressionFromTable()
   === case
   SELECT constant array expression from table
   === SQL
   SELECT ARRAY[1,2] as arr, dim1 FROM foo LIMIT 1
   === options
   sqlCompatibleNulls=both
   vectorize=true
   === schema
   arr INTEGER ARRAY
   dim1 VARCHAR
   === plan
   LogicalSort(fetch=[1])
     LogicalProject(arr=[ARRAY(1, 2)], dim1=[$2])
       LogicalTableScan(table=[[druid, foo]])
   === native
   {
     "queryType" : "scan",
     "dataSource" : {
       "type" : "table",
       "name" : "foo"
     },
     "intervals" : {
       "type" : "intervals",
       "intervals" : [ "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z" ]
     },
     "virtualColumns" : [ {
       "type" : "expression",
       "name" : "v0",
       "expression" : "array(1,2)",
       "outputType" : "ARRAY<LONG>"
     } ],
     "resultFormat" : "compactedList",
     "limit" : 1,
     "columns" : [ "dim1", "v0" ],
     "legacy" : false,
     "granularity" : {
       "type" : "all"
     }
   }
   === results
   ["[1,2]",""]
   ```
   
   The structure has multiple sections as described by the `syntax.md` file in this PR. The primary sections are `sql`, which defines the query, and `results` which defines the expected output as a set of JSON lines. The other sections allow validating other parts of the query process such as the logical plan, the output schema, the Druid native query and so on.
   
   The `context` section specifies any additional query context items to include in the query. The `options` section provides instructions to the test framework itself, such as whether to run a query vectorized or not.
   
   ### Scope of This PR
   
   This PR includes only the test case definitions: both for for the expected values (parsed from a ".case" file), and an actual query run. This PR does not yet include the mechanisms to run the cases. That mechanism is dependent on various other in-flight PRs and will arrive in a future PR. This PR allows us to establish the case file format itself which can be used in both Calcite unit tests and new ITs.
   
   <hr>
   
   This PR has:
   
   - [X] been self-reviewed.
   - [X] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [X] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [X] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] paul-rogers commented on pull request #13106: Query test case definitions

Posted by GitBox <gi...@apache.org>.

paul-rogers commented on PR #13106:
URL: https://github.com/apache/druid/pull/13106#issuecomment-1284550899

   @abhishekagarwal87, the catalog project makes changes to the query plan and I need to check the plan and the signature rather than the native query. That code was added to the Java framework recently, so it is no longer a compelling reason to adopt a new framework. There was lukewarm interest in the file format, though I did find it worked great during my time with Impala. There was also not much interest in converting existing tests, or using the same tests for other test situations (ITs, QA-style tests. MSQ, etc.), so there seemed no compelling reason to push this forward when we have many other things on our plate.
   
   All that said, we can revisit if a new need arises for which this approach would save us time. The code is here when we need it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] kfaraz commented on pull request #13106: Query test case definitions

Posted by GitBox <gi...@apache.org>.

kfaraz commented on PR #13106:
URL: https://github.com/apache/druid/pull/13106#issuecomment-1260415405

Haven't reviewed the PR thoroughly yet but it seems pretty useful and would make test cases much more readable.

A couple of first glance comments:
- I suppose the only use of `TestCaseWriter` would be to convert existing tests as new tests would always be written in the new format. I suppose you would call this writer from somewhere inside `BaseCalciteQueryTest.testQuery()`, run all the tests and print out all the test cases in one go. Please correct me if I am missing something.
- I would advise using yaml rather than having to define a whole new syntax. Yaml is meant to tackle such requirements, you can easily have multi-line strings, even embed json data, without compromising on the format or having to unnecessarily quote things.
- I would also advise removing the suffix `...Section` from different objects that make up a test case. I feel that the decision to write things in a "section" or otherwise is really an impl detail. It would be cognitively simpler if a test case just specifies `Resources`, `Context`, `ExpectedResults` etc., being completely agnostic of the underlying file format.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] paul-rogers commented on pull request #13106: Query test case definitions

Posted by GitBox <gi...@apache.org>.

paul-rogers commented on PR #13106:
URL: https://github.com/apache/druid/pull/13106#issuecomment-1250398079

   This build is clean except for an unrelated IT Parquet compile error described in [Issue #13116](https://github.com/apache/druid/issues/13116).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] paul-rogers commented on pull request #13106: Query test case definitions

Posted by GitBox <gi...@apache.org>.

paul-rogers commented on PR #13106:
URL: https://github.com/apache/druid/pull/13106#issuecomment-1263881334

   @kfaraz, went ahead and renamed the "semanticized" objects to not use the `Section` term. The parsed nodes are still sections. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] paul-rogers commented on pull request #13106: Query test case definitions

Posted by GitBox <gi...@apache.org>.

paul-rogers commented on PR #13106:
URL: https://github.com/apache/druid/pull/13106#issuecomment-1276843212

   As an aside, the proposed format is derived from the similar, but simpler format which [Impala uses](https://github.com/apache/impala/blob/master/testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] paul-rogers commented on pull request #13106: Query test case definitions

Posted by GitBox <gi...@apache.org>.

paul-rogers commented on PR #13106:
URL: https://github.com/apache/druid/pull/13106#issuecomment-1276829856

   By popular demand, here is the example from the PR description converted to YAML:
   
   ```yaml
   cases:
     - name:  SELECT constant array expression from table
       description: Converted from testSelectConstantArrayExpressionFromTable()
       input:
         sql: |
           SELECT ARRAY[1,2] as arr, dim1 
           FROM foo 
           LIMIT 1
   
         options:
           sqlCompatibleNulls: both
           vectorize: true
       expected:
         schema:
           - name: arr
             type: INTEGER ARRAY
           - name: dim1 
             Type: VARCHAR
         plan: |
           LogicalSort(fetch=[1])
             LogicalProject(arr=[ARRAY(1, 2)], dim1=[$2])
               LogicalTableScan(table=[[druid, foo]])
   
         native: |
           {
             "queryType" : "scan",
             "dataSource" : {
               "type" : "table",
               "name" : "foo"
             },
             "intervals" : {
               "type" : "intervals",
               "intervals" : [ "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z" ]
             },
             "virtualColumns" : [ {
               "type" : "expression",
               "name" : "v0",
               "expression" : "array(1,2)",
               "outputType" : "ARRAY<LONG>"
             } ],
             "resultFormat" : "compactedList",
             "limit" : 1,
             "columns" : [ "dim1", "v0" ],
             "legacy" : false,
             "granularity" : {
               "type" : "all"
             }
           }
   
         results:
           - ["[1,2]",""]
   ```
   
   Some thoughts:
   
   * It works. Validated with [Yamllint](http://www.yamllint.com/).
   * Easier to handle the structured bits (name, description, options, context).
   * The fact that literal text has to be indented is a nuisance.
   * The result does not really solve the "wall of text" issue that folks raised about the proposed format.
   
   One alternative is to use YAML for the structured stuff, text for the other stuff. But, that ends up pretty much with the format  that is proposed with a few less sections:
   
   ```text
   ==============================================================
   name:  SELECT constant array expression from table
   description: Converted from testSelectConstantArrayExpressionFromTable()
   options:
       sqlCompatibleNulls: both
       vectorize: true
   === SQL
   ...
   === schema
   ...
   === plan
   ...
   === native
   ...
   ```
   
   Here, 4 (or more) equals signs introduces a new test case, where the first section is YAML. The other sections are as defined before. I'm not sure this is a huge win.
   
   Thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] paul-rogers commented on pull request #13106: Query test case definitions

Posted by GitBox <gi...@apache.org>.

paul-rogers commented on PR #13106:
URL: https://github.com/apache/druid/pull/13106#issuecomment-1263882547

   Changed the representation of options, context and parameters to be more JSON-like to avoid the need for a specialized parser and syntax. See the `syntax.md` file for the details. Updated the examples in these comments to match.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] abhishekagarwal87 commented on pull request #13106: Query test case definitions

Posted by GitBox <gi...@apache.org>.

abhishekagarwal87 commented on PR #13106:
URL: https://github.com/apache/druid/pull/13106#issuecomment-1283608450

   @paul-rogers - what is that solution? I feel the framework is useful since it 
    - lets us run same tests on multiple engines 
    - makes it easy to write new tests so text-to-java conversion is not needed. 
   
   can the new solution achieve these two objectives? 
   
   FWIW, I prefer non-YAML syntax since we are going to put a bunch of text as well as JSON in there. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] paul-rogers closed pull request #13106: Query test case definitions

Posted by GitBox <gi...@apache.org>.

paul-rogers closed pull request #13106: Query test case definitions
URL: https://github.com/apache/druid/pull/13106


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] paul-rogers commented on pull request #13106: Query test case definitions

Posted by GitBox <gi...@apache.org>.

paul-rogers commented on PR #13106:
URL: https://github.com/apache/druid/pull/13106#issuecomment-1282796462

   @clintropolis, thanks for the not on YAML. I would find the YAML format tedious, if only because of the extra spaces.
   
   So, maybe we can live with the current Java-based format. Converting this to a draft for now. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] clintropolis commented on pull request #13106: Query test case definitions

Posted by GitBox <gi...@apache.org>.

clintropolis commented on PR #13106:
URL: https://github.com/apache/druid/pull/13106#issuecomment-1278221455

   thanks for providing a yaml example.
   
   fwiw I think I actually prefer the yaml, at least thinking about how it would look to see a test case file with a large number of tests (assuming we keep them thematically grouped like "join tests" and "array tests"), and being standard yaml seems a benefit to me since lots of things can understand yaml, but I can probably be convinced otherwise.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] paul-rogers commented on pull request #13106: Query test case definitions

Posted by GitBox <gi...@apache.org>.

paul-rogers commented on PR #13106:
URL: https://github.com/apache/druid/pull/13106#issuecomment-1263118482

@kfaraz, thanks for your comments. Here are a few answers.

The `TestCaseWriter` class is, indeed, part of the code that can convert existing tests. There are some changes to the `BaseCalciteQuery` and `DruidPlanner` classes that capture the items we might want to capture. There is also some code that munges the results for things like vectorize-or-not and other options. The result is that existing tests can be converted by enabling the conversion step, running the existing steps, then saving the result as the new "expected" values.

The vast majority of tests convert without issue. A few are fiddly, or do special things. These can be left in Java, or can use the "advanced" tricks such as test options or copying selected parts of tests.

The code to do the conversion is part of the next PR; though it has been suggested that we combine that code with this PR to give the reviewers the full picture (at the cost of more code to review.)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] paul-rogers commented on pull request #13106: Query test case definitions

Posted by GitBox <gi...@apache.org>.

paul-rogers commented on PR #13106:
URL: https://github.com/apache/druid/pull/13106#issuecomment-1282831454

   Thinking about this more: the project that drove the need for enhanced planner testing has found another solution (since it has taken many months to get this new framework reviewed.) At this point, there is no compelling reason for the proposed format. Closing the PR until we have some other need, which will better shape our discussion about the design.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] paul-rogers commented on pull request #13106: Query test case definitions

Posted by GitBox <gi...@apache.org>.

paul-rogers commented on PR #13106:
URL: https://github.com/apache/druid/pull/13106#issuecomment-1263124969

   @kfaraz, thanks for the suggestion to use YAML. We did, in fact, use YAML for the config files in the new ITs. YAML was not used here because we're mostly dealing with large blocks of text (except for query context variables and test options.) While YAML can handle such text blocks, it's not pretty or simple. (I say this having used YAML to create Helm charts for K8s: a task I wouldn't wish on anyone!)
   
   The goal of the odd format is for test cases to be mostly text, with simple delimiters between sections, hence the odd "===" syntax which is meant to look like a divider.
   
   Here's a quick comparison. Proposed format:
   
   ```text
   ==================
   My Test Case
   === SQL
   SELECT *
   FROM foo
   === schema
   A VARCHAR
   B BIGINT
   ```
   
   YAML:
   
   ```yaml
   testCases:
     - testCase: My Test Case
       sql: |-
         SELECT *
         FROM FOO
       schema:
         - name: A
           type: VARCHAR
         - name: B
           type: BIGINT
   ```
   
   Imagine encoding a native query, as JSON, within a YAML file. Would get to be about as hairy as using JSON in SQL in MSQ to specify an external file.
   
   All that said, you suggest a possible code change: ensure that the parsed "nodes" are agnostic about the file format. That way, we could try a YAML solution which would be converted to the same parse nodes as the current ad-hoc. format.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org