You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@drill.apache.org by Deepak Kapoor <dk...@gmail.com> on 2018/08/23 23:18:21 UTC

Would you expect coalesce to return a result when none of the columns in coalesce exist in the parquet file ??

Would you expect coalesce to return a result when none of the columns in coalesce exist in the parquet file ??

Query 1:
select coalesce(cert_statusdd,cert_status33) from dfs.`/tmp/parquetfiles`;
Error: SYSTEM ERROR: CompileException: Line 56, Column 27: Assignment conversion not possible from type “org.apache.drill.exec.expr.holders.NullableIntHolder” to type “org.apache.drill.exec.vector.UntypedNullHolder”

Fragment 1:0

[Error Id: 7b9193fb-289b-4fbf-a52a-2b93b01f0cd0 on dkvm2c:31010] (state=,code=0)

In above case both the columns do not exist.  Is the error reported a valid error or is it a bug.

Another observation is that if you use CAST on the column that does not exist, Coalesce will not error out instead return nulls. 
Query 2:
select coalesce(cast(cert_statusdd as varchar),cast(cert_statusee as varchar)) from dfs.`/tmp/parquetfiles` limit 3;
+---------+
| EXPR$0  |
+---------+
| null    |
| null    |
| null    |
+————+

Note: Databases like postgresql and mysql will not allow COALESCE on non existing column, w or without CAST. 
Drill however being schema free, do we expect COALESCE to succeed and give null as result ?? In which case the SYSTEM ERROR: CompileException with the 1st Quert  be a bug. ??

Thanks.
- Deepak (dkapoor@mapr.com <ma...@mapr.com>)


Re: Would you expect coalesce to return a result when none of the columns in coalesce exist in the parquet file ??

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi Deepak,

Drill is "schema-free" (really, schema-on-read). This means Drill behaves differently for missing columns than traditional schema-on-write databases. In particular, if a reader does not find the requested column in the input file, Drill will invent a nullable int column and fill it with nulls. (Drill is optimistic; it hopes that some of your files has a value for the column, even if some files do not.)

In your first example, you can see the nullable int in the "NullableIntHolder". Apparently that type conflicts with an expected type of an untyped null holder. (The untyped null holder must be new; I hadn't seen that previously.) Not sure where the "UntypedNullHolder" came from.

I would file a JIRA ticket for the first case. The documentation says you can provide any scalar type to COALESCE. I'd suppose they have to be of the same type, but the documentation does not say so. Certainly COALESCE should accept an untyped null. The output of your example should have been a null value of type nullable int.

In the second case, the cast converts the nullable int to a nullable varchar. The output you see is expected: the NULL value of type nullable int is converted to a null value of type nullable varchar, and, according to the documentation, "If all expressions evaluate to null, then the COALESCE function returns null." 
Thanks,
- Paul

 

    On Thursday, August 23, 2018, 5:14:02 PM PDT, Deepak Kapoor <dk...@gmail.com> wrote:  
 
 Would you expect coalesce to return a result when none of the columns in coalesce exist in the parquet file ??

Query 1:
select coalesce(cert_statusdd,cert_status33) from dfs.`/tmp/parquetfiles`;
Error: SYSTEM ERROR: CompileException: Line 56, Column 27: Assignment conversion not possible from type “org.apache.drill.exec.expr.holders.NullableIntHolder” to type “org.apache.drill.exec.vector.UntypedNullHolder”

Fragment 1:0

[Error Id: 7b9193fb-289b-4fbf-a52a-2b93b01f0cd0 on dkvm2c:31010] (state=,code=0)

In above case both the columns do not exist.  Is the error reported a valid error or is it a bug.

Another observation is that if you use CAST on the column that does not exist, Coalesce will not error out instead return nulls. 
Query 2:
select coalesce(cast(cert_statusdd as varchar),cast(cert_statusee as varchar)) from dfs.`/tmp/parquetfiles` limit 3;
+---------+
| EXPR$0  |
+---------+
| null    |
| null    |
| null    |
+————+

Note: Databases like postgresql and mysql will not allow COALESCE on non existing column, w or without CAST. 
Drill however being schema free, do we expect COALESCE to succeed and give null as result ?? In which case the SYSTEM ERROR: CompileException with the 1st Quert  be a bug. ??

Thanks.
- Deepak (dkapoor@mapr.com <ma...@mapr.com>)