Posted to dev@drill.apache.org by Timothy Farkas <tf...@mapr.com> on 2018/01/30 01:36:26 UTC

Drill Case Sensitivity

What is Drill's policy on case sensitivity? Some of the tests assume that Drill is case-insensitive, but how does Drill handle data sources like HBase that are case-sensitive? Do we validate that there are no potentially conflicting column names like "employee" and "Employee"? Do we run and hope for the best? Or do we do something more advanced to handle these cases?

Thanks,
Tim

Re: Drill Case Sensitivity

Posted by Abhishek Girish <ag...@apache.org>.
Drill is case preserving. In most cases, table names (and paths) are
case-sensitive (except probably on Windows). Whether column names are
case-sensitive or case-insensitive depends on the data source. For example,
with DFS formats such as Parquet and JSON, column names are case-insensitive,
but with MapR-DB (and probably HBase) they are case-sensitive. However, we do
not do any kind of validation.
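
For illustration only, here is a minimal Python sketch of the kind of
validation the original question asks about: flagging column names that
differ only by case. This is not Drill code. The helper name
find_case_conflicts and the sample column list are hypothetical.

```python
from collections import defaultdict


def find_case_conflicts(column_names):
    """Group column names that differ only by case.

    Hypothetical helper, not part of Drill. Returns a dict mapping each
    case-folded name to the sorted list of distinct spellings seen, and
    includes only names that have more than one spelling.
    """
    groups = defaultdict(set)
    for name in column_names:
        groups[name.casefold()].add(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Sample schema made up for this example.
conflicts = find_case_conflicts(["employee", "Employee", "salary", "dept"])
print(conflicts)
```

A check along these lines would only be able to warn. It cannot decide which
spelling the user actually meant.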

So in the case of DFS, "employee" and "Employee" column names across
records are treated as the same column, and filtering on any case variant of
the column name "employee" returns all matching results. But in the case
of MapR-DB, we end up with only the rows that match the exact case of the
column name. This is a bit concerning, as there is a potential for wrong
results if the user is not careful with case. It is even more concerning
when it goes unnoticed, since there is no validation of any kind (I ran into
this when I accidentally formatted my queries to upper case).
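
To make the difference concrete, here is a toy Python model of the two
behaviours described above. It is a sketch under stated assumptions, not how
Drill or MapR-DB is implemented: rows are plain dicts, and the case_sensitive
flag stands in for the data source. The function name select_column and the
sample records are hypothetical.

```python
def select_column(records, column, case_sensitive):
    """Return the values stored under `column` across all records.

    Toy model only: `records` is a list of dicts standing in for rows, and
    `case_sensitive` stands in for the behaviour of the data source.
    """
    values = []
    for record in records:
        for name, value in record.items():
            if case_sensitive:
                matches = name == column
            else:
                matches = name.casefold() == column.casefold()
            if matches:
                values.append(value)
    return values


# Sample rows made up for this example; note the mixed-case column names.
records = [{"employee": "alice"}, {"Employee": "bob"}, {"employee": "carol"}]

# DFS-like behaviour: any case variant of the name finds all three rows.
print(select_column(records, "EMPLOYEE", case_sensitive=False))

# MapR-DB-like behaviour: only the exact spelling matches, with no warning.
print(select_column(records, "EMPLOYEE", case_sensitive=True))
print(select_column(records, "employee", case_sensitive=True))
```

In this model the upper-cased query against the case-sensitive source returns
nothing at all and raises no error, which is the silent wrong-result risk
described above.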
