You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@drill.apache.org by Spoutable <jo...@spoutable.com> on 2015/12/19 00:55:28 UTC

using multiple levels of dir variables for querying files on s3 fails

I am querying json files on s3 using the s3a storage plugin on drill 1.3

The following query works fine
select  count(i0) from    fh.`2015/12/05` where dir0 = '23’;

This next query doesnt
select  count(i0) from    fh.`2015/12/` where dir0 = '05' and dir1 = '23’;

My understanding is that the two queries were synonymous.

Ultimately I am working towards the following scenario.
select  count(i0)
from    fh.`2015/12`
where   ((dir0 = '05' and dir1 = '23') or (dir0 = '06' and dir1 = '00’))

My files are organized in such a way that the time of the data in the files isnt perfectly aligned with the time / location of the files themselves.   For reference, Im using aws firehose in case anybody else is using that too.

Has anybody run into the first problem or second problem and come up with a good solution to querying subsets of files in adjacent directories?

Cheers,
Josh Schlesser

Re: using multiple levels of dir variables for querying files on s3 fails

Posted by Spoutable <jo...@spoutable.com>.
Here is the verbose error message the query.   If I query fh.`2015/12/05/22` directly it works fine.   Thanks for the help.

select  count(i0) from    fh.`2015/12` where dir0 in ('05') and dir1 in ('22','23')


            VALIDATION ERROR: From line 1, column 27 to line 1, column 28: Table 'fh.2015/12' not found



  (org.apache.calcite.tools.ValidationException) org.apache.calcite.runtime.CalciteContextException: From line 1, column 27 to line 1, column 28: Table 'fh.2015/12' not found
    org.apache.calcite.prepare.PlannerImpl.validate():179
    org.apache.calcite.prepare.PlannerImpl.validateAndGetType():188
    org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode():447
    org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert():190
    org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():159
    org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():184
    org.apache.drill.exec.work.foreman.Foreman.runSQL():905
    org.apache.drill.exec.work.foreman.Foreman.run():244
    java.util.concurrent.ThreadPoolExecutor.runWorker():1145
    java.util.concurrent.ThreadPoolExecutor$Worker.run():615
    java.lang.Thread.run():745
  Caused By (org.apache.calcite.runtime.CalciteContextException) From line 1, column 27 to line 1, column 28: Table 'fh.2015/12' not found
    sun.reflect.NativeConstructorAccessorImpl.newInstance0():-2
    sun.reflect.NativeConstructorAccessorImpl.newInstance():57
    sun.reflect.DelegatingConstructorAccessorImpl.newInstance():45
    java.lang.reflect.Constructor.newInstance():526
    org.apache.calcite.runtime.Resources$ExInstWithCause.ex():405
    org.apache.calcite.sql.SqlUtil.newContextException():685
    org.apache.calcite.sql.SqlUtil.newContextException():673
    org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError():3930
    org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl():106
    org.apache.calcite.sql.validate.AbstractNamespace.validate():86
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():877
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():863
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2777
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2762
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():2985
    org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
    org.apache.calcite.sql.validate.AbstractNamespace.validate():86
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():877
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():863
    org.apache.calcite.sql.SqlSelect.validate():210
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():837
    org.apache.calcite.sql.validate.SqlValidatorImpl.validate():551
    org.apache.calcite.prepare.PlannerImpl.validate():177
    org.apache.calcite.prepare.PlannerImpl.validateAndGetType():188
    org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode():447
    org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert():190
    org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():159
    org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():184
    org.apache.drill.exec.work.foreman.Foreman.runSQL():905
    org.apache.drill.exec.work.foreman.Foreman.run():244
    java.util.concurrent.ThreadPoolExecutor.runWorker():1145
    java.util.concurrent.ThreadPoolExecutor$Worker.run():615
    java.lang.Thread.run():745
  Caused By (org.apache.calcite.sql.validate.SqlValidatorException) Table 'fh.2015/12' not found
    sun.reflect.NativeConstructorAccessorImpl.newInstance0():-2
    sun.reflect.NativeConstructorAccessorImpl.newInstance():57
    sun.reflect.DelegatingConstructorAccessorImpl.newInstance():45
    java.lang.reflect.Constructor.newInstance():526
    org.apache.calcite.runtime.Resources$ExInstWithCause.ex():405
    org.apache.calcite.runtime.Resources$ExInst.ex():514
    org.apache.calcite.sql.SqlUtil.newContextException():685
    org.apache.calcite.sql.SqlUtil.newContextException():673
    org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError():3930
    org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl():106
    org.apache.calcite.sql.validate.AbstractNamespace.validate():86
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():877
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():863
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2777
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2762
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():2985
    org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
    org.apache.calcite.sql.validate.AbstractNamespace.validate():86
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():877
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():863
    org.apache.calcite.sql.SqlSelect.validate():210
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():837
    org.apache.calcite.sql.validate.SqlValidatorImpl.validate():551
    org.apache.calcite.prepare.PlannerImpl.validate():177
    org.apache.calcite.prepare.PlannerImpl.validateAndGetType():188
    org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode():447
    org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert():190
    org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():159
    org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():184
    org.apache.drill.exec.work.foreman.Foreman.runSQL():905
    org.apache.drill.exec.work.foreman.Foreman.run():244
    java.util.concurrent.ThreadPoolExecutor.runWorker():1145
    java.util.concurrent.ThreadPoolExecutor$Worker.run():615
    java.lang.Thread.run():745

> On Dec 21, 2015, at 10:33 AM, Nathan Griffith <ng...@dremio.com> wrote:
> 
> Hi Josh,
> 
> When you say the second query doesn't work, what is the exact
> behavior? Does it return zero rows?
> 
> --Nathan
> 
> On Fri, Dec 18, 2015 at 3:55 PM, Spoutable <jo...@spoutable.com> wrote:
>> I am querying json files on s3 using the s3a storage plugin on drill 1.3
>> 
>> The following query works fine
>> select  count(i0) from    fh.`2015/12/05` where dir0 = '23’;
>> 
>> This next query doesnt
>> select  count(i0) from    fh.`2015/12/` where dir0 = '05' and dir1 = '23’;
>> 
>> My understanding is that the two queries were synonymous.
>> 
>> Ultimately I am working towards the following scenario.
>> select  count(i0)
>> from    fh.`2015/12`
>> where   ((dir0 = '05' and dir1 = '23') or (dir0 = '06' and dir1 = '00’))
>> 
>> My files are organized in such a way that the time of the data in the files isnt perfectly aligned with the time / location of the files themselves.   For reference, Im using aws firehose in case anybody else is using that too.
>> 
>> Has anybody run into the first problem or second problem and come up with a good solution to querying subsets of files in adjacent directories?
>> 
>> Cheers,
>> Josh Schlesser


Re: using multiple levels of dir variables for querying files on s3 fails

Posted by Nathan Griffith <ng...@dremio.com>.
Hi Josh,

When you say the second query doesn't work, what is the exact
behavior? Does it return zero rows?

--Nathan

On Fri, Dec 18, 2015 at 3:55 PM, Spoutable <jo...@spoutable.com> wrote:
> I am querying json files on s3 using the s3a storage plugin on drill 1.3
>
> The following query works fine
> select  count(i0) from    fh.`2015/12/05` where dir0 = '23’;
>
> This next query doesnt
> select  count(i0) from    fh.`2015/12/` where dir0 = '05' and dir1 = '23’;
>
> My understanding is that the two queries were synonymous.
>
> Ultimately I am working towards the following scenario.
> select  count(i0)
> from    fh.`2015/12`
> where   ((dir0 = '05' and dir1 = '23') or (dir0 = '06' and dir1 = '00’))
>
> My files are organized in such a way that the time of the data in the files isnt perfectly aligned with the time / location of the files themselves.   For reference, Im using aws firehose in case anybody else is using that too.
>
> Has anybody run into the first problem or second problem and come up with a good solution to querying subsets of files in adjacent directories?
>
> Cheers,
> Josh Schlesser