You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@drill.apache.org by Spoutable <jo...@spoutable.com> on 2015/12/19 00:55:28 UTC
using multiple levels of dir variables for querying files on s3 fails
I am querying json files on s3 using the s3a storage plugin on drill 1.3
The following query works fine
select count(i0) from fh.`2015/12/05` where dir0 = '23’;
This next query doesnt
select count(i0) from fh.`2015/12/` where dir0 = '05' and dir1 = '23’;
My understanding is that the two queries were synonymous.
Ultimately I am working towards the following scenario.
select count(i0)
from fh.`2015/12`
where ((dir0 = '05' and dir1 = '23') or (dir0 = '06' and dir1 = '00’))
My files are organized in such a way that the time of the data in the files isnt perfectly aligned with the time / location of the files themselves. For reference, Im using aws firehose in case anybody else is using that too.
Has anybody run into the first problem or second problem and come up with a good solution to querying subsets of files in adjacent directories?
Cheers,
Josh Schlesser
Re: using multiple levels of dir variables for querying files on s3 fails
Posted by Spoutable <jo...@spoutable.com>.
Here is the verbose error message the query. If I query fh.`2015/12/05/22` directly it works fine. Thanks for the help.
select count(i0) from fh.`2015/12` where dir0 in ('05') and dir1 in ('22','23')
VALIDATION ERROR: From line 1, column 27 to line 1, column 28: Table 'fh.2015/12' not found
(org.apache.calcite.tools.ValidationException) org.apache.calcite.runtime.CalciteContextException: From line 1, column 27 to line 1, column 28: Table 'fh.2015/12' not found
org.apache.calcite.prepare.PlannerImpl.validate():179
org.apache.calcite.prepare.PlannerImpl.validateAndGetType():188
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode():447
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert():190
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():159
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():184
org.apache.drill.exec.work.foreman.Foreman.runSQL():905
org.apache.drill.exec.work.foreman.Foreman.run():244
java.util.concurrent.ThreadPoolExecutor.runWorker():1145
java.util.concurrent.ThreadPoolExecutor$Worker.run():615
java.lang.Thread.run():745
Caused By (org.apache.calcite.runtime.CalciteContextException) From line 1, column 27 to line 1, column 28: Table 'fh.2015/12' not found
sun.reflect.NativeConstructorAccessorImpl.newInstance0():-2
sun.reflect.NativeConstructorAccessorImpl.newInstance():57
sun.reflect.DelegatingConstructorAccessorImpl.newInstance():45
java.lang.reflect.Constructor.newInstance():526
org.apache.calcite.runtime.Resources$ExInstWithCause.ex():405
org.apache.calcite.sql.SqlUtil.newContextException():685
org.apache.calcite.sql.SqlUtil.newContextException():673
org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError():3930
org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl():106
org.apache.calcite.sql.validate.AbstractNamespace.validate():86
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():877
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():863
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2777
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2762
org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():2985
org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
org.apache.calcite.sql.validate.AbstractNamespace.validate():86
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():877
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():863
org.apache.calcite.sql.SqlSelect.validate():210
org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():837
org.apache.calcite.sql.validate.SqlValidatorImpl.validate():551
org.apache.calcite.prepare.PlannerImpl.validate():177
org.apache.calcite.prepare.PlannerImpl.validateAndGetType():188
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode():447
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert():190
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():159
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():184
org.apache.drill.exec.work.foreman.Foreman.runSQL():905
org.apache.drill.exec.work.foreman.Foreman.run():244
java.util.concurrent.ThreadPoolExecutor.runWorker():1145
java.util.concurrent.ThreadPoolExecutor$Worker.run():615
java.lang.Thread.run():745
Caused By (org.apache.calcite.sql.validate.SqlValidatorException) Table 'fh.2015/12' not found
sun.reflect.NativeConstructorAccessorImpl.newInstance0():-2
sun.reflect.NativeConstructorAccessorImpl.newInstance():57
sun.reflect.DelegatingConstructorAccessorImpl.newInstance():45
java.lang.reflect.Constructor.newInstance():526
org.apache.calcite.runtime.Resources$ExInstWithCause.ex():405
org.apache.calcite.runtime.Resources$ExInst.ex():514
org.apache.calcite.sql.SqlUtil.newContextException():685
org.apache.calcite.sql.SqlUtil.newContextException():673
org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError():3930
org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl():106
org.apache.calcite.sql.validate.AbstractNamespace.validate():86
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():877
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():863
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2777
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2762
org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():2985
org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
org.apache.calcite.sql.validate.AbstractNamespace.validate():86
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():877
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():863
org.apache.calcite.sql.SqlSelect.validate():210
org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():837
org.apache.calcite.sql.validate.SqlValidatorImpl.validate():551
org.apache.calcite.prepare.PlannerImpl.validate():177
org.apache.calcite.prepare.PlannerImpl.validateAndGetType():188
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode():447
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert():190
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():159
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():184
org.apache.drill.exec.work.foreman.Foreman.runSQL():905
org.apache.drill.exec.work.foreman.Foreman.run():244
java.util.concurrent.ThreadPoolExecutor.runWorker():1145
java.util.concurrent.ThreadPoolExecutor$Worker.run():615
java.lang.Thread.run():745
> On Dec 21, 2015, at 10:33 AM, Nathan Griffith <ng...@dremio.com> wrote:
>
> Hi Josh,
>
> When you say the second query doesn't work, what is the exact
> behavior? Does it return zero rows?
>
> --Nathan
>
> On Fri, Dec 18, 2015 at 3:55 PM, Spoutable <jo...@spoutable.com> wrote:
>> I am querying json files on s3 using the s3a storage plugin on drill 1.3
>>
>> The following query works fine
>> select count(i0) from fh.`2015/12/05` where dir0 = '23’;
>>
>> This next query doesnt
>> select count(i0) from fh.`2015/12/` where dir0 = '05' and dir1 = '23’;
>>
>> My understanding is that the two queries were synonymous.
>>
>> Ultimately I am working towards the following scenario.
>> select count(i0)
>> from fh.`2015/12`
>> where ((dir0 = '05' and dir1 = '23') or (dir0 = '06' and dir1 = '00’))
>>
>> My files are organized in such a way that the time of the data in the files isnt perfectly aligned with the time / location of the files themselves. For reference, Im using aws firehose in case anybody else is using that too.
>>
>> Has anybody run into the first problem or second problem and come up with a good solution to querying subsets of files in adjacent directories?
>>
>> Cheers,
>> Josh Schlesser
Re: using multiple levels of dir variables for querying files on s3 fails
Posted by Nathan Griffith <ng...@dremio.com>.
Hi Josh,
When you say the second query doesn't work, what is the exact
behavior? Does it return zero rows?
--Nathan
On Fri, Dec 18, 2015 at 3:55 PM, Spoutable <jo...@spoutable.com> wrote:
> I am querying json files on s3 using the s3a storage plugin on drill 1.3
>
> The following query works fine
> select count(i0) from fh.`2015/12/05` where dir0 = '23’;
>
> This next query doesnt
> select count(i0) from fh.`2015/12/` where dir0 = '05' and dir1 = '23’;
>
> My understanding is that the two queries were synonymous.
>
> Ultimately I am working towards the following scenario.
> select count(i0)
> from fh.`2015/12`
> where ((dir0 = '05' and dir1 = '23') or (dir0 = '06' and dir1 = '00’))
>
> My files are organized in such a way that the time of the data in the files isnt perfectly aligned with the time / location of the files themselves. For reference, Im using aws firehose in case anybody else is using that too.
>
> Has anybody run into the first problem or second problem and come up with a good solution to querying subsets of files in adjacent directories?
>
> Cheers,
> Josh Schlesser