Posted to dev@drill.apache.org by "Paul Makkar (JIRA)" <ji...@apache.org> on 2017/04/05 15:16:41 UTC

[jira] [Created] (DRILL-5414) Issue with Querying Directories

Paul Makkar created DRILL-5414:
----------------------------------

             Summary: Issue with Querying Directories
                 Key: DRILL-5414
                 URL: https://issues.apache.org/jira/browse/DRILL-5414
             Project: Apache Drill
          Issue Type: Bug
          Components: Functions - Drill
    Affects Versions: 1.10.0
         Environment: Kubernetes running Debian GNU/Linux 8 containers.
openjdk version "1.8.0_111".
AWS.
Using s3 buckets
            Reporter: Paul Makkar


Hi

Thanks for Apache Drill - it's pretty awesome :)

I'm hoping to exploit Drill's directory querying and have structured my data archive in S3 to test this. However, I've run into an issue with directory querying.

My directory structure in s3 is like:
s3/devices_by_id/device_id/2016/11/12/<filename>.json.gz

From the documentation I figured the following queries were equivalent:

select count(*) from `s3`.`/deviceid/xyz/2016/11/` ;
+---------+
| EXPR$0  |
+---------+
| 286049  |
+---------+
1 row selected (10.351 seconds)

select count(*) from `s3`.`/deviceid/` where dir0='xyz' and dir1='2016' and dir2='11';

But this latter query just hangs, and no profile appears in the UI. When I press Ctrl-C, I get:

+--+
|  |
+--+
+--+
No rows selected (1481.727 seconds)

If I try to run an explain plan, that also hangs.
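
To spell out why I expected the two queries to match, here is a minimal Python sketch of my understanding of how dir0, dir1, dir2 relate to the path below the queried directory. This is only an illustration, not Drill code: the helper names `dir_columns` and `matches` and the file name `example.json.gz` are made up for the example.

```python
from pathlib import PurePosixPath


def dir_columns(root, file_path):
    """Map each directory level below `root` to dir0, dir1, ... (illustrative only)."""
    rel = PurePosixPath(file_path).relative_to(PurePosixPath(root))
    # Every component except the last one (the file name) is a directory level.
    return {f"dir{i}": part for i, part in enumerate(rel.parts[:-1])}


def matches(cols, **predicates):
    """True if every dirN predicate equals the corresponding directory level."""
    return all(cols.get(name) == value for name, value in predicates.items())


# Hypothetical file name under the layout described above.
cols = dir_columns("/deviceid", "/deviceid/xyz/2016/11/12/example.json.gz")
print(cols)
print(matches(cols, dir0="xyz", dir1="2016", dir2="11"))
```

Under that reading, the files selected by the dir0/dir1/dir2 predicates are exactly the ones under /deviceid/xyz/2016/11/, so I expected both counts to agree.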

There are a total of 13283 compressed JSON files under the 2016/11 directory in the s3 bucket.

The log doesn't show much information.

Could anyone help with this, please? I can provide more information as required. Hopefully this is not user error.






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)