Posted to issues@drill.apache.org by "Arina Ielchiieva (JIRA)" <ji...@apache.org> on 2018/10/29 10:46:01 UTC

[jira] [Comment Edited] (DRILL-6814) Query performance on S3 files

    [ https://issues.apache.org/jira/browse/DRILL-6814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16666930#comment-16666930 ] 

Arina Ielchiieva edited comment on DRILL-6814 at 10/29/18 10:45 AM:
--------------------------------------------------------------------

[~ashishkshukladb] since you are querying the same files from S3 and HDFS with the same Drill cluster layout, you can compare the query profiles to see where the bottleneck is and how many major fragments are created, i.e. whether Drill parallelizes the read operation on both S3 and HDFS.

Also, which S3 storage class do you use (https://aws.amazon.com/s3/storage-classes/)?
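
A minimal sketch of the profile comparison, assuming the profiles are fetched as JSON from the Drill REST API (`/profiles/<query id>.json` on the web port, 8047 by default). The field names `fragmentProfile`, `majorFragmentId` and `minorFragmentProfile` are what the profile JSON is assumed to contain; please check them against a profile from your own cluster. The helper names are made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_profile(drillbit_url, query_id):
    """Download one query profile as a dict (assumes the REST endpoint /profiles/<id>.json)."""
    with urlopen(f"{drillbit_url}/profiles/{query_id}.json") as resp:
        return json.load(resp)


def summarize_fragments(profile):
    """Map each major fragment id to the number of minor fragments it ran with."""
    return {
        major["majorFragmentId"]: len(major.get("minorFragmentProfile", []))
        for major in profile.get("fragmentProfile", [])
    }


def compare(s3_profile, hdfs_profile):
    """Return (major fragment id, S3 minor count, HDFS minor count) rows, sorted by id."""
    s3 = summarize_fragments(s3_profile)
    hdfs = summarize_fragments(hdfs_profile)
    return [
        (major_id, s3.get(major_id, 0), hdfs.get(major_id, 0))
        for major_id in sorted(set(s3) | set(hdfs))
    ]
```

Calling `compare(fetch_profile(url, s3_query_id), fetch_profile(url, hdfs_query_id))` with the two query ids from the Drill Web UI gives one row per major fragment. If the scan fragment has a single minor fragment for S3 but several for HDFS, the S3 read is not being parallelized.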


> Query performance on S3 files
> -----------------------------
>
>                 Key: DRILL-6814
>                 URL: https://issues.apache.org/jira/browse/DRILL-6814
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Other
>    Affects Versions: 1.14.0
>         Environment: Amazon EC2 instances-
> 4 Linux Redhat machines -version 7.5
> RAM- 32GB
>            Reporter: Ashish Shukla
>            Assignee: Arina Ielchiieva
>            Priority: Major
>             Fix For: 1.15.0
>
>
> I have installed a 4-node Drill cluster on Amazon EC2 and am trying to execute a simple count on one Amazon S3 file. The file type is CSV and the size is approx. 14 GB.
>  The query returns the expected count after approx. 30 minutes of execution.
>  If we keep the same file in HDFS or create a table in Postgres, the execution time is much shorter (approx. 2-3 minutes).
>  Is this normal behavior, or can something be done for S3 files to make the execution time comparable?


