You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Rahul Challapalli (JIRA)" <ji...@apache.org> on 2014/06/25 20:20:28 UTC

[jira] [Created] (DRILL-1072) Drill is very slow when we have a large number of text files

Rahul Challapalli created DRILL-1072:
----------------------------------------

             Summary: Drill is very slow when we have a large number of text files
                 Key: DRILL-1072
                 URL: https://issues.apache.org/jira/browse/DRILL-1072
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization, Storage - Parquet, Storage - Text & CSV
            Reporter: Rahul Challapalli


git.commit.id.abbrev=efa3274
Build# 26178

As the total number of files under the below directory increase, drill becomes very slow. Check the results for different file counts for the below query.

All files just contain 1 number and have a '.tbl' extension

select count(*) from dfs.`/drill/testdata/morefiles`;

100 files --- 5.183 seconds
250 files --- 15.021 seconds
500 files --- 26.846 seconds
1000 files --- 69.835 seconds
5000 files --- 1573.589 seconds

The logs contain these messages repeatedly when executing against 5000 files:

22:02:22.818 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
22:02:22.818 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
22:02:22.819 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5
22:02:22.840 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
22:02:22.841 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
22:02:22.841 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 0
22:02:22.863 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
22:02:22.863 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
22:02:22.864 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5
22:02:23.035 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
22:02:23.036 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
22:02:23.036 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 0
22:02:23.059 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
22:02:23.059 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
22:02:23.060 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5



--
This message was sent by Atlassian JIRA
(v6.2#6252)