Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/02/17 03:28:42 UTC
[jira] [Created] (DRILL-5271) EasyFormatPlugin creates readers for all input files at start - memory waste
Paul Rogers created DRILL-5271:
----------------------------------
Summary: EasyFormatPlugin creates readers for all input files at start - memory waste
Key: DRILL-5271
URL: https://issues.apache.org/jira/browse/DRILL-5271
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.10
Reporter: Paul Rogers
Priority: Minor
The {{EasyFormatPlugin}} creates the record readers for a scan operation. The scan operation supplies the list of files to scan, and {{EasyFormatPlugin}} iterates over that list, creating a {{RecordReader}} for every file up front:
{code}
// Abstract factory method implemented by each format plugin.
public abstract RecordReader getRecordReader(FragmentContext context, DrillFileSystem dfs, FileWork fileWork,
    List<SchemaPath> columns, String userName) throws ExecutionSetupException;
...
// One reader is created per file, all of them before the scan begins.
for (FileWork work : scan.getWorkUnits()) {
  RecordReader recordReader = getRecordReader(context, dfs, work, scan.getColumns(), scan.getUserName());
  readers.add(recordReader);
}
{code}
Consider a test with a single thread and 5000 files. The loop above creates 5000 {{RecordReader}} objects at query start, and all of them remain in memory while the scan works through the files one at a time. Those readers hold resources that could be better used elsewhere.
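To make the cost concrete, here is a minimal plain-Java sketch of the eager pattern. {{FakeReader}} and {{peakLiveReaders}} are hypothetical stand-ins, not Drill's {{RecordReader}} API; the sketch only counts how many reader objects exist at the same time.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: FakeReader stands in for Drill's RecordReader and only
// counts how many reader objects exist at the same time.
public class EagerScanDemo {

    static int live = 0; // readers created but not yet closed
    static int peak = 0; // highest value 'live' has reached

    static final class FakeReader implements AutoCloseable {
        final String file;

        FakeReader(String file) {
            this.file = file;
            live++;
            peak = Math.max(peak, live);
        }

        @Override
        public void close() {
            live--;
        }
    }

    // Mirrors the loop in the issue: build every reader before reading any file.
    static int peakLiveReaders(int fileCount) {
        live = 0;
        peak = 0;
        List<FakeReader> readers = new ArrayList<>();
        for (int i = 0; i < fileCount; i++) {
            readers.add(new FakeReader("file-" + i + ".json")); // hypothetical file names
        }
        // The scan then consumes the readers one at a time.
        for (FakeReader reader : readers) {
            // ... read all records from 'reader' here ...
            reader.close();
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println("peak live readers: " + peakLiveReaders(5000));
    }
}
{code}

With 5000 files the peak is 5000 live readers, even though only one of them is being read at any moment.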
Suggestion: create each {{RecordReader}} only when the scan is ready to read its file, and discard the previous reader before creating the next one.
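A minimal sketch of the suggested approach, again in plain Java with hypothetical names ({{Reader}}, {{CountingReader}}, {{LazyReaderIterator}}); it is not Drill's API. The iterator builds each reader only when the scan asks for it and closes the previous one first, assuming the scan is finished with a reader once it requests the next.

{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical sketch: Reader, CountingReader and LazyReaderIterator are
// illustrative stand-ins, not Drill's RecordReader or scan operator API.
public class LazyScanDemo {

    static int live = 0;    // readers created but not yet closed
    static int peak = 0;    // highest value 'live' has reached
    static int created = 0; // total readers ever created

    // Simplified reader: close() narrowed so it throws no checked exception.
    interface Reader extends AutoCloseable {
        @Override
        void close();
    }

    static final class CountingReader implements Reader {
        final String file;

        CountingReader(String file) {
            this.file = file;
            created++;
            live++;
            peak = Math.max(peak, live);
        }

        @Override
        public void close() {
            live--;
        }
    }

    // Creates each reader on demand and closes the previous one first.
    // Assumes the caller is done with a reader once it asks for the next.
    static final class LazyReaderIterator<W> implements Iterator<Reader>, AutoCloseable {
        private final Iterator<W> work;
        private final Function<? super W, ? extends Reader> factory;
        private Reader current;

        LazyReaderIterator(Iterable<W> work, Function<? super W, ? extends Reader> factory) {
            this.work = work.iterator();
            this.factory = factory;
        }

        @Override
        public boolean hasNext() {
            return work.hasNext();
        }

        @Override
        public Reader next() {
            if (!work.hasNext()) {
                throw new NoSuchElementException();
            }
            close(); // release the previous reader before building the next
            current = factory.apply(work.next());
            return current;
        }

        // Closes the reader currently handed out, if any.
        @Override
        public void close() {
            if (current != null) {
                current.close();
                current = null;
            }
        }
    }

    static int peakLiveReaders(int fileCount) {
        live = 0;
        peak = 0;
        created = 0;
        List<String> files = new ArrayList<>();
        for (int i = 0; i < fileCount; i++) {
            files.add("file-" + i + ".json"); // hypothetical file names
        }
        try (LazyReaderIterator<String> readers = new LazyReaderIterator<>(files, CountingReader::new)) {
            while (readers.hasNext()) {
                Reader reader = readers.next();
                // ... read all records from 'reader' here ...
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println("peak live readers: " + peakLiveReaders(5000) + ", created: " + created);
    }
}
{code}

With 5000 files this still creates 5000 readers over the course of the scan, but at most one exists at any moment. The last reader is released by {{close()}}, which the try-with-resources block calls when the scan ends.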
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)