Posted to dev@hive.apache.org by "Vaibhav Gumashta (JIRA)" <ji...@apache.org> on 2019/03/15 22:19:00 UTC
[jira] [Created] (HIVE-21458) ACID: Optimize AcidUtils$MetaDataFile.isRawFormat check by caching the split reader
Vaibhav Gumashta created HIVE-21458:
---------------------------------------
Summary: ACID: Optimize AcidUtils$MetaDataFile.isRawFormat check by caching the split reader
Key: HIVE-21458
URL: https://issues.apache.org/jira/browse/HIVE-21458
Project: Hive
Issue Type: Bug
Components: Transactions
Affects Versions: 3.1.1
Reporter: Vaibhav Gumashta
Several places in the transactional subsystem check whether a data file has ROW__ID fields. Every time we do that, even within the context of the same query, we open a new Reader for that file/split. We could optimize this by caching the split reader. Also, perhaps we don't need to do this check for every split. An example call stack:
{code}
OrcFile.createReader(Path, OrcFile$ReaderOptions) line: 105
AcidUtils$MetaDataFile.isRawFormatFile(Path, FileSystem) line: 2026
AcidUtils$MetaDataFile.isRawFormat(Path, FileSystem) line: 2022
AcidUtils.parsedDelta(Path, String, FileSystem) line: 1007
OrcRawRecordMerger$TransactionMetaData.findWriteIDForSynthetcRowIDs(Path, Path, Configuration) line: 1231
OrcRawRecordMerger.discoverOriginalKeyBounds(Reader, int, Reader$Options, Configuration, OrcRawRecordMerger$Options) line: 722
OrcRawRecordMerger.<init>(Configuration, boolean, Reader, boolean, int, ValidWriteIdList, Reader$Options, Path[], OrcRawRecordMerger$Options) line: 1022
OrcInputFormat.getReader(InputSplit, Options) line: 2108
OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter) line: 2006
FetchOperator$FetchInputFormatSplit.getRecordReader(JobConf) line: 776
FetchOperator.getRecordReader() line: 344
FetchOperator.getNextRow() line: 540
FetchOperator.pushRow() line: 509
FetchTask.fetch(List) line: 146
{code}
Here, we make that check once for each split.
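To illustrate the kind of caching proposed above, here is a minimal sketch that memoizes the result of the check per file path, so the expensive lookup runs at most once per file within one cache's lifetime. The class name RawFormatCache and the Predicate standing in for the Reader-opening check are hypothetical and not part of Hive; the sketch also assumes a file's format does not change while the cache is alive. Caching the Reader itself, as the summary suggests, would follow the same pattern.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

/** Hypothetical per-query cache; illustrative only, not Hive's API. */
class RawFormatCache {
    // Maps a file path to the cached outcome of the raw-format check.
    private final Map<String, Boolean> cache = new ConcurrentHashMap<>();

    // Stand-in (assumption) for the expensive check that opens a Reader
    // and inspects the file schema for ROW__ID fields.
    private final Predicate<String> expensiveCheck;

    RawFormatCache(Predicate<String> expensiveCheck) {
        this.expensiveCheck = expensiveCheck;
    }

    /** Runs the expensive check only on the first request for a path. */
    boolean isRawFormat(String path) {
        return cache.computeIfAbsent(path, expensiveCheck::test);
    }

    public static void main(String[] args) {
        AtomicInteger opens = new AtomicInteger();

        // Fake check: counts invocations instead of opening a real file.
        // The naming rule used here is arbitrary and only for the demo.
        RawFormatCache cache = new RawFormatCache(path -> {
            opens.incrementAndGet();
            return !path.contains("delta_");
        });

        // Two splits of the same file, then one split of another file.
        cache.isRawFormat("/warehouse/t/000000_0");
        cache.isRawFormat("/warehouse/t/000000_0");
        cache.isRawFormat("/warehouse/t/delta_0000001_0000001/bucket_00000");

        System.out.println("expensive checks performed: " + opens.get());
    }
}
```

In this sketch the second lookup for the same path is served from the map, so only two expensive checks run for three requests. Where such a cache would live (per query, per task, or elsewhere) is left open here.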