Posted to dev@pig.apache.org by "Adam Szita (JIRA)" <ji...@apache.org> on 2018/12/20 13:17:00 UTC
[jira] [Created] (PIG-5373) InterRecordReader might skip records if certain sync markers are used
Adam Szita created PIG-5373:
-------------------------------
Summary: InterRecordReader might skip records if certain sync markers are used
Key: PIG-5373
URL: https://issues.apache.org/jira/browse/PIG-5373
Project: Pig
Issue Type: Bug
Affects Versions: 0.17.0
Reporter: Adam Szita
Assignee: Adam Szita
Due to a bug in InterRecordReader#skipUntilMarkerOrSplitEndOrEOF(), sync markers may go undetected while reading the interim binary files used to hold data between jobs.
In such files, sync markers are placed at write time and later help when reading the data. The markers are randomly generated, and it seems that for some rare combinations of a marker and the data preceding it, the marker cannot be found. The reader can then read through all the bytes (looking for the marker), reach the split end or EOF, and extract no records at all.
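To illustrate how a marker can be missed depending on the bytes that precede it, here is a minimal, self-contained sketch. It is not the Pig source and is not confirmed to be the exact defect in skipUntilMarkerOrSplitEndOrEOF(); it only shows one common way a byte-by-byte marker search fails. The class MarkerScanSketch, its method names, and the sample bytes are all hypothetical.

```java
// Hypothetical sketch, NOT the actual InterRecordReader code.
// Assumes a non-empty marker.
final class MarkerScanSketch {

    /**
     * Naive streaming scan: on a mismatch it drops the partial match and
     * moves on, without reconsidering bytes that were already consumed.
     * Returns the offset where the marker starts, or -1 if not found.
     */
    public static int naiveFind(byte[] data, byte[] marker) {
        int matched = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == marker[matched]) {
                matched++;
                if (matched == marker.length) {
                    return i - marker.length + 1;
                }
            } else {
                // Flaw: earlier bytes may already be the start of the
                // real marker, but that information is discarded here.
                matched = 0;
            }
        }
        return -1;
    }

    /** Reference scan that tries every possible start offset. */
    public static int exhaustiveFind(byte[] data, byte[] marker) {
        for (int start = 0; start + marker.length <= data.length; start++) {
            int j = 0;
            while (j < marker.length && data[start + j] == marker[j]) {
                j++;
            }
            if (j == marker.length) {
                return start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] marker = {1, 1, 2};
        // The data byte just before the marker equals the marker's first byte.
        byte[] data = {9, 1, 1, 1, 2, 7};
        System.out.println("naive:      " + naiveFind(data, marker));
        System.out.println("exhaustive: " + exhaustiveFind(data, marker));
    }
}
```

In this sample the naive scan returns -1 while the exhaustive scan finds the marker at offset 2, which mirrors the reported behaviour of reading to the split end without ever recognising a marker.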
This symptom is also observable in the JobHistory stats: an affected job will have tasks whose HDFS_BYTES_READ or FILE_BYTES_READ is about equal to the number of bytes in the split, while MAP_INPUT_RECORDS=0.
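The counter pattern above could be checked mechanically once the values have been collected for each task. The following is a rough sketch under stated assumptions: the class SkippedSplitCheck, the method looksAffected, and the tolerance parameter (how close "about equal" has to be) are hypothetical, and the code does not call any Hadoop API; the counter values are passed in as plain numbers.

```java
// Hypothetical helper; counter values are supplied by the caller.
final class SkippedSplitCheck {

    /**
     * Returns true when a task read roughly the whole split but produced
     * no input records.
     *
     * @param bytesRead       HDFS_BYTES_READ or FILE_BYTES_READ of the task
     * @param splitLength     length of the task's input split in bytes
     * @param mapInputRecords MAP_INPUT_RECORDS of the task
     * @param tolerance       assumed allowed relative deviation, e.g. 0.05
     */
    public static boolean looksAffected(long bytesRead, long splitLength,
                                        long mapInputRecords, double tolerance) {
        if (splitLength <= 0 || mapInputRecords != 0) {
            return false;
        }
        double ratio = (double) bytesRead / splitLength;
        return Math.abs(ratio - 1.0) <= tolerance;
    }

    public static void main(String[] args) {
        long split = 134217728L; // sample split size, chosen for illustration
        System.out.println(looksAffected(split, split, 0L, 0.05));
        System.out.println(looksAffected(split, split, 1000L, 0.05));
    }
}
```

A match is only a hint that a task may be affected; a split that legitimately holds no records would need to be ruled out separately.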
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)