Posted to issues@hawq.apache.org by "Shivram Mani (JIRA)" <ji...@apache.org> on 2016/09/30 22:31:20 UTC
[jira] [Resolved] (HAWQ-1075) Restore default behavior of client side(PXF) checksum validation when reading blocks from HDFS
[ https://issues.apache.org/jira/browse/HAWQ-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shivram Mani resolved HAWQ-1075.
--------------------------------
Resolution: Fixed
> Restore default behavior of client side(PXF) checksum validation when reading blocks from HDFS
> ----------------------------------------------------------------------------------------------
>
> Key: HAWQ-1075
> URL: https://issues.apache.org/jira/browse/HAWQ-1075
> Project: Apache HAWQ
> Issue Type: Improvement
> Components: PXF
> Reporter: Shivram Mani
> Assignee: Shivram Mani
> Fix For: 2.0.1.0-incubating
>
>
> Currently, the HdfsTextSimple profile, which is the optimized PXF profile for reading Text/CSV data, uses ChunkRecordReader to read chunks of records (as opposed to individual records). In this code path, dfs.client.read.shortcircuit.skip.checksum is explicitly set to true so that opening and reading the file or block does not incur checksum-verification delays.
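To make the chunked approach concrete, here is a minimal Python sketch of reading records a chunk at a time. It pulls a large run of bytes, returns every complete newline-terminated record in it, and carries any trailing partial record over to the next read. The function name read_record_chunks and the 64 KiB default chunk size are hypothetical. This is an illustration of the idea, not the actual ChunkRecordReader implementation.

```python
import io
from typing import BinaryIO, Iterator, List


def read_record_chunks(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[List[bytes]]:
    """Yield one list of complete records per chunk read from the stream.

    Hypothetical sketch: records are newline-terminated; a partial record at the
    end of a chunk is carried over and completed by the next read.
    """
    leftover = b""
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        data = leftover + data
        cut = data.rfind(b"\n")
        if cut == -1:
            # No complete record yet; keep accumulating bytes.
            leftover = data
            continue
        # Everything up to and including the last newline is a run of whole records.
        complete, leftover = data[: cut + 1], data[cut + 1 :]
        yield complete[:-1].split(b"\n")
    if leftover:
        # The last record of the stream had no trailing newline.
        yield [leftover]


# Usage: a tiny chunk size forces records to straddle chunk boundaries.
src = io.BytesIO(b"a,1\nb,2\nc,3\nd,4")
chunks = list(read_record_chunks(src, chunk_size=6))
records = [r for chunk in chunks for r in chunk]
```

In this sketch the per-call overhead is paid once per chunk rather than once per record.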
> Background Information:
> PXF uses a two-stage process to access HDFS data.
> Stage 1: PXF fetches all the target blocks for the given file (along with replica information).
> Stage 2: after HAWQ prepares an optimized access plan based on locality, PXF agents read the blocks in parallel.
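The two-stage flow can be sketched as follows, assuming an in-memory stand-in for the cluster. Everything here is hypothetical: BlockLocation, CLUSTER, the stage functions, and the least-loaded-replica planner. It only illustrates the shape of the process. HAWQ's real planner and the PXF agent protocol are not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical in-memory "cluster": host -> {block_id: block bytes}.
CLUSTER: Dict[str, Dict[int, bytes]] = {
    "dn1": {0: b"alpha\n", 1: b"bravo\n"},
    "dn2": {1: b"bravo\n", 2: b"charlie\n"},
    "dn3": {0: b"alpha\n", 2: b"charlie\n"},
}


@dataclass
class BlockLocation:
    block_id: int
    replicas: List[str]  # hosts that hold a replica of this block


def stage1_get_block_locations(num_blocks: int) -> List[BlockLocation]:
    """Stage 1: list every block of the file together with its replica hosts."""
    return [
        BlockLocation(b, sorted(h for h, held in CLUSTER.items() if b in held))
        for b in range(num_blocks)
    ]


def plan_by_locality(locations: List[BlockLocation]) -> Dict[int, str]:
    """Pick one replica host per block, favoring the least-loaded host.

    Naive stand-in for HAWQ's locality-aware planner.
    """
    load = {h: 0 for h in CLUSTER}
    plan: Dict[int, str] = {}
    for loc in locations:
        host = min(loc.replicas, key=lambda h: (load[h], h))
        load[host] += 1
        plan[loc.block_id] = host
    return plan


def stage2_read_blocks(plan: Dict[int, str]) -> Dict[int, bytes]:
    """Stage 2: one 'agent' per assignment reads its block, all in parallel."""
    def read(item: Tuple[int, str]) -> Tuple[int, bytes]:
        block_id, host = item
        return block_id, CLUSTER[host][block_id]

    with ThreadPoolExecutor(max_workers=len(CLUSTER)) as pool:
        return dict(pool.map(read, plan.items()))


locations = stage1_get_block_locations(3)
plan = plan_by_locality(locations)
blocks = stage2_read_blocks(plan)
data = b"".join(blocks[b] for b in sorted(blocks))
```

Stage 2 in this sketch trusts whatever Stage 1 returned. That assumption is exactly what the corruption case below breaks.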
> In almost all scenarios, Hadoop internally catches block corruption and never returns such blocks to a client requesting block locations (Stage 1). In certain scenarios, however, such as block corruption that does not change the block's size, Stage 1 can still return the location of the corrupted block, so Stage 2 needs to perform an additional checksum check.
> With client-side checksum verification on read (the default behavior), we are resilient to such checksum errors on read as well.
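The corruption case can be illustrated with a minimal sketch. It assumes HDFS-style per-chunk checksums: HDFS keeps one checksum per dfs.bytes-per-checksum bytes, 512 by default. The sketch flips one byte in place, so the block's length (what Stage 1 sees) is unchanged. A reader that skips verification returns the bad data silently, while a verifying reader rejects it. The skip_checksum flag loosely models setting dfs.client.read.shortcircuit.skip.checksum to true. zlib.crc32 (CRC-32) stands in for the CRC32C checksum that HDFS uses by default. All names are hypothetical.

```python
import zlib
from typing import List

BYTES_PER_CHECKSUM = 512  # mirrors the HDFS default checksum chunk size


class ChecksumError(Exception):
    """Raised when a chunk's data no longer matches its stored checksum."""


def compute_checksums(block: bytes) -> List[int]:
    """Checksums written alongside the block when it is stored (one per chunk)."""
    return [zlib.crc32(block[i : i + BYTES_PER_CHECKSUM])
            for i in range(0, len(block), BYTES_PER_CHECKSUM)]


def read_block(block: bytes, checksums: List[int], skip_checksum: bool) -> bytes:
    """Return the block, verifying every chunk unless skip_checksum is set."""
    if not skip_checksum:
        for n, start in enumerate(range(0, len(block), BYTES_PER_CHECKSUM)):
            if zlib.crc32(block[start : start + BYTES_PER_CHECKSUM]) != checksums[n]:
                raise ChecksumError(f"checksum mismatch in chunk {n} (offset {start})")
    return block


original = bytes(range(256)) * 8  # 2048-byte block -> 4 checksum chunks
stored_sums = compute_checksums(original)

# Corrupt one byte in place: the block length (what Stage 1 sees) is unchanged.
damaged = bytearray(original)
damaged[1000] ^= 0xFF
corrupted = bytes(damaged)

# Skipping verification hands back corrupted data without complaint.
silently_bad = read_block(corrupted, stored_sums, skip_checksum=True)

# Verifying on read catches it.
try:
    read_block(corrupted, stored_sums, skip_checksum=False)
    detected = None
except ChecksumError as err:
    detected = str(err)
```

CRC-32 detects every error burst of 32 bits or fewer, so a single flipped byte is always caught. Only the affected 512-byte chunk (offset 512) fails verification.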
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)