Posted to issues@hawq.apache.org by "Shivram Mani (JIRA)" <ji...@apache.org> on 2016/09/30 22:31:20 UTC

[jira] [Resolved] (HAWQ-1075) Restore default behavior of client side(PXF) checksum validation when reading blocks from HDFS

     [ https://issues.apache.org/jira/browse/HAWQ-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivram Mani resolved HAWQ-1075.
--------------------------------
    Resolution: Fixed

> Restore default behavior of client side(PXF) checksum validation when reading blocks from HDFS
> ----------------------------------------------------------------------------------------------
>
>                 Key: HAWQ-1075
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1075
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: PXF
>            Reporter: Shivram Mani
>            Assignee: Shivram Mani
>             Fix For: 2.0.1.0-incubating
>
>
> Currently, the HdfsTextSimple profile, which is the optimized PXF profile for reading Text/CSV, uses ChunkRecordReader to read chunks of records (as opposed to individual records). In this reader, dfs.client.read.shortcircuit.skip.checksum is explicitly set to true to avoid any delay from checksum checks while opening or reading the file/block.
> Background Information:
> PXF uses a two-stage process to access HDFS data.
> Stage 1: PXF fetches all the target blocks for the given file (along with replica information).
> Stage 2 (after HAWQ prepares an optimized access plan based on locality): PXF agents read the blocks in parallel.
> In almost all scenarios, Hadoop internally catches block corruption, and corrupted blocks are never returned to a client requesting block locations (Stage 1). In certain scenarios, such as block corruption without a change in size, Stage 1 can still return the location of the corrupted block, so Stage 2 needs to perform an additional checksum check.
> With the client-side checksum check on read (the default behavior), we are also resilient to such checksum errors on read.
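
The corruption case described above (block contents change, block size does not) can be illustrated with a small, self-contained Java sketch. It is not PXF or HDFS code: the class name ChunkChecksumSketch, the helper names, and the 512-byte chunk size are assumptions made for illustration, and java.util.zip.CRC32 stands in for whatever checksum type the cluster is configured to use. It only shows why a size check alone misses this kind of corruption while a per-chunk checksum comparison on read catches it.

```java
import java.util.zip.CRC32;

public class ChunkChecksumSketch {
    // Assumed chunk size for illustration only; the real value is a cluster setting.
    static final int BYTES_PER_CHECKSUM = 512;

    /** Computes one CRC32 value per chunk of the block. */
    static long[] checksums(byte[] block) {
        int chunks = (block.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int off = i * BYTES_PER_CHECKSUM;
            int len = Math.min(BYTES_PER_CHECKSUM, block.length - off);
            CRC32 crc = new CRC32();
            crc.update(block, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Returns the index of the first chunk whose checksum differs, or -1 if all match. */
    static int firstCorruptChunk(byte[] block, long[] stored) {
        long[] actual = checksums(block);
        if (actual.length != stored.length) {
            throw new IllegalArgumentException("chunk count differs from stored checksums");
        }
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != stored[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical 1300-byte "block" filled with a repeating pattern.
        byte[] block = new byte[1300];
        for (int i = 0; i < block.length; i++) {
            block[i] = (byte) (i % 251);
        }
        long[] stored = checksums(block); // checksums recorded at write time

        // Corrupt one bit without changing the block size.
        byte[] corrupted = block.clone();
        corrupted[700] ^= 0x01;

        System.out.println(corrupted.length == block.length);     // size check passes: true
        System.out.println(firstCorruptChunk(block, stored));     // -1 (intact)
        System.out.println(firstCorruptChunk(corrupted, stored)); // 1 (byte 700 is in the second chunk)
    }
}
```

Skipping the comparison in firstCorruptChunk is the analogue of setting the skip-checksum flag to true: the read is cheaper, but the flipped bit would pass through to the reader unnoticed.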



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)