Posted to issues@hawq.apache.org by "Shivram Mani (JIRA)" <ji...@apache.org> on 2016/09/26 23:37:21 UTC

[jira] [Updated] (HAWQ-1075) Restore default behavior of client side(PXF) checksum validation when reading blocks from HDFS

     [ https://issues.apache.org/jira/browse/HAWQ-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivram Mani updated HAWQ-1075:
-------------------------------
    Summary: Restore default behavior of client side(PXF) checksum validation when reading blocks from HDFS  (was: Make checksum verification configurable in PXF HdfsTextSimple profile)

> Restore default behavior of client side(PXF) checksum validation when reading blocks from HDFS
> ----------------------------------------------------------------------------------------------
>
>                 Key: HAWQ-1075
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1075
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: PXF
>            Reporter: Shivram Mani
>            Assignee: Shivram Mani
>
> Currently, the HdfsTextSimple profile, which is the optimized profile for reading Text/CSV data, uses ChunkRecordReader to read chunks of records (as opposed to individual records). Here, dfs.client.read.shortcircuit.skip.checksum is explicitly set to true to avoid incurring delays from checksum verification while opening/reading the file/block.
> This configuration needs to be exposed as an option, and by default the client-side checksum check must occur in order to be resilient to data corruption issues that aren't caught internally by the datanode block-reporting mechanism (even fsck doesn't catch certain block corruption issues).
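A minimal sketch of how the proposed option might be wired up, assuming a hypothetical profile option named SKIP_CHECKSUM and a plain Hadoop Configuration; the actual PXF option name and plumbing are not specified in this issue:

    import org.apache.hadoop.conf.Configuration;

    public class ChecksumOptionSketch {
        // Hypothetical profile option name (assumption, not from the issue).
        private static final String SKIP_CHECKSUM_OPTION = "SKIP_CHECKSUM";
        // Standard HDFS client property referenced in the issue.
        private static final String HDFS_SKIP_CHECKSUM_KEY =
                "dfs.client.read.shortcircuit.skip.checksum";

        // Copies the user-supplied option value into the Hadoop
        // Configuration. Boolean.parseBoolean(null) returns false, so
        // checksum validation stays enabled unless the user explicitly
        // opts out by setting SKIP_CHECKSUM=true.
        public static void applyChecksumOption(String userOption, Configuration conf) {
            conf.setBoolean(HDFS_SKIP_CHECKSUM_KEY, Boolean.parseBoolean(userOption));
        }
    }

Defaulting to false restores HDFS's normal client-side verification; users who prefer the previous fast path can still opt back into skipping the checksum check explicitly.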



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)