Posted to issues-all@impala.apache.org by "Joe McDonnell (Jira)" <ji...@apache.org> on 2020/08/06 20:17:00 UTC

[jira] [Resolved] (IMPALA-10005) Impala can't read Snappy compressed text files on S3 or ABFS

     [ https://issues.apache.org/jira/browse/IMPALA-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe McDonnell resolved IMPALA-10005.
------------------------------------
    Fix Version/s: Impala 4.0
       Resolution: Fixed

> Impala can't read Snappy compressed text files on S3 or ABFS
> ------------------------------------------------------------
>
>                 Key: IMPALA-10005
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10005
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 4.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Blocker
>             Fix For: Impala 4.0
>
>
> When reading Snappy-compressed text files from S3 or ABFS on a release build, Impala fails to decompress them:
>
> {noformat}
> I0723 21:19:43.712909 229706 status.cc:128] Snappy: RawUncompress failed
>     @           0xae26c9  impala::Status::Status()
>     @          0x107635b  impala::SnappyDecompressor::ProcessBlock()
>     @          0x11b1f2d  impala::HdfsTextScanner::FillByteBufferCompressedFile()
>     @          0x11b23ef  impala::HdfsTextScanner::FillByteBuffer()
>     @          0x11af96f  impala::HdfsTextScanner::FillByteBufferWrapper()
>     @          0x11b096b  impala::HdfsTextScanner::ProcessRange()
>     @          0x11b2b31  impala::HdfsTextScanner::GetNextInternal()
>     @          0x118644b  impala::HdfsScanner::ProcessSplit()
>     @          0x11774c2  impala::HdfsScanNode::ProcessSplit()
>     @          0x1178805  impala::HdfsScanNode::ScannerThread()
>     @          0x1100f31  impala::Thread::SuperviseThread()
>     @          0x1101a79  boost::detail::thread_data<>::run()
>     @          0x16a3449  thread_proxy
>     @     0x7fc522befe24  start_thread
>     @     0x7fc522919bac  __clone{noformat}
> When using a debug build, Impala hits the following DCHECK:
>
> {noformat}
> F0723 23:45:12.849973 249653 hdfs-text-scanner.cc:197] Check failed: stream_->file_desc()->file_compression != THdfsCompression::SNAPPY FE should have generated SNAPPY_BLOCKED instead.{noformat}
> That DCHECK explains the decompression failure: the frontend assigned THdfsCompression::SNAPPY to the file, but the text scanner expects THdfsCompression::SNAPPY_BLOCKED.
> I reproduced this on master in my development environment by changing FileSystemUtil::supportsStorageIds() to always return false, which emulates the behavior of object stores such as S3 and ABFS:
>
> {noformat}
>   /**
>    * Returns true if the filesystem supports storage UUIDs in BlockLocation calls.
>    */
>   public static boolean supportsStorageIds(FileSystem fs) {
>     return false;
>   }{noformat}
> This is specific to Snappy and does not appear to apply to other compression codecs.
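
A minimal sketch of the invariant behind that DCHECK, under stated assumptions: a frontend infers a codec from the file suffix, and for text scans a `.snappy` suffix must resolve to SNAPPY_BLOCKED (assumed here to denote the block-framed layout written by Hadoop's Snappy codec) on every code path, whether or not the filesystem supports storage IDs. The `Compression` enum, `FileDesc`, `fromFileName`, `forTextScan`, and `createFileDesc` are hypothetical stand-ins, not Impala's actual classes, and this is not the committed fix; it only shows one way to keep the two descriptor-building paths from diverging, by resolving compression once outside the storage-ID branch.

```java
import java.util.Locale;

// Hypothetical sketch, not Impala's code: all names below are illustrative stand-ins.
public class CompressionResolutionSketch {
  // Stand-in for the THdfsCompression values relevant to this bug.
  enum Compression { NONE, GZIP, SNAPPY, SNAPPY_BLOCKED }

  // Minimal file descriptor: only the fields that matter here.
  static final class FileDesc {
    final String name;
    final boolean hasStorageIds;
    final Compression compression;

    FileDesc(String name, boolean hasStorageIds, Compression compression) {
      this.name = name;
      this.hasStorageIds = hasStorageIds;
      this.compression = compression;
    }
  }

  // Infer a codec from the file name suffix.
  static Compression fromFileName(String fileName) {
    String lower = fileName.toLowerCase(Locale.ROOT);
    if (lower.endsWith(".gz")) return Compression.GZIP;
    if (lower.endsWith(".snappy")) return Compression.SNAPPY;
    return Compression.NONE;
  }

  // Text scans expect the block-framed variant, so SNAPPY becomes SNAPPY_BLOCKED.
  static Compression forTextScan(Compression c) {
    return c == Compression.SNAPPY ? Compression.SNAPPY_BLOCKED : c;
  }

  // Compression is resolved once, outside any storage-ID-specific logic, so the
  // HDFS-style path (storage IDs) and the object-store path (no storage IDs) agree.
  static FileDesc createFileDesc(String fileName, boolean supportsStorageIds) {
    Compression c = forTextScan(fromFileName(fileName));
    // Block/replica metadata would differ between the two paths; it is elided here.
    return new FileDesc(fileName, supportsStorageIds, c);
  }

  // Mirrors the backend DCHECK: a text scan must never see plain SNAPPY.
  static void checkTextScan(FileDesc fd) {
    if (fd.compression == Compression.SNAPPY) {
      throw new IllegalStateException("FE should have generated SNAPPY_BLOCKED instead.");
    }
  }

  public static void main(String[] args) {
    for (boolean ids : new boolean[] {true, false}) {
      FileDesc fd = createFileDesc("part-00000.snappy", ids);
      checkTextScan(fd);
      System.out.println("supportsStorageIds=" + ids + " -> " + fd.compression);
    }
  }
}
```

Running `main` prints `supportsStorageIds=true -> SNAPPY_BLOCKED` and `supportsStorageIds=false -> SNAPPY_BLOCKED`. If either path returned the result of `fromFileName` directly, `checkTextScan` would throw with the same message as the DCHECK.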



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
