Posted to issues-all@impala.apache.org by "Joe McDonnell (Jira)" <ji...@apache.org> on 2020/08/06 20:17:00 UTC
[jira] [Resolved] (IMPALA-10005) Impala can't read Snappy compressed text files on S3 or ABFS
[ https://issues.apache.org/jira/browse/IMPALA-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joe McDonnell resolved IMPALA-10005.
------------------------------------
Fix Version/s: Impala 4.0
Resolution: Fixed
> Impala can't read Snappy compressed text files on S3 or ABFS
> ------------------------------------------------------------
>
> Key: IMPALA-10005
> URL: https://issues.apache.org/jira/browse/IMPALA-10005
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 4.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Blocker
> Fix For: Impala 4.0
>
>
> Reading Snappy-compressed text files from S3 or ABFS on a release build fails during decompression:
>
> {noformat}
> I0723 21:19:43.712909 229706 status.cc:128] Snappy: RawUncompress failed
> @ 0xae26c9 impala::Status::Status()
> @ 0x107635b impala::SnappyDecompressor::ProcessBlock()
> @ 0x11b1f2d impala::HdfsTextScanner::FillByteBufferCompressedFile()
> @ 0x11b23ef impala::HdfsTextScanner::FillByteBuffer()
> @ 0x11af96f impala::HdfsTextScanner::FillByteBufferWrapper()
> @ 0x11b096b impala::HdfsTextScanner::ProcessRange()
> @ 0x11b2b31 impala::HdfsTextScanner::GetNextInternal()
> @ 0x118644b impala::HdfsScanner::ProcessSplit()
> @ 0x11774c2 impala::HdfsScanNode::ProcessSplit()
> @ 0x1178805 impala::HdfsScanNode::ScannerThread()
> @ 0x1100f31 impala::Thread::SuperviseThread()
> @ 0x1101a79 boost::detail::thread_data<>::run()
> @ 0x16a3449 thread_proxy
> @ 0x7fc522befe24 start_thread
> @ 0x7fc522919bac __clone{noformat}
> When using a debug build, Impala hits the following DCHECK:
>
>
> {noformat}
> F0723 23:45:12.849973 249653 hdfs-text-scanner.cc:197] Check failed: stream_->file_desc()->file_compression != THdfsCompression::SNAPPY FE should have generated SNAPPY_BLOCKED instead.{noformat}
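> That DCHECK explains the decompression failure: the file descriptor carries THdfsCompression::SNAPPY where the text scanner expects SNAPPY_BLOCKED, so the scanner is using the wrong THdfsCompression.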
> I reproduced this on master in my development environment by changing FileSystemUtil::supportsStorageIds() to always return false. This emulates the behavior on object stores like S3 and ABFS.
>
> {noformat}
> /**
>  * Returns true if the filesystem supports storage UUIDs in BlockLocation calls.
>  */
> public static boolean supportsStorageIds(FileSystem fs) {
>   // Hardcoded for this reproduction.
>   return false;
> }{noformat}
> This is specific to Snappy and does not appear to apply to other compression codecs.
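
One way to picture why only Snappy is affected, going by the DCHECK message: the backend distinguishes SNAPPY from SNAPPY_BLOCKED for text files, while the other codecs have a single value. The sketch below is illustrative only. The class, enum, method names, and the file suffix table are all hypothetical and are not Impala's actual frontend code; it simply shows a remapping that is applied whenever a text file's codec is derived, plus a check that mirrors the DCHECK.

```java
// Hypothetical sketch: every name here is illustrative, not Impala's real code.
final class TextCompressionSketch {
  /** Stand-in for the codec values a file descriptor could carry. */
  enum Codec { NONE, GZIP, BZIP2, DEFLATE, SNAPPY, SNAPPY_BLOCKED }

  /** Guesses a codec from the file name suffix (assumed suffix table). */
  static Codec fromFileName(String fileName) {
    String lower = fileName.toLowerCase(java.util.Locale.ROOT);
    if (lower.endsWith(".snappy")) return Codec.SNAPPY;
    if (lower.endsWith(".gz")) return Codec.GZIP;
    if (lower.endsWith(".bz2")) return Codec.BZIP2;
    if (lower.endsWith(".deflate")) return Codec.DEFLATE;
    return Codec.NONE;
  }

  /** Codec to record for a text file; only Snappy needs remapping. */
  static Codec forTextFile(String fileName) {
    Codec codec = fromFileName(fileName);
    return codec == Codec.SNAPPY ? Codec.SNAPPY_BLOCKED : codec;
  }

  /** Mirrors the backend DCHECK: a text scan must never see plain SNAPPY. */
  static void checkTextScanCodec(Codec codec) {
    if (codec == Codec.SNAPPY) {
      throw new IllegalStateException(
          "FE should have generated SNAPPY_BLOCKED instead.");
    }
  }
}
```

Under these assumptions, calling forTextFile() from every code path that builds a descriptor, regardless of what supportsStorageIds() returns, would keep checkTextScanCodec() from firing.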