Posted to issues-all@impala.apache.org by "Pooja Nilangekar (JIRA)" <ji...@apache.org> on 2018/08/20 19:13:00 UTC

[jira] [Commented] (IMPALA-6932) Simple LIMIT 1 query can be really slow on many-filed sequence datasets

    [ https://issues.apache.org/jira/browse/IMPALA-6932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16586383#comment-16586383 ] 

Pooja Nilangekar commented on IMPALA-6932:
------------------------------------------

This issue is not specific to the Avro scanner. It affects every scanner that inherits from BaseSequenceScanner (Avro, RCFile, and SequenceFile).
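
To illustrate why the behavior would be shared, here is a minimal sketch, assuming the call pattern visible in the stack trace below (BaseSequenceScanner::GetNextInternal() calling the format-specific ReadFileHeader()). Every class and function name in it (SketchSequenceScanner, SketchAvroScanner, SketchRcFileScanner, SketchSequenceFileScanner, CountHeaderReads) is hypothetical and heavily simplified; this is not Impala's actual code. It also assumes one scanner instance per file.

```cpp
#include <cstddef>

// Hypothetical, simplified stand-in for a shared base scanner.
// Not Impala's real class; it only models "read the header first".
class SketchSequenceScanner {
 public:
  virtual ~SketchSequenceScanner() = default;

  // Shared entry point. The header is read before any row is produced,
  // so every derived format pays this cost on the same code path.
  void GetNext() {
    if (!header_read_) {
      ReadFileHeader();
      header_read_ = true;
      ++header_reads_;
    }
    // Row materialization would follow here; omitted in this sketch.
  }

  int header_reads() const { return header_reads_; }

 protected:
  // Format-specific header parsing, supplied by each derived scanner.
  virtual void ReadFileHeader() = 0;

 private:
  bool header_read_ = false;
  int header_reads_ = 0;
};

// The three derived formats differ only in how they would parse the header.
class SketchAvroScanner : public SketchSequenceScanner {
 protected:
  void ReadFileHeader() override { /* would parse an Avro header */ }
};

class SketchRcFileScanner : public SketchSequenceScanner {
 protected:
  void ReadFileHeader() override { /* would parse an RCFile header */ }
};

class SketchSequenceFileScanner : public SketchSequenceScanner {
 protected:
  void ReadFileHeader() override { /* would parse a SequenceFile header */ }
};

// Counts header reads when one scanner is created per file (an assumption).
template <typename Scanner>
int CountHeaderReads(int num_files) {
  int reads = 0;
  for (int i = 0; i < num_files; ++i) {
    Scanner scanner;
    scanner.GetNext();
    reads += scanner.header_reads();
  }
  return reads;
}
```

Under these assumptions, CountHeaderReads<SketchAvroScanner>(n), CountHeaderReads<SketchRcFileScanner>(n), and CountHeaderReads<SketchSequenceFileScanner>(n) all scale with the number of files in the same way, which is why a fix would likely belong in the shared base-class path rather than in any single format's scanner.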

> Simple LIMIT 1 query can be really slow on many-filed sequence datasets
> -----------------------------------------------------------------------
>
>                 Key: IMPALA-6932
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6932
>             Project: IMPALA
>          Issue Type: Task
>          Components: Backend
>            Reporter: Philip Zeyliger
>            Assignee: Pooja Nilangekar
>            Priority: Critical
>
> I recently ran across really slow behavior with the trivial {{SELECT * FROM table LIMIT 1}} query. The table used Avro as a file format and had about 45,000 files across about 250 partitions. An optimization kicked in to set NUM_NODES to 1.
> The query ran for about an hour, and the profile indicated that it was opening files:
>           - TotalRawHdfsOpenFileTime(*): 1.0h (3622833666032)
> I took a single minidump while this query was running, and I suspect the query was here:
> {code:java}
> 1 impalad!impala::ScannerContext::Stream::GetNextBuffer(long) [scanner-context.cc : 115 + 0x13]
> 2 impalad!impala::ScannerContext::Stream::GetBytesInternal(long, unsigned char**, bool, long*) [scanner-context.cc : 241 + 0x5]
> 3 impalad!impala::HdfsAvroScanner::ReadFileHeader() [scanner-context.inline.h : 54 + 0x1f]
> 4 impalad!impala::BaseSequenceScanner::GetNextInternal(impala::RowBatch*) [base-sequence-scanner.cc : 157 + 0x13]
> 5 impalad!impala::HdfsScanner::ProcessSplit() [hdfs-scanner.cc : 129 + 0xc]
> 6 impalad!impala::HdfsScanNode::ProcessSplit(std::vector<impala::FilterContext, std::allocator<impala::FilterContext> > const&, impala::MemPool*, impala::io::ScanRange*) [hdfs-scan-node.cc : 527 + 0x17]
> 7 impalad!impala::HdfsScanNode::ScannerThread() [hdfs-scan-node.cc : 437 + 0x1c]
> 8 impalad!impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long>*) [function_template.hpp : 767 + 0x7]{code}
>  


