Posted to issues@impala.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2017/08/23 22:39:00 UTC

[jira] [Closed] (IMPALA-3837) Filtering at scan node using bloom filters is 2x slower than filtering at join

     [ https://issues.apache.org/jira/browse/IMPALA-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar closed IMPALA-3837.
-----------------------------------
    Resolution: Invalid

> Filtering at scan node using bloom filters is 2x slower than filtering at join
> ------------------------------------------------------------------------------
>
>                 Key: IMPALA-3837
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3837
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.7.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Mostafa Mokhtar
>            Priority: Minor
>              Labels: performance
>
> Results are from my dev box; lineitem_large has 288 million rows, just to make the test run longer.
> On a cluster that doesn't support AVX2, the slowdown was 50%.
> With RUNTIME_FILTER_MODE=0, the query below finished in 6.87s.
> With RUNTIME_FILTER_MODE=1, the query below finished in 12.19s.
> The majority of the 75% increase in cycles and instructions comes from the call stack below:
> {code}
> Clockticks
> 1 of 12: 78.4% (10516569839 of 13417219743)
> impalad ! _mm256_testc_si256 - avxintrin.h
> impalad ! impala::BloomFilter::BucketFindAVX2 + 0xf - bloom-filter.h:244
> impalad ! impala::BloomFilter::Find + 0x224 - bloom-filter.h:185
> impalad ! Eval<void> - runtime-filter.inline.h:53
> impalad ! impala::HdfsParquetScanner::EvalRuntimeFilters + 0x48 - stl_vector.h:780
> impalad ! impala::HdfsParquetScanner::TransferScratchTuples + 0x1c8 - hdfs-parquet-scanner.cc:2015
> impalad ! impala::HdfsParquetScanner::AssembleRows + 0x244 - hdfs-parquet-scanner.cc:2133
> impalad ! impala::HdfsParquetScanner::ProcessSplit + 0x518 - hdfs-parquet-scanner.cc:1939
> impalad ! impala::HdfsScanNode::ProcessSplit + 0x395 - hdfs-scan-node.cc:1227
> impalad ! impala::HdfsScanNode::ScannerThread + 0xcad - hdfs-scan-node.cc:1086
> impalad ! boost::function0<void>::operator() + 0x1a - function_template.hpp:767
> impalad ! impala::Thread::SuperviseThread + 0x20c - thread.cc:315
> impalad ! operator()<void (*)(const std::basic_string<char>&, const std::basic_string<char>&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list0> + 0x5a - bind.hpp:457
> impalad ! boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void (void)>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void (void)>>, boost::_bi::value<impala::Promise<long>*>>>::operator() - bind_template.hpp:20
> impalad ! boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void (void)>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void (void)>>, boost::_bi::value<impala::Promise<long>*>>>>::run + 0x19 - thread.hpp:116
> impalad ! thread_proxy + 0xd9 - [unknown source file]
> libpthread-2.19.so ! start_thread + 0xc1 - pthread_create.c:312
> libc-2.19.so ! __clone + 0x6c - clone.S:111
> {code}
> The remaining 25% of CPU should be addressed by IMPALA-3360.
> Query:
> {code}
> set num_scanner_threads=1;
> select 
>     count(*)
> from
>     lineitem_large,
>     supplier
> where
>     s_suppkey = l_suppkey
>         and s_name = 'Supplier#000000001';
> {code}
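
For context on the hottest frame in the call stack: `_mm256_testc_si256(a, b)` returns 1 when every bit set in `b` is also set in `a`, so a bucket probe built on it amounts to a 256-bit subset test. The following is a minimal scalar sketch of that idea for a blocked Bloom filter, not Impala's actual code. The bucket layout (eight 32-bit words), the multipliers in `kSalts`, and the helper names `MakeMask`, `BucketInsert`, and `BucketFind` are assumptions made for illustration only.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// One bucket of a blocked Bloom filter: eight 32-bit words (256 bits),
// i.e. the width of a single AVX2 register. Layout is an assumption.
using Bucket = std::array<std::uint32_t, 8>;

// Illustrative odd multipliers (hypothetical; not taken from the report).
constexpr std::array<std::uint32_t, 8> kSalts = {
    0x47b6137bU, 0x44974d91U, 0x8824ad5bU, 0xa2b7289dU,
    0x705495c7U, 0x2df1424bU, 0x9efc4947U, 0x5c6bfb31U};

// Build a mask with exactly one bit set in each of the eight words.
Bucket MakeMask(std::uint32_t hash) {
  Bucket mask{};
  for (std::size_t i = 0; i < mask.size(); ++i) {
    // The top 5 bits of the salted hash select a bit position in [0, 32).
    const std::uint32_t bit = (hash * kSalts[i]) >> 27;
    mask[i] = std::uint32_t{1} << bit;
  }
  return mask;
}

// Insert a key's hash by OR-ing its mask into the bucket.
void BucketInsert(Bucket& bucket, std::uint32_t hash) {
  const Bucket mask = MakeMask(hash);
  for (std::size_t i = 0; i < bucket.size(); ++i) {
    bucket[i] |= mask[i];
  }
}

// Scalar equivalent of _mm256_testc_si256(bucket, mask):
// true iff (~bucket & mask) is zero across all 256 bits.
bool BucketFind(const Bucket& bucket, std::uint32_t hash) {
  const Bucket mask = MakeMask(hash);
  std::uint32_t missing = 0;
  for (std::size_t i = 0; i < bucket.size(); ++i) {
    missing |= ~bucket[i] & mask[i];
  }
  return missing == 0;
}
```

In this sketch a probe costs eight multiplies, shifts, and AND-NOT operations per row in scalar form, or a handful of vector instructions with AVX2. When that work runs once per scanned row, as in the `EvalRuntimeFilters` frame above, it is paid for every row the scan produces rather than only for rows that reach the join.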



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)