Posted to dev@hive.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2016/12/13 07:12:58 UTC

[jira] [Created] (HIVE-15422) HiveInputFormat::pushProjectionsAndFilters path comparisons create huge number of objects for partitioned dataset

Rajesh Balamohan created HIVE-15422:
---------------------------------------

             Summary: HiveInputFormat::pushProjectionsAndFilters path comparisons create huge number of objects for partitioned dataset
                 Key: HIVE-15422
                 URL: https://issues.apache.org/jira/browse/HIVE-15422
             Project: Hive
          Issue Type: Improvement
            Reporter: Rajesh Balamohan
            Priority: Minor


When executing the following query in LLAP (single instance) on a 5-node cluster, significant GC pressure was observed.

{noformat}
select a.type, a.city, a.frequency, b.city, b.country, b.lat, b.lon
from (select 'depart' as type, origin as city, count(origin) as frequency
      from flights
      group by origin
      order by frequency desc, type) as a
left join airports as b on a.city = b.iata
order by frequency desc;
{noformat}

The flights table has 7000+ partitions in S3. Profiling revealed that a large number of objects are created just for path comparisons in HiveInputFormat. HIVE-15405 reduces the number of path comparisons in FileUtils, but HiveInputFormat::pushProjectionsAndFilters still performs a large number of comparisons.
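A minimal, hypothetical Java sketch of the kind of object churn described above, assuming partition locations are compared as plain strings. This is not Hive's HiveInputFormat or FileUtils code; `PartitionPathLookup`, its methods, and the s3a:// paths are illustrative names. The naive lookup builds a new prefix String for every (file, partition) pair, so with 7000+ partitions each file produces thousands of short-lived objects. The indexed lookup normalizes the partition directories once and then probes a HashMap while walking up the file's parent directories.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not Hive's actual HiveInputFormat code.
// Maps a data file path to the partition directory that contains it.
final class PartitionPathLookup {

    // Removes trailing '/' so "s3a://b/t/p=1/" and "s3a://b/t/p=1" compare equal.
    static String stripTrailingSlashes(String path) {
        int end = path.length();
        while (end > 1 && path.charAt(end - 1) == '/') {
            end--;
        }
        return path.substring(0, end);
    }

    // Naive version: allocates a new prefix String for every partition it
    // checks, so S files x P partitions means roughly S * P temporary objects.
    static String findPartitionNaive(String filePath, List<String> partitionDirs) {
        for (String dir : partitionDirs) {
            String prefix = stripTrailingSlashes(dir) + "/"; // new String per comparison
            if (filePath.startsWith(prefix)) {
                return dir; // first matching directory in list order
            }
        }
        return null;
    }

    // Builds the lookup table once, instead of normalizing on every comparison.
    static Map<String, String> indexPartitions(List<String> partitionDirs) {
        Map<String, String> index = new HashMap<>();
        for (String dir : partitionDirs) {
            index.put(stripTrailingSlashes(dir), dir);
        }
        return index;
    }

    // Indexed version: strips one path component at a time and probes the map,
    // so the work per file is bounded by path depth, not by partition count.
    // Returns the deepest enclosing partition directory, or null if none.
    static String findPartitionIndexed(String filePath, Map<String, String> index) {
        String current = filePath;
        int slash;
        while ((slash = current.lastIndexOf('/')) > 0) {
            current = current.substring(0, slash);
            String dir = index.get(current);
            if (dir != null) {
                return dir;
            }
        }
        return null;
    }
}
```

With P partitions the naive version allocates O(P) strings per file, while the indexed version allocates only a few substrings bounded by the directory depth. The two can differ when partition directories are nested (the naive version returns the first match in list order, the indexed one returns the deepest); for flat partition layouts they agree.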



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)