Posted to issues@hive.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2016/12/13 23:16:58 UTC
[jira] [Assigned] (HIVE-15422) HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge number of objects for partitioned dataset
[ https://issues.apache.org/jira/browse/HIVE-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajesh Balamohan reassigned HIVE-15422:
---------------------------------------
Assignee: Rajesh Balamohan
> HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge number of objects for partitioned dataset
> --------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-15422
> URL: https://issues.apache.org/jira/browse/HIVE-15422
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Priority: Minor
> Attachments: HIVE-15422.1.patch, HIVE-15422.2.patch, Profiler_Snapshot_HIVE-15422.png
>
>
> When executing the following query in LLAP (single instance) on a 5-node cluster, heavy GC pressure was observed.
> {noformat}
> select a.type, a.city , a.frequency, b.city, b.country, b.lat, b.lon
> from (select 'depart' as type, origin as city, count(origin) as frequency
> from flights
> group by origin
> order by frequency desc, type) as a
> left join airports as b on a.city = b.iata
> order by frequency desc;
> {noformat}
> The flights table has 7000+ partitions in S3. Profiling revealed that a large number of objects were created solely by path comparisons in HiveInputFormat. HIVE-15405 reduces the number of path comparisons in FileUtils, but HiveInputFormat::pushProjectionsAndFilters still ends up doing a lot of comparisons.
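
To illustrate the kind of allocation pattern described above, here is a minimal Java sketch. It is not the attached patch and does not use Hive or Hadoop classes: the class name PathLookupSketch, the method names, and the use of java.net.URI in place of Hadoop's Path are all assumptions made for illustration. It contrasts scanning every partition path per split (allocating a temporary object for each comparison) with normalizing the partition paths once and looking up the split path and its parent directories in a hash map.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and structure are not taken from Hive.
public class PathLookupSketch {

    // Scan approach: for one split, visit every partition entry and build a
    // fresh URI for each comparison. With P partitions and S splits this
    // allocates on the order of P * S short-lived objects.
    static List<String> findAliasesByScan(Map<String, List<String>> pathToAliases,
                                          String splitPath) {
        String splitKey = URI.create(splitPath).getPath();
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : pathToAliases.entrySet()) {
            // A new URI is parsed on every comparison.
            String partitionKey = URI.create(e.getKey()).getPath();
            if (splitKey.equals(partitionKey)
                    || splitKey.startsWith(partitionKey + "/")) {
                result.addAll(e.getValue());
            }
        }
        return result;
    }

    // Index approach, step 1: normalize each partition path once, up front.
    // Simplification: only the path component is kept, so scheme and
    // bucket/authority are ignored.
    static Map<String, List<String>> buildIndex(Map<String, List<String>> pathToAliases) {
        Map<String, List<String>> index = new HashMap<>();
        for (Map.Entry<String, List<String>> e : pathToAliases.entrySet()) {
            String key = URI.create(e.getKey()).getPath();
            index.computeIfAbsent(key, k -> new ArrayList<>()).addAll(e.getValue());
        }
        return index;
    }

    // Index approach, step 2: look up the split path, then each parent
    // directory. The work per split depends on directory depth rather than
    // on the number of partitions. Returns the aliases of the nearest
    // matching ancestor, or an empty list if none matches.
    static List<String> findAliasesByIndex(Map<String, List<String>> index,
                                           String splitPath) {
        String key = URI.create(splitPath).getPath();
        while (key != null && !key.isEmpty()) {
            List<String> aliases = index.get(key);
            if (aliases != null) {
                return aliases;
            }
            int slash = key.lastIndexOf('/');
            key = slash > 0 ? key.substring(0, slash) : null;
        }
        return Collections.emptyList();
    }
}
```

In this sketch the index would be built once per query and reused for every split. The parent walk still creates one substring per directory level, but that cost is bounded by path depth. Note that the two lookups differ when partition paths are nested: the scan collects aliases from every matching ancestor, while the index lookup stops at the nearest one.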
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)