Posted to issues@hive.apache.org by "Hive QA (JIRA)" <ji...@apache.org> on 2015/02/24 09:16:12 UTC
[jira] [Commented] (HIVE-9495) Map Side aggregation affecting map performance
[ https://issues.apache.org/jira/browse/HIVE-9495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334576#comment-14334576 ]
Hive QA commented on HIVE-9495:
-------------------------------
{color:red}Overall{color}: -1 at least one test failed
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12700340/HIVE-9495.2.patch.txt
{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 7566 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_insert_lateral_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_json_tuple
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_parse_url_tuple
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_lateral_view
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2854/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2854/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2854/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12700340 - PreCommit-HIVE-TRUNK-Build
> Map Side aggregation affecting map performance
> ----------------------------------------------
>
> Key: HIVE-9495
> URL: https://issues.apache.org/jira/browse/HIVE-9495
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.14.0
> Environment: RHEL 6.4
> Hortonworks Hadoop 2.2
> Reporter: Anand Sridharan
> Attachments: HIVE-9495.1.patch.txt, HIVE-9495.2.patch.txt, profiler_screenshot.PNG
>
>
> When running a simple aggregation query in Hive 0.14, map tasks take much longer with hive.map.aggr=true than with hive.map.aggr=false.
> For example, consider the query:
> {code}
> INSERT OVERWRITE TABLE lineitem_tgt_agg
> select alias.a0 as a0,
>        alias.a2 as a1,
>        alias.a1 as a2,
>        alias.a3 as a3,
>        alias.a4 as a4
> from (
>   select alias.a0 as a0,
>          SUM(alias.a1) as a1,
>          SUM(alias.a2) as a2,
>          SUM(alias.a3) as a3,
>          SUM(alias.a4) as a4
>   from (
>     select lineitem_sf500.l_orderkey as a0,
>            CAST(lineitem_sf500.l_quantity * lineitem_sf500.l_extendedprice * (1 - lineitem_sf500.l_discount) * (1 + lineitem_sf500.l_tax) as double) as a1,
>            lineitem_sf500.l_quantity as a2,
>            CAST(lineitem_sf500.l_quantity * lineitem_sf500.l_extendedprice * lineitem_sf500.l_discount as double) as a3,
>            CAST(lineitem_sf500.l_quantity * lineitem_sf500.l_extendedprice * lineitem_sf500.l_tax as double) as a4
>     from lineitem_sf500
>   ) alias
>   group by alias.a0
> ) alias;
> {code}
> The above query was run with ~376 GB of data / ~3 billion records in the source.
> It takes ~10 minutes with hive.map.aggr=false.
> With map side aggregation set to true, the map tasks don't complete even after an hour.
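The two runs compared in the description differ only in one session setting. A minimal sketch of how that comparison might be set up, assuming a Hive CLI or Beeline session: hive.map.aggr is the setting named in the report, while the two hash-aggregation tuning properties and their values are illustrative assumptions that do not come from this issue and are not a confirmed fix.
{code}
-- Baseline from the report: map-side aggregation off (~10 minutes).
SET hive.map.aggr=false;
-- ... run the INSERT OVERWRITE query from the description here ...

-- Slow case from the report: map-side aggregation on.
SET hive.map.aggr=true;
-- Assumed tuning knobs, not mentioned in the report. Map-side hash
-- aggregation is abandoned when the hash table does not shrink the
-- input by at least this ratio; this matters for a high-cardinality
-- key such as l_orderkey.
SET hive.map.aggr.hash.min.reduction=0.5;
-- Assumed: number of rows processed before that ratio is checked.
SET hive.groupby.mapaggr.checkinterval=100000;
-- ... run the same query again and compare map task times ...
{code}
Printing a property without a value (for example, SET hive.map.aggr;) shows what the session is currently using, which helps confirm that both runs differ only in the intended setting.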
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)