You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Chuck Hardin <ch...@namimedia.com> on 2013/09/12 19:00:41 UTC
UNION ALL query behaving strangely; WHERE CLAUSE is also not being
honored
Please bear with me, because this is a pretty large query.
TL;DR: I'm doing a UNION ALL on a bunch of subqueries. The subqueries return no results, but the overall query does. Sadly, the constraints on the subqueries are not being honored in the UNION ALL.
I'm executing the following:
SELECT lax_year,lax_month,lax_day,network_id,advertiser_id,publisher_id,
SUM(entries.clicks), SUM(entries.conversions),
SUM(entries.cost), SUM(entries.feed_calls),
SUM(entries.impressions), SUM(entries.revenue),
SUM(entries.used_listings)
FROM
(
SELECT
used_listing.advertiserid AS advertiser_id,
-1 AS keyword,
used_listing.adgroupid AS adgroup_id,
used_listing.adid AS ad_id,
used_listing.feedid AS feed_id,
0 AS clicks,
used_listing.campaignid AS campaign_id,
concat(used_listing.publisherid, "_", used_listing.sourceid, "_", used_listing.subid) AS subid,
0 AS conversions,
0 AS feed_calls,
0 AS impressions,
2013 AS lax_year,
used_listing.publisherid AS publisher_id,
0 AS cost,
used_listing.subid AS sub_id,
used_listing.sourceid AS source_id,
used_listing.timezone AS timezone,
09 AS lax_month,
0 AS revenue,
used_listing.networkid AS network_id,
05 AS lax_day,
1 AS used_listings,
used_listing.ul_datetime AS start_timestamp,
-1 AS bid_id FROM used_listing
WHERE used_listing.ul_datetime >= 1378364400
AND used_listing.ul_datetime < 1378450800
AND used_listing.networkid='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
UNION ALL
SELECT
click.click.advertiser_id AS advertiser_id,
click.click.keyword AS keyword,
click.click.adgroup_id AS adgroup_id,
click.click.ad_id AS ad_id,
click.feed_id AS feed_id,
1 AS clicks,
click.click.campaign_id AS campaign_id,
click.click.subid AS subid,
0 AS conversions,
0 AS feed_calls,
0 AS impressions,
2013 AS lax_year,
click.click.publisher_id AS publisher_id,
click.click.publisher_cost AS cost,
click.click.sub_id AS sub_id,
click.click.source_id AS source_id,
click.click.timezone AS timezone,
09 AS lax_month,
click.click.click_cost AS revenue,
click.click.network_id AS network_id,
05 AS lax_day,
0 AS used_listings,
click.start_timestamp AS start_timestamp,
click.click.bid_id AS bid_id FROM click
WHERE click.start_timestamp >= 1378364400
AND click.start_timestamp < 1378450800
AND click.click.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
UNION ALL
SELECT
result.ad.advertiser_id AS advertiser_id,
result.bid.text AS keyword,
result.ad.adgroup_id AS adgroup_id,
result.ad.ad_id AS ad_id,
result.ad.feed_id AS feed_id,
0 AS clicks,
result.ad.campaign_id AS campaign_id,
concat(result.ad.publisher_id, "_", result.ad.source_id, "_", result.ad.sub_id) AS subid,
0 AS conversions,
0 AS feed_calls,
1 AS impressions,
2013 AS lax_year,
result.ad.publisher_id AS publisher_id,
0 AS cost,
result.ad.sub_id AS sub_id,
result.ad.source_id AS source_id,
result.ad.timezone AS timezone,
09 AS lax_month,
0 AS revenue,
result.ad.network_id AS network_id,
05 AS lax_day,
0 AS used_listings,
result.ad.start_timestamp AS start_timestamp,
result.bid.bid_id AS bid_id FROM impression
WHERE result.ad.start_timestamp >= 1378364400
AND result.ad.start_timestamp < 1378450800
AND result.ad.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
UNION ALL
SELECT
conversion.conv_data.advertiser_id AS advertiser_id,
conversion.conv_data.keyword AS keyword,
conversion.conv_data.adgroup_id AS adgroup_id,
conversion.conv_data.ad_id AS ad_id,
conversion.conv_data.feed_id AS feed_id,
0 AS clicks,
conversion.conv_data.campaign_id AS campaign_id,
conversion.conv_data.subid AS subid,
1 AS conversions,
0 AS feed_calls,
0 AS impressions,
2013 AS lax_year,
conversion.conv_data.publisher_id AS publisher_id,
0 AS cost,
conversion.conv_data.sub_id AS sub_id,
conversion.conv_data.source_id AS source_id,
conversion.conv_data.timezone AS timezone,
09 AS lax_month,
0 AS revenue,
conversion.conv_data.network_id AS network_id,
05 AS lax_day,
0 AS used_listings,
conversion.start_timestamp AS start_timestamp,
-1 AS bid_id FROM conversion
WHERE conversion.start_timestamp >= 1378364400
AND conversion.start_timestamp < 1378450800
AND conversion.conv_data.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
)
entries
GROUP BY lax_year,lax_month,lax_day,network_id,advertiser_id,publisher_id
Got all that? Good!
The first strange thing is that the overall query returns results, yet I get no results for any of the subqueries on which I'm performing a UNION ALL.
The second strange thing is that the overall query does not honor the constraint on network_id at the very least, and probably isn't honoring the constraints on start_timestamp either (I haven't checked).
I tried putting the constraint on the output of the UNION ALL query:
WHERE entries.start_timestamp >= 1378364400
AND entries.start_timestamp < 1378450800
AND entries.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
That didn't work either. Same behavior.
Can anyone help me make this work?
Re: UNION ALL query behaving strangely; WHERE CLAUSE is also not
being honored
Posted by Yin Huai <hu...@gmail.com>.
Hi,
Can you also attach the query plan (the result of EXPLAIN)? It may help to
find where is the problem.
Thanks,
Yin
On Thu, Sep 12, 2013 at 1:00 PM, Chuck Hardin <ch...@namimedia.com> wrote:
> Please bear with me, because this is a pretty large query.
>
> TL;DR: I'm doing a UNION ALL on a bunch of subqueries. The subqueries
> return no results, but the overall query does. Sadly, the constraints on
> the subqueries are not being honored in the UNION ALL.
>
> I'm executing the following:
>
> SELECT lax_year,lax_month,lax_day,network_id,advertiser_id,publisher_id,
> SUM(entries.clicks), SUM(entries.conversions),
> SUM(entries.cost), SUM(entries.feed_calls),
> SUM(entries.impressions), SUM(entries.revenue),
> SUM(entries.used_listings)
> FROM
> (
> SELECT
> used_listing.advertiserid AS advertiser_id,
> -1 AS keyword,
> used_listing.adgroupid AS adgroup_id,
> used_listing.adid AS ad_id,
> used_listing.feedid AS feed_id,
> 0 AS clicks,
> used_listing.campaignid AS campaign_id,
> concat(used_listing.publisherid, "_", used_listing.sourceid, "_",
> used_listing.subid) AS subid,
> 0 AS conversions,
> 0 AS feed_calls,
> 0 AS impressions,
> 2013 AS lax_year,
> used_listing.publisherid AS publisher_id,
> 0 AS cost,
> used_listing.subid AS sub_id,
> used_listing.sourceid AS source_id,
> used_listing.timezone AS timezone,
> 09 AS lax_month,
> 0 AS revenue,
> used_listing.networkid AS network_id,
> 05 AS lax_day,
> 1 AS used_listings,
> used_listing.ul_datetime AS start_timestamp,
> -1 AS bid_id FROM used_listing
> WHERE used_listing.ul_datetime >= 1378364400
> AND used_listing.ul_datetime < 1378450800
> AND used_listing.networkid='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
>
> UNION ALL
> SELECT
> click.click.advertiser_id AS advertiser_id,
> click.click.keyword AS keyword,
> click.click.adgroup_id AS adgroup_id,
> click.click.ad_id AS ad_id,
> click.feed_id AS feed_id,
> 1 AS clicks,
> click.click.campaign_id AS campaign_id,
> click.click.subid AS subid,
> 0 AS conversions,
> 0 AS feed_calls,
> 0 AS impressions,
> 2013 AS lax_year,
> click.click.publisher_id AS publisher_id,
> click.click.publisher_cost AS cost,
> click.click.sub_id AS sub_id,
> click.click.source_id AS source_id,
> click.click.timezone AS timezone,
> 09 AS lax_month,
> click.click.click_cost AS revenue,
> click.click.network_id AS network_id,
> 05 AS lax_day,
> 0 AS used_listings,
> click.start_timestamp AS start_timestamp,
> click.click.bid_id AS bid_id FROM click
> WHERE click.start_timestamp >= 1378364400
> AND click.start_timestamp < 1378450800
> AND click.click.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
>
> UNION ALL
> SELECT
> result.ad.advertiser_id AS advertiser_id,
> result.bid.text AS keyword,
> result.ad.adgroup_id AS adgroup_id,
> result.ad.ad_id AS ad_id,
> result.ad.feed_id AS feed_id,
> 0 AS clicks,
> result.ad.campaign_id AS campaign_id,
> concat(result.ad.publisher_id, "_", result.ad.source_id, "_",
> result.ad.sub_id) AS subid,
> 0 AS conversions,
> 0 AS feed_calls,
> 1 AS impressions,
> 2013 AS lax_year,
> result.ad.publisher_id AS publisher_id,
> 0 AS cost,
> result.ad.sub_id AS sub_id,
> result.ad.source_id AS source_id,
> result.ad.timezone AS timezone,
> 09 AS lax_month,
> 0 AS revenue,
> result.ad.network_id AS network_id,
> 05 AS lax_day,
> 0 AS used_listings,
> result.ad.start_timestamp AS start_timestamp,
> result.bid.bid_id AS bid_id FROM impression
> WHERE result.ad.start_timestamp >= 1378364400
> AND result.ad.start_timestamp < 1378450800
> AND result.ad.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
>
> UNION ALL
> SELECT
> conversion.conv_data.advertiser_id AS advertiser_id,
> conversion.conv_data.keyword AS keyword,
> conversion.conv_data.adgroup_id AS adgroup_id,
> conversion.conv_data.ad_id AS ad_id,
> conversion.conv_data.feed_id AS feed_id,
> 0 AS clicks,
> conversion.conv_data.campaign_id AS campaign_id,
> conversion.conv_data.subid AS subid,
> 1 AS conversions,
> 0 AS feed_calls,
> 0 AS impressions,
> 2013 AS lax_year,
> conversion.conv_data.publisher_id AS publisher_id,
> 0 AS cost,
> conversion.conv_data.sub_id AS sub_id,
> conversion.conv_data.source_id AS source_id,
> conversion.conv_data.timezone AS timezone,
> 09 AS lax_month,
> 0 AS revenue,
> conversion.conv_data.network_id AS network_id,
> 05 AS lax_day,
> 0 AS used_listings,
> conversion.start_timestamp AS start_timestamp,
> -1 AS bid_id FROM conversion
> WHERE conversion.start_timestamp >= 1378364400
> AND conversion.start_timestamp < 1378450800
> AND
> conversion.conv_data.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
>
> )
> entries
> GROUP BY lax_year,lax_month,lax_day,network_id,advertiser_id,publisher_id
>
> Got all that? Good!
>
> The first strange thing is that the overall query returns results, yet I
> get no results for any of the subqueries on which I'm performing a UNION
> ALL.
>
> The second strange thing is that the overall query does not honor the
> constraint on network_id at the very least, and probably isn't honoring the
> constraints on start_timestamp either (I haven't checked).
>
> I tried putting the constraint on the output of the UNION ALL query:
>
> WHERE entries.start_timestamp >= 1378364400
> AND entries.start_timestamp < 1378450800
> AND entries.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
>
> That didn't work either. Same behavior.
>
> Can anyone help me make this work?
Re: UNION ALL query behaving strangely; WHERE CLAUSE is also not
being honored
Posted by Chuck Hardin <ch...@namimedia.com>.
It would probably have been an idea to mention that in this case, impression is an EXPLODEd view into a field in another table.
And that queries into this view produce odd errors.
And that EXPLODE in Hive seems generally questionable.
I'm dealing in another way.
On Sep 12, 2013, at 10:00 AM, Chuck Hardin <ch...@namimedia.com> wrote:
> Please bear with me, because this is a pretty large query.
>
> TL;DR: I'm doing a UNION ALL on a bunch of subqueries. The subqueries return no results, but the overall query does. Sadly, the constraints on the subqueries are not being honored in the UNION ALL.
>
> I'm executing the following:
>
> SELECT lax_year,lax_month,lax_day,network_id,advertiser_id,publisher_id,
> SUM(entries.clicks), SUM(entries.conversions),
> SUM(entries.cost), SUM(entries.feed_calls),
> SUM(entries.impressions), SUM(entries.revenue),
> SUM(entries.used_listings)
> FROM
> (
> SELECT
> used_listing.advertiserid AS advertiser_id,
> -1 AS keyword,
> used_listing.adgroupid AS adgroup_id,
> used_listing.adid AS ad_id,
> used_listing.feedid AS feed_id,
> 0 AS clicks,
> used_listing.campaignid AS campaign_id,
> concat(used_listing.publisherid, "_", used_listing.sourceid, "_", used_listing.subid) AS subid,
> 0 AS conversions,
> 0 AS feed_calls,
> 0 AS impressions,
> 2013 AS lax_year,
> used_listing.publisherid AS publisher_id,
> 0 AS cost,
> used_listing.subid AS sub_id,
> used_listing.sourceid AS source_id,
> used_listing.timezone AS timezone,
> 09 AS lax_month,
> 0 AS revenue,
> used_listing.networkid AS network_id,
> 05 AS lax_day,
> 1 AS used_listings,
> used_listing.ul_datetime AS start_timestamp,
> -1 AS bid_id FROM used_listing
> WHERE used_listing.ul_datetime >= 1378364400
> AND used_listing.ul_datetime < 1378450800
> AND used_listing.networkid='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
>
> UNION ALL
> SELECT
> click.click.advertiser_id AS advertiser_id,
> click.click.keyword AS keyword,
> click.click.adgroup_id AS adgroup_id,
> click.click.ad_id AS ad_id,
> click.feed_id AS feed_id,
> 1 AS clicks,
> click.click.campaign_id AS campaign_id,
> click.click.subid AS subid,
> 0 AS conversions,
> 0 AS feed_calls,
> 0 AS impressions,
> 2013 AS lax_year,
> click.click.publisher_id AS publisher_id,
> click.click.publisher_cost AS cost,
> click.click.sub_id AS sub_id,
> click.click.source_id AS source_id,
> click.click.timezone AS timezone,
> 09 AS lax_month,
> click.click.click_cost AS revenue,
> click.click.network_id AS network_id,
> 05 AS lax_day,
> 0 AS used_listings,
> click.start_timestamp AS start_timestamp,
> click.click.bid_id AS bid_id FROM click
> WHERE click.start_timestamp >= 1378364400
> AND click.start_timestamp < 1378450800
> AND click.click.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
>
> UNION ALL
> SELECT
> result.ad.advertiser_id AS advertiser_id,
> result.bid.text AS keyword,
> result.ad.adgroup_id AS adgroup_id,
> result.ad.ad_id AS ad_id,
> result.ad.feed_id AS feed_id,
> 0 AS clicks,
> result.ad.campaign_id AS campaign_id,
> concat(result.ad.publisher_id, "_", result.ad.source_id, "_", result.ad.sub_id) AS subid,
> 0 AS conversions,
> 0 AS feed_calls,
> 1 AS impressions,
> 2013 AS lax_year,
> result.ad.publisher_id AS publisher_id,
> 0 AS cost,
> result.ad.sub_id AS sub_id,
> result.ad.source_id AS source_id,
> result.ad.timezone AS timezone,
> 09 AS lax_month,
> 0 AS revenue,
> result.ad.network_id AS network_id,
> 05 AS lax_day,
> 0 AS used_listings,
> result.ad.start_timestamp AS start_timestamp,
> result.bid.bid_id AS bid_id FROM impression
> WHERE result.ad.start_timestamp >= 1378364400
> AND result.ad.start_timestamp < 1378450800
> AND result.ad.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
>
> UNION ALL
> SELECT
> conversion.conv_data.advertiser_id AS advertiser_id,
> conversion.conv_data.keyword AS keyword,
> conversion.conv_data.adgroup_id AS adgroup_id,
> conversion.conv_data.ad_id AS ad_id,
> conversion.conv_data.feed_id AS feed_id,
> 0 AS clicks,
> conversion.conv_data.campaign_id AS campaign_id,
> conversion.conv_data.subid AS subid,
> 1 AS conversions,
> 0 AS feed_calls,
> 0 AS impressions,
> 2013 AS lax_year,
> conversion.conv_data.publisher_id AS publisher_id,
> 0 AS cost,
> conversion.conv_data.sub_id AS sub_id,
> conversion.conv_data.source_id AS source_id,
> conversion.conv_data.timezone AS timezone,
> 09 AS lax_month,
> 0 AS revenue,
> conversion.conv_data.network_id AS network_id,
> 05 AS lax_day,
> 0 AS used_listings,
> conversion.start_timestamp AS start_timestamp,
> -1 AS bid_id FROM conversion
> WHERE conversion.start_timestamp >= 1378364400
> AND conversion.start_timestamp < 1378450800
> AND conversion.conv_data.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
>
> )
> entries
> GROUP BY lax_year,lax_month,lax_day,network_id,advertiser_id,publisher_id
>
> Got all that? Good!
>
> The first strange thing is that the overall query returns results, yet I get no results for any of the subqueries on which I'm performing a UNION ALL.
>
> The second strange thing is that the overall query does not honor the constraint on network_id at the very least, and probably isn't honoring the constraints on start_timestamp either (I haven't checked).
>
> I tried putting the constraint on the output of the UNION ALL query:
>
> WHERE entries.start_timestamp >= 1378364400
> AND entries.start_timestamp < 1378450800
> AND entries.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
>
> That didn't work either. Same behavior.
>
> Can anyone help me make this work?