You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Chuck Hardin <ch...@namimedia.com> on 2013/09/12 19:00:41 UTC

UNION ALL query behaving strangely; WHERE CLAUSE is also not being honored

Please bear with me, because this is a pretty large query.

TL;DR:  I'm doing a UNION ALL on a bunch of subqueries.  The subqueries return no results, but the overall query does.  Sadly, the constraints on the subqueries are not being honored in the UNION ALL.

I'm executing the following:

SELECT lax_year,lax_month,lax_day,network_id,advertiser_id,publisher_id,
       SUM(entries.clicks), SUM(entries.conversions),
       SUM(entries.cost), SUM(entries.feed_calls),
       SUM(entries.impressions), SUM(entries.revenue),
       SUM(entries.used_listings)
FROM
  (
    SELECT
      used_listing.advertiserid AS advertiser_id,
      -1 AS keyword,
      used_listing.adgroupid AS adgroup_id,
      used_listing.adid AS ad_id,
      used_listing.feedid AS feed_id,
      0 AS clicks,
      used_listing.campaignid AS campaign_id,
      concat(used_listing.publisherid, "_", used_listing.sourceid, "_", used_listing.subid) AS subid,
      0 AS conversions,
      0 AS feed_calls,
      0 AS impressions,
      2013 AS lax_year,
      used_listing.publisherid AS publisher_id,
      0 AS cost,
      used_listing.subid AS sub_id,
      used_listing.sourceid AS source_id,
      used_listing.timezone AS timezone,
      09 AS lax_month,
      0 AS revenue,
      used_listing.networkid AS network_id,
      05 AS lax_day,
      1 AS used_listings,
      used_listing.ul_datetime AS start_timestamp,
      -1 AS bid_id FROM used_listing
WHERE used_listing.ul_datetime >= 1378364400
AND   used_listing.ul_datetime < 1378450800
AND used_listing.networkid='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'

    UNION ALL
    SELECT
      click.click.advertiser_id AS advertiser_id,
      click.click.keyword AS keyword,
      click.click.adgroup_id AS adgroup_id,
      click.click.ad_id AS ad_id,
      click.feed_id AS feed_id,
      1 AS clicks,
      click.click.campaign_id AS campaign_id,
      click.click.subid AS subid,
      0 AS conversions,
      0 AS feed_calls,
      0 AS impressions,
      2013 AS lax_year,
      click.click.publisher_id AS publisher_id,
      click.click.publisher_cost AS cost,
      click.click.sub_id AS sub_id,
      click.click.source_id AS source_id,
      click.click.timezone AS timezone,
      09 AS lax_month,
      click.click.click_cost AS revenue,
      click.click.network_id AS network_id,
      05 AS lax_day,
      0 AS used_listings,
      click.start_timestamp AS start_timestamp,
      click.click.bid_id AS bid_id FROM click
WHERE click.start_timestamp >= 1378364400
AND   click.start_timestamp < 1378450800
AND click.click.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'

    UNION ALL
    SELECT
      result.ad.advertiser_id AS advertiser_id,
      result.bid.text AS keyword,
      result.ad.adgroup_id AS adgroup_id,
      result.ad.ad_id AS ad_id,
      result.ad.feed_id AS feed_id,
      0 AS clicks,
      result.ad.campaign_id AS campaign_id,
      concat(result.ad.publisher_id, "_", result.ad.source_id, "_", result.ad.sub_id) AS subid,
      0 AS conversions,
      0 AS feed_calls,
      1 AS impressions,
      2013 AS lax_year,
      result.ad.publisher_id AS publisher_id,
      0 AS cost,
      result.ad.sub_id AS sub_id,
      result.ad.source_id AS source_id,
      result.ad.timezone AS timezone,
      09 AS lax_month,
      0 AS revenue,
      result.ad.network_id AS network_id,
      05 AS lax_day,
      0 AS used_listings,
      result.ad.start_timestamp AS start_timestamp,
      result.bid.bid_id AS bid_id FROM impression
WHERE result.ad.start_timestamp >= 1378364400
AND   result.ad.start_timestamp < 1378450800
AND result.ad.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'

    UNION ALL
    SELECT
      conversion.conv_data.advertiser_id AS advertiser_id,
      conversion.conv_data.keyword AS keyword,
      conversion.conv_data.adgroup_id AS adgroup_id,
      conversion.conv_data.ad_id AS ad_id,
      conversion.conv_data.feed_id AS feed_id,
      0 AS clicks,
      conversion.conv_data.campaign_id AS campaign_id,
      conversion.conv_data.subid AS subid,
      1 AS conversions,
      0 AS feed_calls,
      0 AS impressions,
      2013 AS lax_year,
      conversion.conv_data.publisher_id AS publisher_id,
      0 AS cost,
      conversion.conv_data.sub_id AS sub_id,
      conversion.conv_data.source_id AS source_id,
      conversion.conv_data.timezone AS timezone,
      09 AS lax_month,
      0 AS revenue,
      conversion.conv_data.network_id AS network_id,
      05 AS lax_day,
      0 AS used_listings,
      conversion.start_timestamp AS start_timestamp,
      -1 AS bid_id FROM conversion
WHERE conversion.start_timestamp >= 1378364400
AND   conversion.start_timestamp < 1378450800
AND conversion.conv_data.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'

)
entries
GROUP BY lax_year,lax_month,lax_day,network_id,advertiser_id,publisher_id

Got all that?  Good!

The first strange thing is that the overall query returns results, yet I get no results for any of the subqueries on which I'm performing a UNION ALL.

The second strange thing is that the overall query does not honor the constraint on network_id at the very least, and probably isn't honoring the constraints on start_timestamp either (I haven't checked).

I tried putting the constraint on the output of the UNION ALL query:

WHERE entries.start_timestamp >= 1378364400
AND   entries.start_timestamp < 1378450800
AND entries.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'

That didn't work either.  Same behavior.

Can anyone help me make this work?

Re: UNION ALL query behaving strangely; WHERE CLAUSE is also not being honored

Posted by Yin Huai <hu...@gmail.com>.
Hi,

Can you also attach the query plan (the result of EXPLAIN)? It may help to
find where is the problem.

Thanks,

Yin


On Thu, Sep 12, 2013 at 1:00 PM, Chuck Hardin <ch...@namimedia.com> wrote:

> Please bear with me, because this is a pretty large query.
>
> TL;DR:  I'm doing a UNION ALL on a bunch of subqueries.  The subqueries
> return no results, but the overall query does.  Sadly, the constraints on
> the subqueries are not being honored in the UNION ALL.
>
> I'm executing the following:
>
> SELECT lax_year,lax_month,lax_day,network_id,advertiser_id,publisher_id,
>        SUM(entries.clicks), SUM(entries.conversions),
>        SUM(entries.cost), SUM(entries.feed_calls),
>        SUM(entries.impressions), SUM(entries.revenue),
>        SUM(entries.used_listings)
> FROM
>   (
>     SELECT
>       used_listing.advertiserid AS advertiser_id,
>       -1 AS keyword,
>       used_listing.adgroupid AS adgroup_id,
>       used_listing.adid AS ad_id,
>       used_listing.feedid AS feed_id,
>       0 AS clicks,
>       used_listing.campaignid AS campaign_id,
>       concat(used_listing.publisherid, "_", used_listing.sourceid, "_",
> used_listing.subid) AS subid,
>       0 AS conversions,
>       0 AS feed_calls,
>       0 AS impressions,
>       2013 AS lax_year,
>       used_listing.publisherid AS publisher_id,
>       0 AS cost,
>       used_listing.subid AS sub_id,
>       used_listing.sourceid AS source_id,
>       used_listing.timezone AS timezone,
>       09 AS lax_month,
>       0 AS revenue,
>       used_listing.networkid AS network_id,
>       05 AS lax_day,
>       1 AS used_listings,
>       used_listing.ul_datetime AS start_timestamp,
>       -1 AS bid_id FROM used_listing
> WHERE used_listing.ul_datetime >= 1378364400
> AND   used_listing.ul_datetime < 1378450800
> AND used_listing.networkid='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
>
>     UNION ALL
>     SELECT
>       click.click.advertiser_id AS advertiser_id,
>       click.click.keyword AS keyword,
>       click.click.adgroup_id AS adgroup_id,
>       click.click.ad_id AS ad_id,
>       click.feed_id AS feed_id,
>       1 AS clicks,
>       click.click.campaign_id AS campaign_id,
>       click.click.subid AS subid,
>       0 AS conversions,
>       0 AS feed_calls,
>       0 AS impressions,
>       2013 AS lax_year,
>       click.click.publisher_id AS publisher_id,
>       click.click.publisher_cost AS cost,
>       click.click.sub_id AS sub_id,
>       click.click.source_id AS source_id,
>       click.click.timezone AS timezone,
>       09 AS lax_month,
>       click.click.click_cost AS revenue,
>       click.click.network_id AS network_id,
>       05 AS lax_day,
>       0 AS used_listings,
>       click.start_timestamp AS start_timestamp,
>       click.click.bid_id AS bid_id FROM click
> WHERE click.start_timestamp >= 1378364400
> AND   click.start_timestamp < 1378450800
> AND click.click.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
>
>     UNION ALL
>     SELECT
>       result.ad.advertiser_id AS advertiser_id,
>       result.bid.text AS keyword,
>       result.ad.adgroup_id AS adgroup_id,
>       result.ad.ad_id AS ad_id,
>       result.ad.feed_id AS feed_id,
>       0 AS clicks,
>       result.ad.campaign_id AS campaign_id,
>       concat(result.ad.publisher_id, "_", result.ad.source_id, "_",
> result.ad.sub_id) AS subid,
>       0 AS conversions,
>       0 AS feed_calls,
>       1 AS impressions,
>       2013 AS lax_year,
>       result.ad.publisher_id AS publisher_id,
>       0 AS cost,
>       result.ad.sub_id AS sub_id,
>       result.ad.source_id AS source_id,
>       result.ad.timezone AS timezone,
>       09 AS lax_month,
>       0 AS revenue,
>       result.ad.network_id AS network_id,
>       05 AS lax_day,
>       0 AS used_listings,
>       result.ad.start_timestamp AS start_timestamp,
>       result.bid.bid_id AS bid_id FROM impression
> WHERE result.ad.start_timestamp >= 1378364400
> AND   result.ad.start_timestamp < 1378450800
> AND result.ad.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
>
>     UNION ALL
>     SELECT
>       conversion.conv_data.advertiser_id AS advertiser_id,
>       conversion.conv_data.keyword AS keyword,
>       conversion.conv_data.adgroup_id AS adgroup_id,
>       conversion.conv_data.ad_id AS ad_id,
>       conversion.conv_data.feed_id AS feed_id,
>       0 AS clicks,
>       conversion.conv_data.campaign_id AS campaign_id,
>       conversion.conv_data.subid AS subid,
>       1 AS conversions,
>       0 AS feed_calls,
>       0 AS impressions,
>       2013 AS lax_year,
>       conversion.conv_data.publisher_id AS publisher_id,
>       0 AS cost,
>       conversion.conv_data.sub_id AS sub_id,
>       conversion.conv_data.source_id AS source_id,
>       conversion.conv_data.timezone AS timezone,
>       09 AS lax_month,
>       0 AS revenue,
>       conversion.conv_data.network_id AS network_id,
>       05 AS lax_day,
>       0 AS used_listings,
>       conversion.start_timestamp AS start_timestamp,
>       -1 AS bid_id FROM conversion
> WHERE conversion.start_timestamp >= 1378364400
> AND   conversion.start_timestamp < 1378450800
> AND
> conversion.conv_data.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
>
> )
> entries
> GROUP BY lax_year,lax_month,lax_day,network_id,advertiser_id,publisher_id
>
> Got all that?  Good!
>
> The first strange thing is that the overall query returns results, yet I
> get no results for any of the subqueries on which I'm performing a UNION
> ALL.
>
> The second strange thing is that the overall query does not honor the
> constraint on network_id at the very least, and probably isn't honoring the
> constraints on start_timestamp either (I haven't checked).
>
> I tried putting the constraint on the output of the UNION ALL query:
>
> WHERE entries.start_timestamp >= 1378364400
> AND   entries.start_timestamp < 1378450800
> AND entries.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
>
> That didn't work either.  Same behavior.
>
> Can anyone help me make this work?

Re: UNION ALL query behaving strangely; WHERE CLAUSE is also not being honored

Posted by Chuck Hardin <ch...@namimedia.com>.
It would probably have been an idea to mention that in this case, impression is an EXPLODEd view into a field in another table.

And that queries into this view produce odd errors.

And that EXPLODE in Hive seems generally questionable.

I'm dealing in another way.

On Sep 12, 2013, at 10:00 AM, Chuck Hardin <ch...@namimedia.com> wrote:

> Please bear with me, because this is a pretty large query.
> 
> TL;DR:  I'm doing a UNION ALL on a bunch of subqueries.  The subqueries return no results, but the overall query does.  Sadly, the constraints on the subqueries are not being honored in the UNION ALL.
> 
> I'm executing the following:
> 
> SELECT lax_year,lax_month,lax_day,network_id,advertiser_id,publisher_id,
>       SUM(entries.clicks), SUM(entries.conversions),
>       SUM(entries.cost), SUM(entries.feed_calls),
>       SUM(entries.impressions), SUM(entries.revenue),
>       SUM(entries.used_listings)
> FROM
>  (
>    SELECT
>      used_listing.advertiserid AS advertiser_id,
>      -1 AS keyword,
>      used_listing.adgroupid AS adgroup_id,
>      used_listing.adid AS ad_id,
>      used_listing.feedid AS feed_id,
>      0 AS clicks,
>      used_listing.campaignid AS campaign_id,
>      concat(used_listing.publisherid, "_", used_listing.sourceid, "_", used_listing.subid) AS subid,
>      0 AS conversions,
>      0 AS feed_calls,
>      0 AS impressions,
>      2013 AS lax_year,
>      used_listing.publisherid AS publisher_id,
>      0 AS cost,
>      used_listing.subid AS sub_id,
>      used_listing.sourceid AS source_id,
>      used_listing.timezone AS timezone,
>      09 AS lax_month,
>      0 AS revenue,
>      used_listing.networkid AS network_id,
>      05 AS lax_day,
>      1 AS used_listings,
>      used_listing.ul_datetime AS start_timestamp,
>      -1 AS bid_id FROM used_listing
> WHERE used_listing.ul_datetime >= 1378364400
> AND   used_listing.ul_datetime < 1378450800
> AND used_listing.networkid='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
> 
>    UNION ALL
>    SELECT
>      click.click.advertiser_id AS advertiser_id,
>      click.click.keyword AS keyword,
>      click.click.adgroup_id AS adgroup_id,
>      click.click.ad_id AS ad_id,
>      click.feed_id AS feed_id,
>      1 AS clicks,
>      click.click.campaign_id AS campaign_id,
>      click.click.subid AS subid,
>      0 AS conversions,
>      0 AS feed_calls,
>      0 AS impressions,
>      2013 AS lax_year,
>      click.click.publisher_id AS publisher_id,
>      click.click.publisher_cost AS cost,
>      click.click.sub_id AS sub_id,
>      click.click.source_id AS source_id,
>      click.click.timezone AS timezone,
>      09 AS lax_month,
>      click.click.click_cost AS revenue,
>      click.click.network_id AS network_id,
>      05 AS lax_day,
>      0 AS used_listings,
>      click.start_timestamp AS start_timestamp,
>      click.click.bid_id AS bid_id FROM click
> WHERE click.start_timestamp >= 1378364400
> AND   click.start_timestamp < 1378450800
> AND click.click.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
> 
>    UNION ALL
>    SELECT
>      result.ad.advertiser_id AS advertiser_id,
>      result.bid.text AS keyword,
>      result.ad.adgroup_id AS adgroup_id,
>      result.ad.ad_id AS ad_id,
>      result.ad.feed_id AS feed_id,
>      0 AS clicks,
>      result.ad.campaign_id AS campaign_id,
>      concat(result.ad.publisher_id, "_", result.ad.source_id, "_", result.ad.sub_id) AS subid,
>      0 AS conversions,
>      0 AS feed_calls,
>      1 AS impressions,
>      2013 AS lax_year,
>      result.ad.publisher_id AS publisher_id,
>      0 AS cost,
>      result.ad.sub_id AS sub_id,
>      result.ad.source_id AS source_id,
>      result.ad.timezone AS timezone,
>      09 AS lax_month,
>      0 AS revenue,
>      result.ad.network_id AS network_id,
>      05 AS lax_day,
>      0 AS used_listings,
>      result.ad.start_timestamp AS start_timestamp,
>      result.bid.bid_id AS bid_id FROM impression
> WHERE result.ad.start_timestamp >= 1378364400
> AND   result.ad.start_timestamp < 1378450800
> AND result.ad.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
> 
>    UNION ALL
>    SELECT
>      conversion.conv_data.advertiser_id AS advertiser_id,
>      conversion.conv_data.keyword AS keyword,
>      conversion.conv_data.adgroup_id AS adgroup_id,
>      conversion.conv_data.ad_id AS ad_id,
>      conversion.conv_data.feed_id AS feed_id,
>      0 AS clicks,
>      conversion.conv_data.campaign_id AS campaign_id,
>      conversion.conv_data.subid AS subid,
>      1 AS conversions,
>      0 AS feed_calls,
>      0 AS impressions,
>      2013 AS lax_year,
>      conversion.conv_data.publisher_id AS publisher_id,
>      0 AS cost,
>      conversion.conv_data.sub_id AS sub_id,
>      conversion.conv_data.source_id AS source_id,
>      conversion.conv_data.timezone AS timezone,
>      09 AS lax_month,
>      0 AS revenue,
>      conversion.conv_data.network_id AS network_id,
>      05 AS lax_day,
>      0 AS used_listings,
>      conversion.start_timestamp AS start_timestamp,
>      -1 AS bid_id FROM conversion
> WHERE conversion.start_timestamp >= 1378364400
> AND   conversion.start_timestamp < 1378450800
> AND conversion.conv_data.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
> 
> )
> entries
> GROUP BY lax_year,lax_month,lax_day,network_id,advertiser_id,publisher_id
> 
> Got all that?  Good!
> 
> The first strange thing is that the overall query returns results, yet I get no results for any of the subqueries on which I'm performing a UNION ALL.
> 
> The second strange thing is that the overall query does not honor the constraint on network_id at the very least, and probably isn't honoring the constraints on start_timestamp either (I haven't checked).
> 
> I tried putting the constraint on the output of the UNION ALL query:
> 
> WHERE entries.start_timestamp >= 1378364400
> AND   entries.start_timestamp < 1378450800
> AND entries.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
> 
> That didn't work either.  Same behavior.
> 
> Can anyone help me make this work?