You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/04/16 10:33:59 UTC

[GitHub] [incubator-druid] bharathpatnaik edited a comment on issue #7488: Unable to do a Semi Join

bharathpatnaik edited a comment on issue #7488: Unable to do a Semi Join
URL: https://github.com/apache/incubator-druid/issues/7488#issuecomment-483605408
 
 
   Druid 0.14
   
   Very large cluster (Don't know the exact size but can handle billions of telemetry rows with ease)
   
   I get a cannot build a plan error for a query which looks like this:
   
   **Query1:**
   `
   _SELECT a from table1 
   WHERE a IN 
   (SELECT a from table1 WHERE ...)
   
   Exact Queries used in the process were:
   
   SELECT "dimensions_did" FROM
     (
       SELECT "dimensions_did",
         COUNT(DISTINCT month_of_year) AS months_present_in_year,
         SUM(time_spent_in_month) AS time_spent_in_year FROM
         (
           SELECT
             "dimensions_did",
             month_of_year,
             COUNT(DISTINCT week_of_year) AS weeks_present_in_month,
             SUM(time_spent_in_week) AS time_spent_in_month FROM
             (
               SELECT
                 "dimensions_did", month_of_year, week_of_year,
                 COUNT(DISTINCT day_of_year) AS days_present_in_week,
                 SUM(time_spent_in_day) AS time_spent_in_week FROM
                 (
                   SELECT "dimensions_did",
                     TIME_EXTRACT(MILLIS_TO_TIMESTAMP("syncts"), 'MONTH') AS month_of_year,
                     TIME_EXTRACT(MILLIS_TO_TIMESTAMP("syncts"), 'WEEK') AS week_of_year,
                     TIME_EXTRACT(MILLIS_TO_TIMESTAMP("syncts"), 'DOY') AS day_of_year,
                     SUM("total_time_spent") AS time_spent_in_day FROM
                     
                     "druid"."summary-events"
                     
                   WHERE "syncts" >= 1546300800000  AND "syncts" <= 1553990400000
                     AND "total_time_spent" > 0 AND "content_object_type" = 'Content' AND "dimensions_mode" = 'play'
                   GROUP BY 1, 2, 3, 4
                 )
               GROUP BY 1, 2, 3
             )
           WHERE days_present_in_week >= 2
           GROUP BY 1, 2
         )
       WHERE
         weeks_present_in_month >= 2
       GROUP BY 1
     )
   WHERE months_present_in_year >= 2  AND time_spent_in_year > 1000
    LIMIT 100_
   `
   **Query 2**
   `
   _SELECT "dimensions_did"
   FROM "druid"."summary-events"
   WHERE "syncts" >= 1546300800000 AND "syncts" <= 1553990400000 AND "total_time_spent" > 0 AND "content_object_type" = 'Content' AND "dimensions_mode" = 'play'
   GROUP BY 1 HAVING COUNT(DISTINCT TIME_FORMAT(MILLIS_TO_TIMESTAMP("syncts"), 'yyyy/MM/dd')) > 36
   _
   `
   
   **Actual Query Used:**
   `
   _SELECT "dimensions_did" FROM
     (
       SELECT "dimensions_did",
         COUNT(DISTINCT month_of_year) AS months_present_in_year,
         SUM(time_spent_in_month) AS time_spent_in_year FROM
         (
           SELECT
             "dimensions_did",
             month_of_year,
             COUNT(DISTINCT week_of_year) AS weeks_present_in_month,
             SUM(time_spent_in_week) AS time_spent_in_month FROM
             (
               SELECT
                 "dimensions_did", month_of_year, week_of_year,
                 COUNT(DISTINCT day_of_year) AS days_present_in_week,
                 SUM(time_spent_in_day) AS time_spent_in_week FROM
                 (
                   SELECT "dimensions_did",
                     TIME_EXTRACT(MILLIS_TO_TIMESTAMP("syncts"), 'MONTH') AS month_of_year,
                     TIME_EXTRACT(MILLIS_TO_TIMESTAMP("syncts"), 'WEEK') AS week_of_year,
                     TIME_EXTRACT(MILLIS_TO_TIMESTAMP("syncts"), 'DOY') AS day_of_year,
                     SUM("total_time_spent") AS time_spent_in_day FROM
                     
                     "druid"."summary-events"
                     
                   WHERE "syncts" >= 1546300800000  AND "syncts" <= 1553990400000
                     AND "total_time_spent" > 0 AND "content_object_type" = 'Content' AND "dimensions_mode" = 'play'
                   GROUP BY 1, 2, 3, 4
                 )
               GROUP BY 1, 2, 3
             )
           WHERE days_present_in_week >= 2
           GROUP BY 1, 2
         )
       WHERE
         weeks_present_in_month >= 2
       GROUP BY 1
     )
   WHERE months_present_in_year >= 2  AND time_spent_in_year > 1000
   AND "dimensions_did" IN 
   (SELECT "dimensions_did"
   FROM "druid"."summary-events"
   WHERE "syncts" >= 1546300800000 AND "syncts" <= 1553990400000 AND "total_time_spent" > 0 AND "content_object_type" = 'Content' AND "dimensions_mode" = 'play'
   GROUP BY 1 HAVING COUNT(DISTINCT TIME_FORMAT(MILLIS_TO_TIMESTAMP("syncts"), 'yyyy/MM/dd')) > 36
   )
    LIMIT 100_
   `
   Thanks!
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org