You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by James Yu <cy...@gmail.com> on 2018/03/22 01:34:04 UTC

how does SQL mode work with PopularPlaces example?

Hi,

I am following the PopularPlacesSQL example (
http://training.data-artisans.com/exercises/popularPlacesSql.html), but I
am unable to understand why the following statement will pickup events with
START flag only.

"SELECT " +
"toCoords(cell), wstart, wend, isStart, popCnt " +
"FROM " +
"(SELECT " +
"cell, " +
"isStart, " +
"HOP_START(eventTime, INTERVAL '5' MINUTE, INTERVAL '15' MINUTE) AS wstart,
" +
"HOP_END(eventTime, INTERVAL '5' MINUTE, INTERVAL '15' MINUTE) AS wend, " +
"COUNT(isStart) AS popCnt " +
"FROM " +
"(SELECT " +
"eventTime, " +
"isStart, " +
"CASE WHEN isStart THEN toCellId(startLon, startLat) ELSE toCellId(endLon,
endLat) END AS cell " +
"FROM TaxiRides " +
"WHERE isInNYC(startLon, startLat) AND isInNYC(endLon, endLat)) " +
"GROUP BY cell, isStart, HOP(eventTime, INTERVAL '5' MINUTE, INTERVAL '15'
MINUTE)) " +
"WHERE popCnt > 20"

Since we can update state in processElement when we do it with low level
ProcessFunction, how does SQL rule out the un-paired events?


This is a UTF-8 formatted mail
-----------------------------------------------
James C.-C.Yu
+886988713275

Re: how does SQL mode work with PopularPlaces example?

Posted by Fabian Hueske <fh...@gmail.com>.
Hi James,

the exercise does not require to filter on pickup events. It says:

"This is done by counting every five minutes the number of taxi rides that
started and ended in the same area within the last 15 minutes. Arrival and
departure locations should be separately counted."

That is achieved by including isStart in the GROUP BY fields.

If you would only like to count pickup locations, you can just add a
predicate "WHERE isStart".
Regarding the question of state cleanup, the query does not require to join
start and end events. Both are separately counted.
To join start and end events, you can use a time-windowed join [1] (since
Flink 1.4) which will automatically remove rows from the state that fell
out of the join window.

Best, Fabian

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/table/sql.html#joins


2018-03-22 2:34 GMT+01:00 James Yu <cy...@gmail.com>:

> Hi,
>
> I am following the PopularPlacesSQL example (http://training.data-
> artisans.com/exercises/popularPlacesSql.html), but I am unable to
> understand why the following statement will pickup events with START flag
> only.
>
> "SELECT " +
> "toCoords(cell), wstart, wend, isStart, popCnt " +
> "FROM " +
> "(SELECT " +
> "cell, " +
> "isStart, " +
> "HOP_START(eventTime, INTERVAL '5' MINUTE, INTERVAL '15' MINUTE) AS
> wstart, " +
> "HOP_END(eventTime, INTERVAL '5' MINUTE, INTERVAL '15' MINUTE) AS wend, " +
> "COUNT(isStart) AS popCnt " +
> "FROM " +
> "(SELECT " +
> "eventTime, " +
> "isStart, " +
> "CASE WHEN isStart THEN toCellId(startLon, startLat) ELSE toCellId(endLon,
> endLat) END AS cell " +
> "FROM TaxiRides " +
> "WHERE isInNYC(startLon, startLat) AND isInNYC(endLon, endLat)) " +
> "GROUP BY cell, isStart, HOP(eventTime, INTERVAL '5' MINUTE, INTERVAL '15'
> MINUTE)) " +
> "WHERE popCnt > 20"
>
> Since we can update state in processElement when we do it with low level
> ProcessFunction, how does SQL rule out the un-paired events?
>
>
> This is a UTF-8 formatted mail
> -----------------------------------------------
> James C.-C.Yu
> +886988713275 <+886%20988%20713%20275>
>