You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by "s_penakalapati@yahoo.com" <s_...@yahoo.com> on 2020/09/28 13:13:59 UTC

Flink Batch Processing

Hi All,
Need your help in Flink Batch processing: scenario described below:
we have multiple vehicles, we get data from each vehicle at a very high speed, 1 record per minute.thresholds can be set by the owner for each vehicle. 
Say: we have 3 vehicles, threshold is set for 2 vehicles. Vehicle 1, threshold 20 hours, allowedPetrolConsumption=15vehicle 2, threshold 35 hours, allowedPetrolConsumption=28vehicle 3  no threshold set by owner.
All the vehicle data is stored in HBase tables. We have a scheduled Batch Job every day at 12 pm to check the status of vehicle movement and Petrol consumption against threshold and raise an alert (vehicle1 did not move for past 20 hours, vehicle 2 consumed more petrol. )
Since it is a Batch Job, I loaded all threshold data in one DataSet and HBase Data in another Dataset using HbaseInputFormat.
What I am failing to figure out is:1> vehicle 1 is having threshold of 20 hours where as vehicle 2 has threshold of 35 hours, I need to fetch data from Hbase for different scenario. Is there any better approach to get all data using one Hbase connection.2> how to apply alert on Dataset.  CEP pattern/ Match_recognize is allowed only on DataStream. Please help me with a simple example. (alert can be raised if count is zero or like petrol consumption is too high)

I could not get any example for Dataset on google where an alert is raised. Kindly guide me if there is any better approach
Regards,Sunitha.

Re: Flink Batch Processing

Posted by Timo Walther <tw...@apache.org>.
Hi Sunitha,

currently, not every connector can be mixed with every API. I agree that 
it is confusing from time to time. The HBase connector is an 
InputFormat. DataSet, DataStream and Table API can work with 
InputFormats. The current Hbase input format might work best with Table 
API. If you like to use CEP API, you can use Table API 
(StreamTableEnvironment) to read from Hbase and call `toAppendStream` 
directly afterwards to further process in DataStream API. This works 
also for bounded streams thus you can do "batch" processing.

Regards,
Timo


On 29.09.20 09:56, Till Rohrmann wrote:
> Hi Sunitha,
> 
> here is some documentation about how to use the Hbase sink with Flink 
> [1, 2].
> 
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/hbase.html
> [2] 
> https://docs.cloudera.com/csa/1.2.0/datastream-connectors/topics/csa-hbase-connector.html
> 
> Cheers,
> Till
> 
> On Tue, Sep 29, 2020 at 9:16 AM s_penakalapati@yahoo.com 
> <ma...@yahoo.com> <s_penakalapati@yahoo.com 
> <ma...@yahoo.com>> wrote:
> 
>     Hi Piotrek,
> 
>     Thank you for the reply.
> 
>     Flink changes are good, However Flink is changing so much that we
>     are unable to get any good implementation examples either on Flink
>     documents or any other website.
> 
>     Using HBaseInputFormat I was able to read the data as a DataSet<>,
>     now I see that DataSet would be deprecated.
> 
>     In recent release Flink 1.11.1 I see Blink planner, but I was not
>     able to get one example on how to connect to HBase and read data. Is
>     there any link I can refer to see some implementation of reading
>     from HBase as bounded data using Blink Planner/DataStream API.
> 
>     Regards,
>     Sunitha.
> 
> 
> 
>     On Monday, September 28, 2020, 07:12:19 PM GMT+5:30, Piotr Nowojski
>     <pnowojski@apache.org <ma...@apache.org>> wrote:
> 
> 
>     Hi Sunitha,
> 
>     First and foremost, the DataSet API will be deprecated soon [1] so I
>     would suggest trying to migrate to the DataStream API. When using
>     the DataStream API it doesn't mean that you can not work with
>     bounded inputs - you can. Flink SQL (Blink planner) is in fact using
>     DataStream API to execute both streaming and batch queries. Maybe
>     this path would be easier?
> 
>     And about answering your question using the DataSet API - sorry, I
>     don't know it :( I will try to ping someone who could help here.
> 
>     Piotrek
> 
>     [1]
>     https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741
> 
>     pon., 28 wrz 2020 o 15:14 s_penakalapati@yahoo.com
>     <ma...@yahoo.com> <s_penakalapati@yahoo.com
>     <ma...@yahoo.com>> napisał(a):
> 
>         Hi All,
> 
>         Need your help in Flink Batch processing: scenario described below:
> 
>         we have multiple vehicles, we get data from each vehicle at a
>         very high speed, 1 record per minute.
>         thresholds can be set by the owner for each vehicle.
> 
>         Say: we have 3 vehicles, threshold is set for 2 vehicles.
>         Vehicle 1, threshold 20 hours, allowedPetrolConsumption=15
>         vehicle 2, threshold 35 hours, allowedPetrolConsumption=28
>         vehicle 3  no threshold set by owner.
> 
>         All the vehicle data is stored in HBase tables. We have a
>         scheduled Batch Job every day at 12 pm to check the status of
>         vehicle movement and Petrol consumption against threshold and
>         raise an alert (vehicle1 did not move for past 20 hours, vehicle
>         2 consumed more petrol. )
> 
>         Since it is a Batch Job, I loaded all threshold data in one
>         DataSet and HBase Data in another Dataset using HbaseInputFormat.
> 
>         What I am failing to figure out is:
>         1> vehicle 1 is having threshold of 20 hours where as vehicle 2
>         has threshold of 35 hours, I need to fetch data from Hbase for
>         different scenario. Is there any better approach to get all data
>         using one Hbase connection.
>         2> how to apply alert on Dataset.  CEP pattern/ Match_recognize
>         is allowed only on DataStream. Please help me with a simple
>         example. (alert can be raised if count is zero or like petrol
>         consumption is too high)
> 
> 
>         I could not get any example for Dataset on google where an alert
>         is raised. Kindly guide me if there is any better approach
> 
>         Regards,
>         Sunitha.
> 


Re: Flink Batch Processing

Posted by Till Rohrmann <tr...@apache.org>.
Hi Sunitha,

here is some documentation about how to use the Hbase sink with Flink [1,
2].

[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/hbase.html
[2]
https://docs.cloudera.com/csa/1.2.0/datastream-connectors/topics/csa-hbase-connector.html

Cheers,
Till

On Tue, Sep 29, 2020 at 9:16 AM s_penakalapati@yahoo.com <
s_penakalapati@yahoo.com> wrote:

> Hi Piotrek,
>
> Thank you for the reply.
>
> Flink changes are good, However Flink is changing so much that we are
> unable to get any good implementation examples either on Flink documents or
> any other website.
>
> Using HBaseInputFormat I was able to read the data as a DataSet<>, now I
> see that DataSet would be deprecated.
>
> In recent release Flink 1.11.1 I see Blink planner, but I was not able to
> get one example on how to connect to HBase and read data. Is there any link
> I can refer to see some implementation of reading from HBase as bounded
> data using Blink Planner/DataStream API.
>
> Regards,
> Sunitha.
>
>
>
> On Monday, September 28, 2020, 07:12:19 PM GMT+5:30, Piotr Nowojski <
> pnowojski@apache.org> wrote:
>
>
> Hi Sunitha,
>
> First and foremost, the DataSet API will be deprecated soon [1] so I would
> suggest trying to migrate to the DataStream API. When using the DataStream
> API it doesn't mean that you can not work with bounded inputs - you can.
> Flink SQL (Blink planner) is in fact using DataStream API to execute both
> streaming and batch queries. Maybe this path would be easier?
>
> And about answering your question using the DataSet API - sorry, I don't
> know it :( I will try to ping someone who could help here.
>
> Piotrek
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741
>
> pon., 28 wrz 2020 o 15:14 s_penakalapati@yahoo.com <
> s_penakalapati@yahoo.com> napisał(a):
>
> Hi All,
>
> Need your help in Flink Batch processing: scenario described below:
>
> we have multiple vehicles, we get data from each vehicle at a very high
> speed, 1 record per minute.
> thresholds can be set by the owner for each vehicle.
>
> Say: we have 3 vehicles, threshold is set for 2 vehicles.
> Vehicle 1, threshold 20 hours, allowedPetrolConsumption=15
> vehicle 2, threshold 35 hours, allowedPetrolConsumption=28
> vehicle 3  no threshold set by owner.
>
> All the vehicle data is stored in HBase tables. We have a scheduled Batch
> Job every day at 12 pm to check the status of vehicle movement and Petrol
> consumption against threshold and raise an alert (vehicle1 did not move for
> past 20 hours, vehicle 2 consumed more petrol. )
>
> Since it is a Batch Job, I loaded all threshold data in one DataSet and
> HBase Data in another Dataset using HbaseInputFormat.
>
> What I am failing to figure out is:
> 1> vehicle 1 is having threshold of 20 hours where as vehicle 2 has
> threshold of 35 hours, I need to fetch data from Hbase for different
> scenario. Is there any better approach to get all data using one Hbase
> connection.
> 2> how to apply alert on Dataset.  CEP pattern/ Match_recognize is allowed
> only on DataStream. Please help me with a simple example. (alert can be
> raised if count is zero or like petrol consumption is too high)
>
>
> I could not get any example for Dataset on google where an alert is
> raised. Kindly guide me if there is any better approach
>
> Regards,
> Sunitha.
>
>

Re: Flink Batch Processing

Posted by "s_penakalapati@yahoo.com" <s_...@yahoo.com>.
 Hi Piotrek,
Thank you for the reply.
Flink changes are good, However Flink is changing so much that we are unable to get any good implementation examples either on Flink documents or any other website.
Using HBaseInputFormat I was able to read the data as a DataSet<>, now I see that DataSet would be deprecated.
In recent release Flink 1.11.1 I see Blink planner, but I was not able to get one example on how to connect to HBase and read data. Is there any link I can refer to see some implementation of reading from HBase as bounded data using Blink Planner/DataStream API.
Regards,Sunitha.


    On Monday, September 28, 2020, 07:12:19 PM GMT+5:30, Piotr Nowojski <pn...@apache.org> wrote:  
 
 Hi Sunitha,
First and foremost, the DataSet API will be deprecated soon [1] so I would suggest trying to migrate to the DataStream API. When using the DataStream API it doesn't mean that you can not work with bounded inputs - you can. Flink SQL (Blink planner) is in fact using DataStream API to execute both streaming and batch queries. Maybe this path would be easier?
And about answering your question using the DataSet API - sorry, I don't know it :( I will try to ping someone who could help here.
Piotrek
[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741
pon., 28 wrz 2020 o 15:14 s_penakalapati@yahoo.com <s_...@yahoo.com> napisał(a):

Hi All,
Need your help in Flink Batch processing: scenario described below:
we have multiple vehicles, we get data from each vehicle at a very high speed, 1 record per minute.thresholds can be set by the owner for each vehicle. 
Say: we have 3 vehicles, threshold is set for 2 vehicles. Vehicle 1, threshold 20 hours, allowedPetrolConsumption=15vehicle 2, threshold 35 hours, allowedPetrolConsumption=28vehicle 3  no threshold set by owner.
All the vehicle data is stored in HBase tables. We have a scheduled Batch Job every day at 12 pm to check the status of vehicle movement and Petrol consumption against threshold and raise an alert (vehicle1 did not move for past 20 hours, vehicle 2 consumed more petrol. )
Since it is a Batch Job, I loaded all threshold data in one DataSet and HBase Data in another Dataset using HbaseInputFormat.
What I am failing to figure out is:1> vehicle 1 is having threshold of 20 hours where as vehicle 2 has threshold of 35 hours, I need to fetch data from Hbase for different scenario. Is there any better approach to get all data using one Hbase connection.2> how to apply alert on Dataset.  CEP pattern/ Match_recognize is allowed only on DataStream. Please help me with a simple example. (alert can be raised if count is zero or like petrol consumption is too high)

I could not get any example for Dataset on google where an alert is raised. Kindly guide me if there is any better approach
Regards,Sunitha.
  

Re: Flink Batch Processing

Posted by Piotr Nowojski <pn...@apache.org>.
Hi Sunitha,

First and foremost, the DataSet API will be deprecated soon [1] so I would
suggest trying to migrate to the DataStream API. When using the DataStream
API it doesn't mean that you can not work with bounded inputs - you can.
Flink SQL (Blink planner) is in fact using DataStream API to execute both
streaming and batch queries. Maybe this path would be easier?

And about answering your question using the DataSet API - sorry, I don't
know it :( I will try to ping someone who could help here.

Piotrek

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741

pon., 28 wrz 2020 o 15:14 s_penakalapati@yahoo.com <s_...@yahoo.com>
napisał(a):

> Hi All,
>
> Need your help in Flink Batch processing: scenario described below:
>
> we have multiple vehicles, we get data from each vehicle at a very high
> speed, 1 record per minute.
> thresholds can be set by the owner for each vehicle.
>
> Say: we have 3 vehicles, threshold is set for 2 vehicles.
> Vehicle 1, threshold 20 hours, allowedPetrolConsumption=15
> vehicle 2, threshold 35 hours, allowedPetrolConsumption=28
> vehicle 3  no threshold set by owner.
>
> All the vehicle data is stored in HBase tables. We have a scheduled Batch
> Job every day at 12 pm to check the status of vehicle movement and Petrol
> consumption against threshold and raise an alert (vehicle1 did not move for
> past 20 hours, vehicle 2 consumed more petrol. )
>
> Since it is a Batch Job, I loaded all threshold data in one DataSet and
> HBase Data in another Dataset using HbaseInputFormat.
>
> What I am failing to figure out is:
> 1> vehicle 1 is having threshold of 20 hours where as vehicle 2 has
> threshold of 35 hours, I need to fetch data from Hbase for different
> scenario. Is there any better approach to get all data using one Hbase
> connection.
> 2> how to apply alert on Dataset.  CEP pattern/ Match_recognize is allowed
> only on DataStream. Please help me with a simple example. (alert can be
> raised if count is zero or like petrol consumption is too high)
>
>
> I could not get any example for Dataset on google where an alert is
> raised. Kindly guide me if there is any better approach
>
> Regards,
> Sunitha.
>