Posted to users@nifi.apache.org by tkg_cangkul <yu...@gmail.com> on 2020/09/02 18:05:59 UTC

scraping aspx web

Dear All,

I want to try scraping an ASPX web page with NiFi. Is there any suggestion
for converting an ASPX grid into an HTML table or a CSV file?

Below is the sample ASPX GridView format that I've got.



Is this possible to do with NiFi?
I need advice.


Best Regards,

Re: NIFI HandleHttpRequest API - Health Check when API or Node Down

Posted by jgunvaldson <jg...@cox.net>.
I think that is part of the problem, and probably part of the solution.

If I can wire the load balancer to do an HTTP HEAD (allowing HEAD is a property of HandleHttpRequest), then knowing that this is a “separate” Jetty instance means I should get a 200 from the HEAD.

Something like that.
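
A minimal sketch of such a HEAD probe in Python, using only the standard library. The path "/" and the 2-second timeout are assumptions for illustration, and the flow behind HandleHttpRequest is assumed to answer HEAD with a 200 (for example through a HandleHttpResponse processor). A real load balancer would perform the equivalent check itself.

```python
import http.client


def head_status(host, port, path="/", timeout=2.0):
    """Send an HTTP HEAD and return the status code, or None if the node is unreachable."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or a malformed response.
        return None
    finally:
        conn.close()


def is_up(status):
    """Treat only a 200 as healthy; None (no answer) and any other code count as down."""
    return status == 200
```

For example, is_up(head_status("node-1", 5112)) would report whether a hypothetical host node-1 answers on port 5112, the port mentioned in the original post.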



> On Sep 4, 2020, at 9:56 AM, Bryan Bende <bb...@gmail.com> wrote:
> 
> That is correct. Each instance of HandleHttpRequest and ListenHTTP has its own embedded Jetty server, separate from the Jetty server that runs NiFi's REST API.
> 


Re: NIFI HandleHttpRequest API - Health Check when API or Node Down

Posted by Bryan Bende <bb...@gmail.com>.
That is correct. Each instance of HandleHttpRequest and ListenHTTP has its
own embedded Jetty server, separate from the Jetty server that runs NiFi's
REST API.

On Fri, Sep 4, 2020 at 12:40 PM Etienne Jouvin <la...@gmail.com>
wrote:

> I do not know everything, but if I understood correctly, NiFi is built on a
> REST API. For example, everything you do in the GUI is done through REST
> calls. So I guess you can query whether the NiFi instance is up on each node.
>
> But this will not give you the status of your custom HandleHttpRequest
> endpoint. The NiFi instance can be up while your processor is stopped or
> disabled.

Re: NIFI HandleHttpRequest API - Health Check when API or Node Down

Posted by Etienne Jouvin <la...@gmail.com>.
I do not know everything, but if I understood correctly, NiFi is built on a
REST API. For example, everything you do in the GUI is done through REST
calls. So I guess you can query whether the NiFi instance is up on each node.

But this will not give you the status of your custom HandleHttpRequest
endpoint. The NiFi instance can be up while your processor is stopped or
disabled.
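
To make the distinction concrete, here is a hedged Python sketch that asks the NiFi REST API for the state of a single processor. It assumes an unsecured NiFi at a placeholder base URL, a placeholder processor id, and that GET /nifi-api/processors/{id} returns JSON with the run state under component.state. Check the REST API documentation for your NiFi version; a secured instance would also need authentication.

```python
import json
import urllib.request


def processor_state(entity):
    """Extract the run state (e.g. "RUNNING", "STOPPED", "DISABLED") from a processor entity."""
    return entity.get("component", {}).get("state")


def fetch_processor_entity(base_url, processor_id, timeout=2.0):
    """GET the processor entity as a dict. base_url and processor_id are placeholders."""
    url = f"{base_url}/nifi-api/processors/{processor_id}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def endpoint_should_be_listening(entity):
    """NiFi being reachable is not enough: the processor itself must be running."""
    return processor_state(entity) == "RUNNING"
```

Even when this reports a running processor, it says nothing about whether requests actually get answered, so it complements a check against the endpoint itself rather than replacing it.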




On Fri, Sep 4, 2020 at 18:26, jgunvaldson <jg...@cox.net> wrote:

> It seems a bit like a chicken-and-egg thing. Isn't using ‘anything’
> configured on the disconnected node as a health check much like trying to
> reach the API (listening port) itself? Kinda.
>
> Anyway,
>
> I was hoping that the NiFi infrastructure had a generalized, centralized
> mechanism (REST API or other) that would answer the question “is this NODE
> up and listening on this PORT?”, and that could be called by a load balancer.
>
> ~John

Re: NIFI HandleHttpRequest API - Health Check when API or Node Down

Posted by Andrew Grande <ap...@gmail.com>.
You can always hit the NiFi REST API status endpoint. It won't tell you
anything about the specific HTTP endpoint you exposed, though, as it is the
general NiFi REST API.

Your LB would need to understand how to hit this URL too, especially if
it's secured. Coming back to the easiest path, you'd rather implement a
standard integration pattern for the custom endpoint and filter out any GET
requests that come through to, for example, /path/ping. If the request fails,
the LB knows the endpoint is dead; if it returns 200, the endpoint is live,
and your NiFi flow simply terminates any requests for the status check path.

If you were asking about having a real-time integrated system where a LB
would be able to route ONLY to healthy nodes and maintain and discover that
list - I don't think you can do it with the AWS LB, at least not unless you
have full control over it and can drive it with APIs.

Andrew
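
To illustrate what "route ONLY to healthy nodes" amounts to, here is a small Python sketch of round-robin selection that skips nodes failing a health probe. It is not how any AWS load balancer is implemented. The node names and the probe function are hypothetical; in practice the probe would be an HTTP check against each node's HandleHttpRequest port.

```python
class RoundRobin:
    """Rotate over a fixed node list, skipping nodes whose probe fails."""

    def __init__(self, nodes, probe):
        self.nodes = list(nodes)
        self.probe = probe  # callable: node -> True if the node is healthy
        self._next = 0

    def pick(self):
        """Return the next healthy node in rotation, or None if none is healthy."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if self.probe(node):
                return node
        return None


# Usage with a stubbed probe: node-2 is pretended to be down.
down = {"node-2"}
balancer = RoundRobin(["node-1", "node-2", "node-3"], lambda n: n not in down)
picks = [balancer.pick() for _ in range(4)]
```

Probing on every pick keeps the sketch short; a real balancer would usually probe on a timer and cache the result.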

On Fri, Sep 4, 2020, 9:26 AM jgunvaldson <jg...@cox.net> wrote:

> It seems a bit like a chicken-and-egg thing. Isn't using ‘anything’
> configured on the disconnected node as a health check much like trying to
> reach the API (listening port) itself? Kinda.
>
> Anyway,
>
> I was hoping that the NiFi infrastructure had a generalized, centralized
> mechanism (REST API or other) that would answer the question “is this NODE
> up and listening on this PORT?”, and that could be called by a load balancer.
>
> ~John

Re: NIFI HandleHttpRequest API - Health Check when API or Node Down

Posted by jgunvaldson <jg...@cox.net>.
It seems a bit like a chicken-and-egg thing. Isn't using ‘anything’ configured on the disconnected node as a health check much like trying to reach the API (listening port) itself? Kinda.

Anyway,

I was hoping that the NiFi infrastructure had a generalized, centralized mechanism (REST API or other) that would answer the question “is this NODE up and listening on this PORT?”, and that could be called by a load balancer.

~John



> On Sep 4, 2020, at 9:19 AM, Etienne Jouvin <la...@gmail.com> wrote:
> 
> Since you implemented a HandleHttpRequest listener, why don't you configure a handler on something like http(s)://server/ping
> where the response is just pong?


Re: NIFI HandleHttpRequest API - Health Check when API or Node Down

Posted by Etienne Jouvin <la...@gmail.com>.
Since you implemented a HandleHttpRequest listener, why don't you
configure a handler on something like http(s)://server/ping
where the response is just pong?
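
From the caller's side, such a ping check could look like the following Python sketch. The URL is whatever ping path you configure; the exact "pong" body, the tolerant comparison (case and surrounding whitespace ignored) and the 2-second timeout are assumptions for illustration.

```python
import http.client
import urllib.error
import urllib.request


def is_pong(status, body):
    """Healthy only if the endpoint answered 200 with the body "pong"."""
    return status == 200 and body.strip().lower() == "pong"


def ping(url, timeout=2.0):
    """GET the ping URL; return True if it answers "pong", False on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_pong(resp.status, resp.read().decode("utf-8", "replace"))
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Non-2xx answer, refused connection, timeout, or a malformed URL.
        return False
```

Because the request travels through the same listener and flow as real traffic, a True result says more about the endpoint than a check against NiFi's own REST API would.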



On Fri, Sep 4, 2020 at 18:02, jgunvaldson <jg...@cox.net> wrote:

> Hi,
>
> Our network administrators are unable to wire up an advanced load balancer
> (AWS Application Load Balancer or an Apache reverse proxy) to leverage a
> NiFi API that may be listening on a port across several nodes.
>
> For instance, a HandleHttpRequest listening on Node-1 on PORT 5112, Node-2
> on 5112, Node-3 on 5112, and so on.
>
> If a NODE is down (or the API stops listening, it happens) or disconnected,
> a call to that node and port will fail, which is a pretty bad experience
> for the customer.
>
> So what we would like is for an external load balancer to use round robin
> (advanced features) to redirect the request to an UP node, but to do this
> the load balancer needs a proper health check.
>
> What is a proper “health check” for this scenario? How would it be created
> and wired up?
>
> Right now, a request to an API that is hosted on NiFi and proxied by our
> API Manager (WSO2) will fail on the down node and not recover; the user
> will probably get a 500. APIM is not a good load balancer.
>
> Thanks in advance for this discussion
>
>
> Best Regards
> John Gunvaldson

NIFI HandleHttpRequest API - Health Check when API or Node Down

Posted by jgunvaldson <jg...@cox.net>.
Hi,

Our network administrators are unable to wire up an advanced load balancer (AWS Application Load Balancer or an Apache reverse proxy) to leverage a NiFi API that may be listening on a port across several nodes.

For instance, a HandleHttpRequest listening on Node-1 on PORT 5112, Node-2 on 5112, Node-3 on 5112, and so on.

If a NODE is down (or the API stops listening, it happens) or disconnected, a call to that node and port will fail, which is a pretty bad experience for the customer.

So what we would like is for an external load balancer to use round robin (advanced features) to redirect the request to an UP node, but to do this the load balancer needs a proper health check.

What is a proper “health check” for this scenario? How would it be created and wired up?

Right now, a request to an API that is hosted on NiFi and proxied by our API Manager (WSO2) will fail on the down node and not recover; the user will probably get a 500. APIM is not a good load balancer.

Thanks in advance for this discussion


Best Regards
John Gunvaldson


Re: scraping aspx web

Posted by Mike Thomsen <mi...@gmail.com>.
You're better off with a tool like Scrapy for something like this:
https://scrapy.org/
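
Whichever tool fetches the page, an ASP.NET GridView is normally rendered as a plain HTML <table>, so the conversion step can be sketched on its own. The example below uses only the Python standard library, not Scrapy itself; the sample markup and the GridView1 id are made up for illustration, and nested tables or paging through postbacks are not handled.

```python
import csv
import io
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table rows found in html as CSV text."""
    parser = TableToRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical sample of what a rendered grid might look like.
sample = (
    '<table id="GridView1"><tr><th>Id</th><th>Name</th></tr>'
    "<tr><td>1</td><td>Alice, A.</td></tr></table>"
)
csv_text = html_table_to_csv(sample)
```

The csv module takes care of quoting, so a cell containing a comma stays a single field in the output.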

On Wed, Sep 2, 2020 at 2:07 PM tkg_cangkul <yu...@gmail.com> wrote:

> Dear All,
>
> I want to try scraping an ASPX web page with NiFi. Is there any suggestion
> for converting an ASPX grid into an HTML table or a CSV file?
>
> Below is the sample ASPX GridView format that I've got.
>
>
>
> Is this possible to do with NiFi?
> I need advice.
>
>
> Best Regards,
>