Posted to user@nutch.apache.org by Lewis John McGibbney <le...@apache.org> on 2021/07/01 20:06:20 UTC

Re: Crawling pages behind SSO authentication (SAML/OIDC)

Hi Abhay,

On 2021/06/10 22:27:42, Abhay Ratnaparkhi <ab...@gmail.com> wrote: 

> 
> I built a Selenium-based microservice (which handles all required SSO
> redirections, OTP handling, etc.) and hosted it with a Selenium Grid in
> a Kubernetes cluster for scaling.
> I found that we couldn't scale this approach beyond a certain point, and the
> Selenium hub in the Selenium Grid cannot be scaled horizontally.

Which version of Selenium Grid and Hub did you use?
I haven't used either for a while... I did see that Grid 4 is available
https://www.selenium.dev/documentation/en/grid/grid_4/

lewismc

Re: Crawling pages behind SSO authentication (SAML/OIDC)

Posted by Lewis John McGibbney <le...@apache.org>.
OK, I'm going to try out Selenium Grid 4 and record my experience in a wiki page.
I'll write back here in due course.
Thanks

On 2021/07/08 17:11:56, Abhay Ratnaparkhi <ab...@gmail.com> wrote: 
> Hello Lewis,
> 
> Sorry for the late reply, I missed your email.
> The version we used is 3.141.59. As I mentioned earlier, we have since moved
> from Selenium to Puppeteer.
> 
> 
> Thank you
> ~Abhay
> 
> 
> Below was the hub configuration.
> 
> 
> ```
> hub:
>   image: "selenium/hub"
>   tag: "3.141.59"
>   port: 4444
>   servicePort: 4444
>   readinessTimeout: 40
>   readinessDelay: 40
>   livenessTimeout: 160
>   javaOpts: "-Xmx8192m"
>   resources:
>     limits:
>       cpu: "7"
>       memory: "9Gi"
>   gridNewSessionWaitTimeout: -1
>   gridJettyMaxThreads: 750
>   gridNodePolling: 10000
>   gridCleanUpCycle: 5000
>   gridTimeout: 360
>   gridBrowserTimeout: 120
>   gridMaxSession: 5
>   gridUnregisterIfStillDownAfter: 600000
> chrome:
>   enabled: true
>   image: "selenium/node-chrome"
>   tag: "3.141.59"
>   replicas: 60
>   nodeMaxSession: 5
>   nodeRegistryCycle: 5000
>   javaOpts: "-Xmx2048m"
>   resources:
>     limits:
>       cpu: "1200m"
>       memory: "3000Mi"
> ```
> 
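For anyone reading this thread later, here is a rough back-of-the-envelope sketch of what the quoted settings add up to. It is only an illustration: the numbers are copied from the configuration above, the function and variable names are made up for this example, and it assumes that nodeMaxSession caps the concurrent browser sessions per Chrome node replica and that the resource limits apply per pod.

```python
# Values copied from the quoted configuration; names are hypothetical.
CHROME_REPLICAS = 60
NODE_MAX_SESSION = 5               # assumed: concurrent sessions per node replica
NODE_CPU_LIMIT_MILLICORES = 1200   # cpu: "1200m"
NODE_MEMORY_LIMIT_MI = 3000        # memory: "3000Mi"
HUB_CPU_LIMIT_MILLICORES = 7000    # cpu: "7"
HUB_MEMORY_LIMIT_MI = 9 * 1024     # memory: "9Gi"


def grid_capacity(replicas: int, sessions_per_node: int) -> int:
    """Upper bound on concurrent sessions if every node is fully used."""
    return replicas * sessions_per_node


def total_limits(replicas: int, node_cpu_m: int, node_mem_mi: int,
                 hub_cpu_m: int, hub_mem_mi: int) -> tuple[float, float]:
    """Sum of the resource limits for one hub plus all node replicas."""
    cpu_cores = (replicas * node_cpu_m + hub_cpu_m) / 1000
    memory_gi = (replicas * node_mem_mi + hub_mem_mi) / 1024
    return cpu_cores, memory_gi


max_sessions = grid_capacity(CHROME_REPLICAS, NODE_MAX_SESSION)
cpu_cores, memory_gi = total_limits(
    CHROME_REPLICAS,
    NODE_CPU_LIMIT_MILLICORES,
    NODE_MEMORY_LIMIT_MI,
    HUB_CPU_LIMIT_MILLICORES,
    HUB_MEMORY_LIMIT_MI,
)

print(f"max concurrent sessions: {max_sessions}")
print(f"total CPU limit (cores): {cpu_cores}")
print(f"total memory limit (Gi): {memory_gi:.2f}")
```

Under those assumptions the grid tops out at 300 concurrent sessions and about 79 CPU cores of limits, with every session going through the single hub pod. That seems consistent with the scaling ceiling described above, although the actual bottleneck would need to be measured.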
