Posted to dev@nutch.apache.org by Sabah Sajjad Khan <sa...@wayne.edu> on 2016/03/29 02:07:42 UTC

[selenium] running selenium headless

Hello,


I am new to Nutch. I am trying to use the Selenium plugin with Nutch on a server for a school project, but I am unable to run a browser on the server. I have tried the headless setup, but it does not seem to work for me; when fetching, I get the following error:


fetch of http://digikey.com/product-detail/en/fairchild-semiconductor/MDB6S/MDB6SFSTR-ND/3137082/ failed with: java.lang.RuntimeException: org.openqa.selenium.remote.UnreachableBrowserException: Could not start a new session. Possible causes are invalid address of the remote server or browser start-up failure.

Build info: version: '2.42.2', revision: '6a6995d31c7c56c340d6f45a76976d43506cd6cc', time: '2014-06-03 10:52:47'

System info: host: '---', ip: '---', os.name: 'Linux', os.arch: 'amd64', os.version: '3.10.0-229.el7.x86_64', java.version: '1.7.0_79'

Driver info: driver.version: RemoteWebDriver

-finishing thread FetcherThread0, activeThreads=0

0/0 spinwaiting/active, 1 pages, 1 errors, 0.2 0 pages/s, 0 0 kb/s, 0 URLs in 0 queues

-activeThreads=0

This is a snippet of the error I'm getting. Any help would be appreciated.

Thank You
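The exception text above names two candidate causes: a wrong remote-server address, or a browser that fails to start. As a rough first check of the first cause, the stdlib-only Java sketch below tests whether anything accepts TCP connections at the hub URL a RemoteWebDriver would be pointed at. The default URL http://localhost:4444/wd/hub is an assumption (4444 is the Selenium standalone server's usual port), not a value read from Nutch's configuration, and a successful connection only shows that some listener exists, not that it is a working Selenium server.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;

public class HubCheck {

    /** Returns true if a TCP connection to host:port is accepted within timeoutMs. */
    static boolean isListening(String host, int port, int timeoutMs) {
        if (host == null || port < 1 || port > 65535) {
            return false; // no usable address, so there is nothing to connect to
        }
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host name could not be resolved
        }
    }

    /** Port from the URI, falling back to the scheme's default when none is given. */
    static int portOf(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    public static void main(String[] args) {
        // Hypothetical hub address; pass the real one as the first argument.
        URI hub = URI.create(args.length > 0 ? args[0] : "http://localhost:4444/wd/hub");
        boolean up = isListening(hub.getHost(), portOf(hub), 2000);
        System.out.println(hub + (up ? " accepts TCP connections" : " is NOT reachable"));
    }
}
```

If the configured driver is a local browser rather than a remote one, this check does not apply and the browser start-up cause is the one to chase.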

Re: [selenium] running selenium headless

Posted by Karanjeet Singh <ka...@usc.edu>.
Hi Sabah,

Did you change the driver name from "RemoteDriver" to "firefox" in your
nutch-site.xml?

What version of Firefox are you using?
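To double-check what the fetcher will see after editing nutch-site.xml, the stdlib-only Java sketch below reads a property back out of that file. It assumes the Selenium driver choice lives in a property named selenium.driver; that name is an assumption here, so check nutch-default.xml in your Nutch version for the exact name and accepted values. The parser follows the <configuration>/<property>/<name>/<value> layout nutch-site.xml uses and, in this sketch, lets a later definition of the same name replace an earlier one.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DriverSetting {

    /** Returns the trimmed value of the named property, or null if the file never sets it. */
    static String readProperty(String xml, String name) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parseable configuration file", e);
        }
        String result = null;
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            NodeList names = prop.getElementsByTagName("name");
            NodeList values = prop.getElementsByTagName("value");
            if (names.getLength() > 0 && name.equals(names.item(0).getTextContent().trim())) {
                // A later definition of the same name replaces an earlier one.
                result = values.getLength() > 0 ? values.item(0).getTextContent().trim() : "";
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // With no argument, fall back to a hypothetical inline sample.
        String xml = args.length > 0
                ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8)
                : "<configuration><property><name>selenium.driver</name>"
                  + "<value>firefox</value></property></configuration>";
        // "selenium.driver" is an assumed property name; adjust it to your Nutch version.
        System.out.println("selenium.driver = " + readProperty(xml, "selenium.driver"));
    }
}
```

Pass the path to your nutch-site.xml as the first argument. A printed value of null means the file does not override the property at all, so whatever default ships with Nutch is in effect.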

Thanks & Regards,
Karanjeet Singh
CS Graduate Student
University of Southern California
karanjes@usc.edu | +1-213-675-9583

On Mon, Mar 28, 2016 at 5:29 PM, Sabah Sajjad Khan <sa...@wayne.edu>
wrote:

> I think my issue is that I am unable to get a browser running on the server.
> I have tried installing Firefox but was unable to, so I figured maybe that
> wasn't something you could do on a server. I guess I'm wrong, though; I will
> look into that.
>
>
> Thank You

Re: [selenium] running selenium headless

Posted by Sabah Sajjad Khan <sa...@wayne.edu>.
I think my issue is that I am unable to get a browser running on the server. I have tried installing Firefox but was unable to, so I figured maybe that wasn't something you could do on a server. I guess I'm wrong, though; I will look into that.
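Before involving Nutch at all, it helps to confirm whether a Firefox executable is installed where the fetcher process can find it. A minimal stdlib-only Java sketch, assuming the binary is named firefox and is located through the PATH environment variable (a browser installed somewhere else would need its own check):

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FindFirefox {

    /** Returns the first executable regular file named exe in a PATH-style list, or null. */
    static Path findOnPath(String exe, String pathVar) {
        if (pathVar == null) {
            return null;
        }
        for (String dir : pathVar.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // this sketch skips empty entries such as a trailing separator
            }
            Path candidate = Paths.get(dir, exe);
            if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Path firefox = findOnPath("firefox", System.getenv("PATH"));
        System.out.println(firefox != null ? "found " + firefox : "firefox is not on PATH");
    }
}
```

Run it as the same user that runs Nutch, since PATH can differ between accounts and service environments.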


Thank You

On Mar 28, 2016, at 8:24 PM, Sujen Shah <su...@gmail.com> wrote:

> Hi
>
> I can't tell much from the log you pasted. A few questions:
>
> Which browser are you using?
> Have you tried running the browser on its own on the server before running Nutch?
> Could you please attach the detailed logs from the hadoop.log file?
>
> Thanks.
>
> Regards,
> Sujen Shah
> M.S - Computer Science (Class of 2016)
> University of Southern California
> http://www.linkedin.com/in/sujenshah


Re: [selenium] running selenium headless

Posted by Sujen Shah <su...@gmail.com>.
Hi

I can't tell much from the log you pasted. A few questions:

Which browser are you using?
Have you tried running the browser on its own on the server before running Nutch?
Could you please attach the detailed logs from the hadoop.log file?
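For the second question, running the browser on its own: Firefox releases from that period need an X display even when nobody looks at the screen, and the usual headless approach is to start a virtual framebuffer such as Xvfb and export DISPLAY in the environment Nutch runs in. The stdlib-only Java sketch below only inspects that environment variable; it cannot tell whether an X server is really listening on the display it names, and the advice strings are illustrative.

```java
import java.util.Map;

public class DisplayCheck {

    /** Explains whether a GUI browser started with this environment has an X display to use. */
    static String diagnose(Map<String, String> env) {
        String display = env.get("DISPLAY");
        if (display == null || display.trim().isEmpty()) {
            return "DISPLAY is not set: start a virtual X server such as Xvfb and export DISPLAY first";
        }
        return "DISPLAY=" + display.trim() + ": confirm an X server is actually running on it";
    }

    public static void main(String[] args) {
        // Inspects this JVM's own environment; run it from the same shell or service that launches Nutch.
        System.out.println(diagnose(System.getenv()));
    }
}
```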

Thanks.





Regards,
Sujen Shah
M.S - Computer Science (Class of 2016)
University of Southern California
http://www.linkedin.com/in/sujenshah

