Posted to dev@nutch.apache.org by Shuo Li <sl...@usc.edu> on 2015/02/11 06:36:46 UTC

Nutch-Selenium in Nutch 1.10

Yop,

I'm trying to install Selenium in Nutch 1.10. However, this error pops up:

error: package org.apache.nutch.storage does not exist

I can only find this package in Nutch 2.x. Is there a way to use Selenium
in 1.10?

Any advice would be appreciated.

Regards,
Shuo Li

Re: Nutch-Selenium in Nutch 1.10

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Perfect, that’s what I suggested, thanks guys!

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Sapnashri Suresh <sa...@usc.edu>
Reply-To: "dev@nutch.apache.org" <de...@nutch.apache.org>
Date: Tuesday, February 10, 2015 at 9:42 PM
To: "dev@nutch.apache.org" <de...@nutch.apache.org>
Subject: Re: Nutch-Selenium in Nutch 1.10

>Hi Shuo Li,
>
>
>We were facing a similar issue. Prof. Mattmann suggested we look into this
>patch for Selenium on Nutch 1.10:
>https://issues.apache.org/jira/browse/NUTCH-1933.
>
>
>Hope this helps!
>
>
>Thanks,
>Sapna
>
>-- 
>Graduate Student
>MS in CS (Data Science)
>Viterbi School of Engineering
>University of Southern California
>
>
>Phone: +1 650-307-9848
>


Re: Nutch-Selenium in Nutch 1.10

Posted by Jiaxin Ye <ji...@usc.edu>.
Update:

If xvfb -screen scrn 1024x758x34 doesn't work,
try Xvfb :11 -screen 0 1024x768x24 instead.

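In the first command, "scrn" is not a valid screen number and 34 bits is not a usable X11 color depth (758 is probably a typo for 768). The second form names display :11, screen 0, and a 24-bit 1024x768 framebuffer. Firefox then has to be started with DISPLAY=:11 so that it draws on that virtual display. A minimal Java sketch, assuming you drive this from a helper program rather than a terminal; the class name XvfbCommand and the list of accepted depths are illustrative assumptions, not part of Nutch or Xvfb:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Nutch or Xvfb): builds the Xvfb command line
// and points a child process, such as the crawl, at the virtual display.
final class XvfbCommand {
    // Assumption: accept only common X11 depths; 34 (from the first command) is not one.
    private static final List<Integer> DEPTHS = List.of(8, 16, 24, 32);

    private XvfbCommand() {}

    static List<String> command(int display, int screen, int width, int height, int depth) {
        if (!DEPTHS.contains(depth)) {
            throw new IllegalArgumentException("unsupported depth: " + depth);
        }
        if (display < 0 || screen < 0 || width <= 0 || height <= 0) {
            throw new IllegalArgumentException("invalid display, screen or geometry");
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("Xvfb");
        cmd.add(":" + display);              // display number, e.g. :11
        cmd.add("-screen");
        cmd.add(Integer.toString(screen));   // screen number must be an integer such as 0
        cmd.add(width + "x" + height + "x" + depth);
        return cmd;
    }

    // Firefox finds the virtual display through the DISPLAY environment variable.
    static ProcessBuilder withDisplay(ProcessBuilder pb, int display) {
        pb.environment().put("DISPLAY", ":" + display);
        return pb;
    }
}
```

For example, command(11, 0, 1024, 768, 24) yields [Xvfb, :11, -screen, 0, 1024x768x24]; new ProcessBuilder(...).start() on that list would launch the server (assuming Xvfb is on the PATH), and the crawl would then be started from a ProcessBuilder passed through withDisplay(..., 11). From a shell, the equivalent is exporting DISPLAY=:11 before running the crawl command.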

On Thu, Feb 19, 2015 at 1:25 AM, Jaydeep Bagrecha <ba...@usc.edu> wrote:

> Update:
>
> Selenium's latest version (2.44.0) doesn't seem to work with the latest
> Firefox (35), so I installed Firefox 29 and it's crawling properly
> now.
>
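The error log Jaydeep posted on Feb 18 (quoted below) shows which versions were actually in play: its "Loaded add-on state from prefs" line lists the WebDriver extension fxdriver@googlecode.com at version 2.42.2 and an add-on bundled inside Firefox.app at 35.0.1, matching the Firefox release in use. The 2.42.2 figure is worth comparing with the Selenium version you think you installed. A hypothetical Java sketch for pulling id/version pairs out of such a line; the class name AddonVersions is an assumption, and the regular expression assumes the exact "d", "e", "v" key order seen in that log:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log helper: extracts add-on id -> version pairs from the
// "Loaded add-on state from prefs" JSON in Firefox's console output.
final class AddonVersions {
    // Matches "<id>":{"d":"<path>","e":<true|false>,"v":"<version>"
    private static final Pattern ENTRY = Pattern.compile(
            "\"([^\"]+)\":\\{\"d\":\"[^\"]*\",\"e\":(?:true|false),\"v\":\"([^\"]+)\"");

    private AddonVersions() {}

    static Map<String, String> parse(String logLine) {
        Map<String, String> versions = new LinkedHashMap<>();  // keeps log order
        Matcher m = ENTRY.matcher(logLine);
        while (m.find()) {
            versions.put(m.group(1), m.group(2));
        }
        return versions;
    }
}
```

On the line from the log, parse returns fxdriver@googlecode.com=2.42.2 followed by {972ce4c6-7e08-4474-a285-3208198ce6fd}=35.0.1.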
> On Feb 18, 2015, at 2:56 PM, Jaydeep Bagrecha <ba...@usc.edu> wrote:
>
> Thanks, Jiaxin!
>
> I repeated the entire installation procedure again, and I think I have
> installed it correctly (it said BUILD SUCCESSFUL after the ant runtime
> command, and there are Selenium jar files in the runtime/local/lib folder).
>
> When I started crawling, the Mozilla browser popped up twice, but when I
> looked at the crawl statistics, it had fetched no URLs. Did anyone else
> have this problem?
>
> I got the following error while crawling:
>
> org.openqa.selenium.firefox.NotConnectedException: Unable to connect to host 127.0.0.1 on port 7055 after 45000 ms. Firefox console output:
> h changes to installed add-ons
> 1424295898279 addons.xpi-utils DEBUG Updating add-on states
> 1424295898281 addons.xpi-utils DEBUG Writing add-ons list
> 1424295898291 addons.manager DEBUG Registering shutdown blocker for XPIProvider
> 1424295898292 addons.manager DEBUG Registering shutdown blocker for LightweightThemeManager
> 1424295898295 addons.manager DEBUG Registering shutdown blocker for OpenH264Provider
> 1424295898296 addons.manager DEBUG Registering shutdown blocker for PluginProvider
> 1424295898775 DeferredSave.extensions.json DEBUG Starting timer
> 1424295898800 DeferredSave.extensions.json DEBUG Starting write
> 1424295898858 addons.manager DEBUG shutdown
> 1424295898859 addons.manager DEBUG Calling shutdown blocker for XPIProvider
> 1424295898859 addons.xpi DEBUG shutdown
> 1424295898860 addons.xpi-utils DEBUG shutdown
> 1424295898861 addons.manager DEBUG Calling shutdown blocker for LightweightThemeManager
> 1424295898862 addons.manager DEBUG Calling shutdown blocker for OpenH264Provider
> 1424295898864 addons.manager DEBUG Calling shutdown blocker for PluginProvider
> 1424295899016 DeferredSave.extensions.json DEBUG Write succeeded
> 1424295899016 addons.xpi-utils DEBUG XPI Database saved, setting schema version preference to 16
> 1424295899017 addons.xpi DEBUG Notifying XPI shutdown observers
> 1424295899025 addons.manager DEBUG Async provider shutdown done
> 1424295900455 addons.manager DEBUG Loaded provider scope for resource://gre/modules/addons/XPIProvider.jsm: ["XPIProvider"]
> 1424295900459 addons.manager DEBUG Loaded provider scope for resource://gre/modules/LightweightThemeManager.jsm: ["LightweightThemeManager"]
> 1424295900468 addons.xpi DEBUG startup
> 1424295900470 addons.xpi INFO Mapping fxdriver@googlecode.com to /var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/fxdriver@googlecode.com
> 1424295900471 addons.xpi DEBUG Ignoring file entry whose name is not a valid add-on ID: /var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/webdriver-staging
> 1424295900472 addons.xpi INFO Mapping {972ce4c6-7e08-4474-a285-3208198ce6fd} to /Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}
> 1424295900473 addons.xpi DEBUG Skipping unavailable install location app-system-share
> 1424295900475 addons.xpi DEBUG checkForChanges
> 1424295900476 addons.xpi DEBUG Loaded add-on state from prefs: {"app-profile":{"fxdriver@googlecode.com":{"d":"/var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/fxdriver@googlecode.com","e":false,"v":"2.42.2","st":1424295897000,"mt":1424295897000}},"app-global":{"{972ce4c6-7e08-4474-a285-3208198ce6fd}":{"d":"/Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}","e":true,"v":"35.0.1","st":1423704245000,"mt":1423704244000}}}
> 1424295900480 addons.xpi DEBUG getModTime: Recursive scan of {972ce4c6-7e08-4474-a285-3208198ce6fd}
> 1424295900483 addons.xpi DEBUG getInstallState changed: false, state: {"app-profile":{"fxdriver@googlecode.com":{"d":"/var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/fxdriver@googlecode.com","e":false,"v":"2.42.2","st":1424295897000,"mt":1424295897000}},"app-global":{"{972ce4c6-7e08-4474-a285-3208198ce6fd}":{"d":"/Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}","e":true,"v":"35.0.1","st":1423704245000,"mt":1423704244000}}}
> 1424295900488 addons.xpi DEBUG No changes found
> 1424295900502 addons.manager DEBUG Registering shutdown blocker for XPIProvider
> 1424295900504 addons.manager DEBUG Registering shutdown blocker for LightweightThemeManager
> 1424295900507 addons.manager DEBUG Registering shutdown blocker for OpenH264Provider
> 1424295900508 addons.manager DEBUG Registering shutdown blocker for PluginProvider
> *** Blocklist::_preloadBlocklistFile: blocklist is disabled
> 1424295903113 addons.manager DEBUG Registering shutdown blocker for <unnamed-provider>
>
>     at org.openqa.selenium.firefox.internal.NewProfileExtensionConnection.start(NewProfileExtensionConnection.java:118)
>     at org.openqa.selenium.firefox.FirefoxDriver.startClient(FirefoxDriver.java:246)
>     at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:114)
>     at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:191)
>     at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:186)
>     at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:182)
>     at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:95)
>     at org.apache.nutch.protocol.selenium.HttpWebClient.getHtmlPage(HttpWebClient.java:53)
>     at org.apache.nutch.protocol.selenium.HttpResponse.readPlainContent(HttpResponse.java:199)
>     at org.apache.nutch.protocol.selenium.HttpResponse.<init>(HttpResponse.java:161)
>     at org.apache.nutch.protocol.selenium.Http.getResponse(Http.java:56)
>     at org.apache.nutch.protocol.http.api.HttpRobotRulesParser.getRobotRulesSet(HttpRobotRulesParser.java:101)
>     at org.apache.nutch.protocol.RobotRulesParser.getRobotRulesSet(RobotRulesParser.java:151)
>     at org.apache.nutch.protocol.http.api.HttpBase.getRobotRules(HttpBase.java:492)
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:722)
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
>
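The NotConnectedException above says the driver waited 45000 ms for a connection to 127.0.0.1 on port 7055 and never got one; the stack trace shows this happened while Nutch was fetching robots rules (HttpBase.getRobotRules). To check whether anything ever starts listening on that port while Firefox comes up, a poll loop over java.net.Socket is enough. This is a hypothetical diagnostic sketch, not Selenium's own implementation; the class name PortProbe and the 250 ms retry interval are assumptions:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic (not Selenium code): polls until a TCP port accepts
// connections or the timeout expires.
final class PortProbe {
    private PortProbe() {}

    static boolean waitForPort(String host, int port, long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;  // gave up, like the 45000 ms limit in the error
            }
            try (Socket s = new Socket()) {
                // Bound each attempt by the time left, capped at 1 s.
                s.connect(new InetSocketAddress(host, port), (int) Math.min(remaining, 1000));
                return true;   // something accepted the connection
            } catch (IOException notYet) {
                // Nothing listening yet: wait a little, but never past the deadline.
                Thread.sleep(Math.min(250, Math.max(1, deadline - System.currentTimeMillis())));
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Same host, port and timeout as in the error message above.
        System.out.println(waitForPort("127.0.0.1", 7055, 45_000));
    }
}
```

Run while Firefox is starting, it prints true as soon as something accepts connections on 7055, or false after 45 seconds.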
> On Feb 17, 2015, at 11:21 PM, Jiaxin Ye <ji...@usc.edu> wrote:
>
> Hi,
>
> When you applied the patch, did you see any failures? No failure is
> tolerated. My guess is that something is wrong with ivy.xml. I suggest
> checking out ALL files in Nutch fresh and then retrying.
>
> Best,
> Jiaxin
>
> On Tuesday, February 17, 2015, Jaydeep Bagrecha <ba...@usc.edu> wrote:
>
>> Hi all,
>> I am trying to install and build Selenium with Nutch 1.10 on Mac OS X Yosemite.
>>
>> I'm getting the following errors after downloading the Selenium patch
>> (https://issues.apache.org/jira/browse/NUTCH-1933) and running the “ant
>> runtime” command (as Jiaxin described below). Any suggestions on how to avoid them?
>>
>>  error: package org.openqa.selenium does not exist
>>     [javac] import org.openqa.selenium.By;
>>     [javac]                           ^
>>  error: package org.openqa.selenium does not exist
>>     [javac] import org.openqa.selenium.WebDriver;
>>     [javac]                           ^
>>  error: package org.openqa.selenium.firefox does not exist
>>     [javac] import org.openqa.selenium.firefox.FirefoxDriver;
>>     [javac]                                   ^
>>  error: package org.openqa.selenium.firefox does not exist
>>     [javac] import org.openqa.selenium.firefox.FirefoxProfile;
>> error: cannot find symbol
>>     [javac]   public static ThreadLocal<WebDriver> threadWebDriver = new ThreadLocal<WebDriver>() {
>>     [javac]                             ^
>>     [javac]   symbol:   class WebDriver
>>     [javac]   location: class HttpWebClient
>>  error: cannot find symbol
>>     [javac]     protected WebDriver initialValue()
>>     [javac]               ^
>>     [javac]   symbol: class WebDriver
>>  error: cannot find symbol
>>     [javac]       FirefoxProfile profile = new FirefoxProfile();
>>     [javac]       ^
>>     [javac]   symbol: class FirefoxProfile
>> error: cannot find symbol
>>     [javac]       WebDriver driver = new FirefoxDriver(profile);
>>     [javac]                              ^
>>     [javac]   symbol: class FirefoxDriver
>>  error: cannot find symbol
>>     [javac]       driver = new FirefoxDriver();
>>     [javac]                    ^
>>     [javac]   symbol:   class FirefoxDriver
>>     [javac]   location: class HttpWebClient
>>
>>  error: cannot find symbol
>>     [javac]       new WebDriverWait(driver, 3);
>>     [javac]           ^
>>     [javac]   symbol:   class WebDriverWait
>>     [javac]   location: class HttpWebClient
>>
>>  error: cannot find symbol
>>     [javac]       String innerHtml = driver.findElement(By.tagName("body")).getAttribute("innerHTML");
>>     [javac]                                             ^
>>     [javac]   symbol:   variable By
>>     [javac]   location: class HttpWebClient
>>
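Errors of the form "package org.openqa.selenium does not exist" mean javac could not find the Selenium classes on its classpath, which fits Jiaxin's suspicion that the ivy.xml changes from the patch did not take effect. After his successful rebuild, Jaydeep reports Selenium jar files in runtime/local/lib. A hypothetical Java check for that; the class name JarCheck and the simple "selenium" substring match are assumptions:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical check (not part of Nutch): lists jar files in a directory whose
// names contain a given substring, e.g. "selenium" under runtime/local/lib.
final class JarCheck {
    private JarCheck() {}

    static List<String> jarsContaining(Path dir, String needle) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {  // closes the directory handle
            return files
                    .map(p -> p.getFileName().toString())
                    .filter(name -> name.endsWith(".jar") && name.contains(needle))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path lib = Paths.get(args.length > 0 ? args[0] : "runtime/local/lib");
        List<String> jars = jarsContaining(lib, "selenium");
        System.out.println(jars.isEmpty() ? "no Selenium jars found in " + lib : jars);
    }
}
```

Run from the Nutch directory, it prints the matching jar names, or a message if there are none.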
>> Thanks,
>> Jaydeep
>>
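The compiler output also shows how the patched HttpWebClient holds its browser: a static ThreadLocal<WebDriver> named threadWebDriver whose initialValue() appears to create a FirefoxDriver, so each fetcher thread lazily gets its own browser. The sketch below reproduces that pattern with a hypothetical FakeBrowser class standing in for FirefoxDriver, so it runs without Selenium on the classpath:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for FirefoxDriver: constructing one "launches a browser".
final class FakeBrowser {
    private static final AtomicInteger LAUNCHES = new AtomicInteger();

    FakeBrowser() {
        LAUNCHES.incrementAndGet();  // count how many "browsers" were started
    }

    static int launches() {
        return LAUNCHES.get();
    }
}

// Mirrors the shape of the HttpWebClient field in the javac output:
// a static ThreadLocal whose initialValue() creates the driver.
final class PerThreadBrowser {
    static final ThreadLocal<FakeBrowser> THREAD_BROWSER = new ThreadLocal<FakeBrowser>() {
        @Override
        protected FakeBrowser initialValue() {
            return new FakeBrowser();  // runs once per thread, on its first get()
        }
    };

    private PerThreadBrowser() {}

    static FakeBrowser browser() {
        return THREAD_BROWSER.get();
    }
}
```

With this pattern, a fetch running several threads can start one browser instance per thread, while repeated calls within one thread reuse the same instance.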
>> On Feb 12, 2015, at 11:37 PM, Jiaxin Ye <ji...@usc.edu> wrote:
>>
>> Sure. I will do it once I confirm it works...
>>
>> On Thursday, February 12, 2015, Mattmann, Chris A (3980) <chris.a.mattmann@jpl.nasa.gov> wrote:
>>
>>> This is great, Jiaxin, can you please make a wiki page on the Nutch
>>> wiki that has this information?
>>>
>>> -----Original Message-----
>>> From: Jiaxin Ye <ji...@usc.edu>
>>> Reply-To: "dev@nutch.apache.org" <de...@nutch.apache.org>
>>> Date: Thursday, February 12, 2015 at 9:39 PM
>>> To: "dev@nutch.apache.org" <de...@nutch.apache.org>
>>> Subject: Nutch-Selenium in Nutch 1.10
>>>
>>> >Hi Li, Shuo. You are so right. I finished installing and successfully ran
>>> >Nutch with Selenium and Firefox. I have a question, though: does your
>>> >Firefox pop up for all the URLs we crawled?
>>> >
>>> >
>>> >Hi Prof. Mattmann. I think this is how we install Selenium on a Mac running
>>> >OS X later than 10.6...
>>> >
>>> >
>>> >1. Download XQuartz (it's a .dmg file) and install it directly.
>>> >2. Download Nutch 1.10.
>>> >3. Download the patch and put it in the Nutch project directory.
>>> >4. patch -p0 < THE PATCH NAME
>>> >5. DO NOT update build.xml and ivy.xml as the Selenium tutorial on
>>> >GitHub tells you to. The patch already updates those .xml files for
>>> >us, and it also installs lib-selenium and protocol-selenium for us
>>> >(correct me if I am wrong).
>>> >6. Update the Tika dependency if needed.
>>> >7. Go to the Nutch project directory and run ant runtime.
>>> >8. Download Firefox.
>>> >9. Open a new terminal and type
>>> >    xvfb -screen scrn 1024x758x34 (I think you can set it smaller if you
>>> >want...)
>>> >    There should be some errors after entering the command (at least
>>> >there were for me). Manually create a /tmp/.X11-unix folder with sudo,
>>> >and then set its mode to 1777. Rerun the command; Xvfb should now work.
>>> >10. Go to nutch/runtime/local and run the crawl command.
>>> >
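Mode 1777 in step 9 is octal: the leading 1 is the sticky bit, and 777 grants read, write and execute to owner, group and others. That is the usual setting for a shared directory such as /tmp/.X11-unix: everyone can create entries there, but only an entry's owner (or root) can remove it. A small hypothetical Java helper that decodes such a mode into the permission string ls -ld would print; among the special bits it handles only the sticky bit:

```java
// Hypothetical helper: renders an octal mode such as "1777" the way ls -ld does
// (setuid/setgid are ignored in this sketch; only the sticky bit is shown).
final class OctalMode {
    private OctalMode() {}

    static String toSymbolic(String octal) {
        int mode = Integer.parseInt(octal, 8);
        char[] out = "rwxrwxrwx".toCharArray();
        for (int i = 0; i < 9; i++) {
            // Bit 8 is owner-read ... bit 0 is others-execute.
            if ((mode & (1 << (8 - i))) == 0) {
                out[i] = '-';
            }
        }
        if ((mode & 01000) != 0) {  // sticky bit
            out[8] = (out[8] == 'x') ? 't' : 'T';
        }
        return new String(out);
    }
}
```

toSymbolic("1777") gives rwxrwxrwt and toSymbolic("755") gives rwxr-xr-x. Note that java.nio's PosixFilePermissions only covers the nine rwx bits and cannot express the sticky bit, so the change itself is simplest from the shell (sudo chmod 1777 /tmp/.X11-unix).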
>>> >
>>> >Hope it helps. :)
>>> >
>>> >
>>> >Best,
>>> >Jiaxin
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >On Thu, Feb 12, 2015 at 1:08 PM, Shuo Li <sli491@usc.edu> wrote:
>>> >
>>> >I think I have possibly finished installing.
>>> >
>>> >
>>> >What you need to do:
>>> >0. Run git status and check out (revert) any files you have modified.
>>> >1. patch -p0 < YOUR_PATCH_FILE
>>> >2. ant clean jar
>>> >3. ant runtime
>>> >
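Steps 1 to 3 above have to succeed in order, and as Jiaxin puts it elsewhere in this thread, no failure is tolerated. A hypothetical Java sketch (the class name BuildSteps is an assumption) that runs them from the Nutch directory and stops at the first non-zero exit code. It passes the patch as patch -p0 -i FILE, which is equivalent to patch -p0 < FILE, and leaves step 0 (reverting local changes) to be done by hand:

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

// Hypothetical build driver (not part of Nutch): runs each command in the Nutch
// checkout directory and stops at the first one that exits non-zero.
final class BuildSteps {
    private BuildSteps() {}

    /** Returns the index of the first failing step, or -1 if all steps succeeded. */
    static int runAll(File workDir, List<List<String>> steps) throws IOException, InterruptedException {
        for (int i = 0; i < steps.size(); i++) {
            Process p = new ProcessBuilder(steps.get(i))
                    .directory(workDir)
                    .inheritIO()           // show patch/ant output in this console
                    .start();
            if (p.waitFor() != 0) {
                return i;                  // do not continue past a failed step
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        File nutchHome = new File(args.length > 0 ? args[0] : ".");
        String patchFile = args.length > 1 ? args[1] : "YOUR_PATCH_FILE";  // placeholder name
        int failed = runAll(nutchHome, List.of(
                List.of("patch", "-p0", "-i", patchFile),
                List.of("ant", "clean", "jar"),
                List.of("ant", "runtime")));
        System.out.println(failed < 0 ? "all steps succeeded" : "step " + (failed + 1) + " failed");
    }
}
```

The design choice is simply fail-fast: if the patch does not apply cleanly, there is no point running ant, since the build would then compile without the Selenium dependencies.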
>>> >
>>> >Will try crawling using selenium later on. Hope this helped. >_<
>>> >
>>> >
>>> >On Thu, Feb 12, 2015 at 9:20 AM, Mattmann, Chris A (3980) <chris.a.mattmann@jpl.nasa.gov> wrote:
>>> >
>>> >Yes, I believe you need to install X11. Why don't you try it and report back
>>> >what you find? Thanks.
>>> >
>>> >Sent from my iPhone
>>> >
>>> >On Feb 12, 2015, at 8:28 AM, Jiaxin Ye <jiaxinye@usc.edu> wrote:
>>> >
>>> >
>>> >
>>> >Hi Professor, can we use Selenium on a Mac, though?
>>> >
>>> >On Thursday, February 12, 2015, Mattmann, Chris A (3980) <chris.a.mattmann@jpl.nasa.gov> wrote:
>>> >
>>> >You need Selenium, Jiaxin, in order to crawl the dynamic pages in the
>>> >polar dataset you have been assigned in my CSCI 572 search engines class.
>>> >
>>> >The instructions for integrating Selenium with Nutch 1.10-trunk
>>> >are here:
>>> >
>>> >https://issues.apache.org/jira/browse/NUTCH-1933
>>> >
>>> >
>>> >Cheers,
>>> >Chris
>>> >
>>> >
>>> >-----Original Message-----
>>> >From: Jiaxin Ye <ji...@usc.edu>
>>> >Reply-To: "dev@nutch.apache.org" <de...@nutch.apache.org>
>>> >Date: Thursday, February 12, 2015 at 12:46 AM
>>> >To: "dev@nutch.apache.org" <de...@nutch.apache.org>
>>> >Subject: Re: Nutch-Selenium in Nutch 1.10
>>> >
>>> >>Well, good choice. I am thinking of switching to Ubuntu now. The thing is,
>>> >>why do we need Selenium anyway? Just to make crawling easier?
>>> >>
>>> >>On Thu, Feb 12, 2015 at 12:25 AM, Shuo Li
>>> >><sl...@usc.edu> wrote:
>>> >>
>>> >>Interestingly, I'm a Mac user, but I don't want to mess up my laptop, so I'm
>>> >>using Vagrant with Ubuntu Trusty. It doesn't have a GUI, but Xvfb can still
>>> >>be installed properly. The issue is that I don't know how to integrate
>>> >>Selenium with Nutch 1.10.
>>> >>
>>> >>On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye
>>> >><ji...@usc.edu> wrote:
>>> >>
>>> >>Hi all,
>>> >>
>>> >>
>>> >>Does anyone here know where to find a setup tutorial for Selenium on Mac?
>>> >>I'm finding it difficult to install Xvfb on a Mac.
>>> >>
>>> >>
>>> >>Best,
>>> >>Jiaxin
>>> >>
>>>
>>>
>>
>
>

Re: Nutch-Selenium in Nutch 1.10

Posted by Jaydeep Bagrecha <ba...@usc.edu>.
Update:

Selenium's latest version (2.44.0) doesn't seem to work with the latest Firefox (35), so I installed Firefox 29 and it's crawling properly now.
> On Feb 18, 2015, at 2:56 PM, Jaydeep Bagrecha <ba...@usc.edu> wrote:
> 
> thanks Jiaxin!
> 
> I again repeated the entire installation procedure and I think i have installed it correctly.(it said BUILD SUCCESSFUL after ant runtime command and has selenium jar files in runtime/local/lib folder)
> 
> When i started crawling the mozilla browser popped 2 times,but when i saw crawl statistics,it had fetched no urls(Did anyone have this problem?)
> 
> I had following error while crawling:-
> 
> org.openqa.selenium.firefox.NotConnectedException: Unable to connect to host 127.0.0.1 on port 7055 after 45000 ms. Firefox console output:
> h changes to installed add-ons
> 1424295898279	addons.xpi-utils	DEBUG	Updating add-on states
> 1424295898281	addons.xpi-utils	DEBUG	Writing add-ons list
> 1424295898291	addons.manager	DEBUG	Registering shutdown blocker for XPIProvider
> 1424295898292	addons.manager	DEBUG	Registering shutdown blocker for LightweightThemeManager
> 1424295898295	addons.manager	DEBUG	Registering shutdown blocker for OpenH264Provider
> 1424295898296	addons.manager	DEBUG	Registering shutdown blocker for PluginProvider
> 1424295898775	DeferredSave.extensions.json	DEBUG	Starting timer
> 1424295898800	DeferredSave.extensions.json	DEBUG	Starting write
> 1424295898858	addons.manager	DEBUG	shutdown
> 1424295898859	addons.manager	DEBUG	Calling shutdown blocker for XPIProvider
> 1424295898859	addons.xpi	DEBUG	shutdown
> 1424295898860	addons.xpi-utils	DEBUG	shutdown
> 1424295898861	addons.manager	DEBUG	Calling shutdown blocker for LightweightThemeManager
> 1424295898862	addons.manager	DEBUG	Calling shutdown blocker for OpenH264Provider
> 1424295898864	addons.manager	DEBUG	Calling shutdown blocker for PluginProvider
> 1424295899016	DeferredSave.extensions.json	DEBUG	Write succeeded
> 1424295899016	addons.xpi-utils	DEBUG	XPI Database saved, setting schema version preference to 16
> 1424295899017	addons.xpi	DEBUG	Notifying XPI shutdown observers
> 1424295899025	addons.manager	DEBUG	Async provider shutdown done
> 1424295900455	addons.manager	DEBUG	Loaded provider scope for resource://gre/modules/addons/XPIProvider.jsm: <resource://gre/modules/addons/XPIProvider.jsm:> ["XPIProvider"]
> 1424295900459	addons.manager	DEBUG	Loaded provider scope for resource://gre/modules/LightweightThemeManager.jsm: <resource://gre/modules/LightweightThemeManager.jsm:> ["LightweightThemeManager"]
> 1424295900468	addons.xpi	DEBUG	startup
> 1424295900470	addons.xpi	INFO	Mapping fxdriver@googlecode.com <ma...@googlecode.com> to /var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/fxdriver@googlecode.com <ma...@googlecode.com>
> 1424295900471	addons.xpi	DEBUG	Ignoring file entry whose name is not a valid add-on ID: /var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/webdriver-staging
> 1424295900472	addons.xpi	INFO	Mapping {972ce4c6-7e08-4474-a285-3208198ce6fd} to /Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}
> 1424295900473	addons.xpi	DEBUG	Skipping unavailable install location app-system-share
> 1424295900475	addons.xpi	DEBUG	checkForChanges
> 1424295900476	addons.xpi	DEBUG	Loaded add-on state from prefs: {"app-profile":{"fxdriver@googlecode.com <ma...@googlecode.com>":{"d":"/var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/fxdriver@googlecode.com <ma...@googlecode.com>","e":false,"v":"2.42.2","st":1424295897000,"mt":1424295897000}},"app-global":{"{972ce4c6-7e08-4474-a285-3208198ce6fd}":{"d":"/Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}","e":true,"v":"35.0.1","st":1423704245000,"mt":1423704244000}}}
> 1424295900480	addons.xpi	DEBUG	getModTime: Recursive scan of {972ce4c6-7e08-4474-a285-3208198ce6fd}
> 1424295900483	addons.xpi	DEBUG	getInstallState changed: false, state: {"app-profile":{"fxdriver@googlecode.com <ma...@googlecode.com>":{"d":"/var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/fxdriver@googlecode.com <ma...@googlecode.com>","e":false,"v":"2.42.2","st":1424295897000,"mt":1424295897000}},"app-global":{"{972ce4c6-7e08-4474-a285-3208198ce6fd}":{"d":"/Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}","e":true,"v":"35.0.1","st":1423704245000,"mt":1423704244000}}}
> 1424295900488	addons.xpi	DEBUG	No changes found
> 1424295900502	addons.manager	DEBUG	Registering shutdown blocker for XPIProvider
> 1424295900504	addons.manager	DEBUG	Registering shutdown blocker for LightweightThemeManager
> 1424295900507	addons.manager	DEBUG	Registering shutdown blocker for OpenH264Provider
> 1424295900508	addons.manager	DEBUG	Registering shutdown blocker for PluginProvider
> *** Blocklist::_preloadBlocklistFile: blocklist is disabled
> 1424295903113	addons.manager	DEBUG	Registering shutdown blocker for <unnamed-provider>
> 
> 	at org.openqa.selenium.firefox.internal.NewProfileExtensionConnection.start(NewProfileExtensionConnection.java:118)
> 	at org.openqa.selenium.firefox.FirefoxDriver.startClient(FirefoxDriver.java:246)
> 	at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:114)
> 	at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:191)
> 	at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:186)
> 	at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:182)
> 	at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:95)
> 	at org.apache.nutch.protocol.selenium.HttpWebClient.getHtmlPage(HttpWebClient.java:53)
> 	at org.apache.nutch.protocol.selenium.HttpResponse.readPlainContent(HttpResponse.java:199)
> 	at org.apache.nutch.protocol.selenium.HttpResponse.<init>(HttpResponse.java:161)
> 	at org.apache.nutch.protocol.selenium.Http.getResponse(Http.java:56)
> 	at org.apache.nutch.protocol.http.api.HttpRobotRulesParser.getRobotRulesSet(HttpRobotRulesParser.java:101)
> 	at org.apache.nutch.protocol.RobotRulesParser.getRobotRulesSet(RobotRulesParser.java:151)
> 	at org.apache.nutch.protocol.http.api.HttpBase.getRobotRules(HttpBase.java:492)
> 	at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:722)
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
> 
>> On Feb 17, 2015, at 11:21 PM, Jiaxin Ye <jiaxinye@usc.edu <ma...@usc.edu>> wrote:
>> 
>> Hi,
>> 
>> When you install the patch, did you see any fails? No fail is tolerated. I am guessing there is something wrong with ivy.xml. I am suggesting that checkout ALL files in Nutch and then try it again. 
>> 
>> Best,
>> Jiaxin
>> 
>> On Tuesday, February 17, 2015, Jaydeep Bagrecha <bagrecha@usc.edu <ma...@usc.edu>> wrote:
>> Hi all,
>> 	I am trying to install and build selenium with nutch1.10 on Mac Yosemite.
>> 
>>  having following error after downloading selenium patch(https://issues.apache.org/jira/browse/NUTCH-1933 <https://issues.apache.org/jira/browse/NUTCH-1933>) and while using “ant runtime” command (as mentioned by Jiaxin below).Any suggestions to avoid it?
>> 
>>  error: package org.openqa.selenium does not exist
>>     [javac] import org.openqa.selenium.By <http://org.openqa.selenium.by/>;
>>     [javac]                           ^
>>  error: package org.openqa.selenium does not exist
>>     [javac] import org.openqa.selenium.WebDriver;
>>     [javac]                           ^
>>  error: package org.openqa.selenium.firefox does not exist
>>     [javac] import org.openqa.selenium.firefox.FirefoxDriver;
>>     [javac]                                   ^
>>  error: package org.openqa.selenium.firefox does not exist
>>     [javac] import org.openqa.selenium.firefox.FirefoxProfile;
>> error: cannot find symbol
>>     [javac]   public static ThreadLocal<WebDriver> threadWebDriver = new ThreadLocal<WebDriver>() {
>>     [javac]                             ^
>>     [javac]   symbol:   class WebDriver
>>     [javac]   location: class HttpWebClient
>>  error: cannot find symbol
>>     [javac]     protected WebDriver initialValue()
>>     [javac]               ^
>>     [javac]   symbol: class WebDriver
>>  error: cannot find symbol
>>     [javac]       FirefoxProfile profile = new FirefoxProfile();
>>     [javac]       ^
>>     [javac]   symbol: class FirefoxProfile
>> error: cannot find symbol
>>     [javac]       WebDriver driver = new FirefoxDriver(profile);
>>     [javac]                              ^
>>     [javac]   symbol: class FirefoxDriver
>>  error: cannot find symbol
>>     [javac]       driver = new FirefoxDriver();
>>     [javac]                    ^
>>     [javac]   symbol:   class FirefoxDriver
>>     [javac]   location: class HttpWebClient
>> 
>>  error: cannot find symbol
>>     [javac]       new WebDriverWait(driver, 3);
>>     [javac]           ^
>>     [javac]   symbol:   class WebDriverWait
>>     [javac]   location: class HttpWebClient
>> 
>>  error: cannot find symbol
>>     [javac]       String innerHtml = driver.findElement(By.tagName("body")).getAttribute("innerHTML");
>>     [javac]                                             ^
>>     [javac]   symbol:   variable By
>>     [javac]   location: class HttpWebClient
>> 
>> Thanks,
>> Jaydeep
>> 
>>> On Feb 12, 2015, at 11:37 PM, Jiaxin Ye <jiaxinye@usc.edu <javascript:_e(%7B%7D,'cvml','jiaxinye@usc.edu');>> wrote:
>>> 
>>> Sure. I will do it once I confirm it works...
>>> 
>>> On Thursday, February 12, 2015, Mattmann, Chris A (3980) <chris.a.mattmann@jpl.nasa.gov <javascript:_e(%7B%7D,'cvml','chris.a.mattmann@jpl.nasa.gov');>> wrote:
>>> This is great, Jiaxin, can you please make a wiki page on the Nutch
>>> wiki that has this information?
>>> 
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Chris Mattmann, Ph.D.
>>> Chief Architect
>>> Instrument Software and Science Data Systems Section (398)
>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>> Office: 168-519, Mailstop: 168-527
>>> Email: chris.a.mattmann@nasa.gov <>
>>> WWW:  http://sunset.usc.edu/~mattmann/ <http://sunset.usc.edu/~mattmann/>
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Adjunct Associate Professor, Computer Science Department
>>> University of Southern California, Los Angeles, CA 90089 USA
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> -----Original Message-----
>>> From: Jiaxin Ye <jiaxinye@usc.edu <>>
>>> Reply-To: "dev@nutch.apache.org <>" <dev@nutch.apache.org <>>
>>> Date: Thursday, February 12, 2015 at 9:39 PM
>>> To: "dev@nutch.apache.org <>" <dev@nutch.apache.org <>>
>>> Subject: Nutch-Selenium in Nutch 1.10
>>> 
>>> >Hi Li, Shuo. You are so right. I finished installing and successfully run
>>> >the butch with selenium and Firefox. I have a question though, does your
>>> >Firefox plug out for always all the urls we crawled?
>>> >
>>> >
>>> >Hi Prof Mattmann. I think here is the way we install selenium on MAC with
>>> >OS higher than 10.6 I think...
>>> >
>>> >
>>> >1. Download XQuatz, it's a dmp file, install it directly
>>> >2. Download Nutch 1.10
>>> >3. Download the patch and put it on the Nutch project directory
>>> >4. patch -p0 < THE PATCH NAME
>>> >5. DO NOT update the build.xml and the ivy.xml as the selenium tutorial
>>> >in the github told you. The patch basically updated those .xml file for
>>> >us. And the patch also installs lib-selenium and protocol selenium for us
>>> >(Correct me if
>>> > I am wrong)
>>> >6. Update tika dependency if needed
>>> >7. Go to the Nutch project directory and run ant runtime
>>> >8. Download Firefox
>>> >9. Open a new terminal and type
>>> >    xvfb -screen scrn 1024x758x34 (I think you can set it smaller if you
>>> >want...)
>>> >    There should be some errors after entering the command (for me at
>>> >least). Manually sudo create a /tmp/.X11-unix folder, and then set the
>>> >mode to 1777. Rerun the command. xvfb should be working.
>>> >10. Go to nutch > runtime > local and run the crawling command
>>> >
>>> >
>>> >Hope it helps. :)
>>> >
>>> >
>>> >Best,
>>> >Jiaxin
>>> >
>>> >On Thu, Feb 12, 2015 at 1:08 PM, Shuo Li
>>> ><sli491@usc.edu> wrote:
>>> >
>>> >I think I have finished installing.
>>> >
>>> >
>>> >What you need to do:
>>> >0. Run git status and check out (revert) any files you have modified.
>>> >1. patch -p0 < YOUR_PATCH_FILE
>>> >2. ant clean jar
>>> >3. ant runtime
>>> >
>>> >
>>> >Will try crawling using selenium later on. Hope this helped. >_<
>>> >
>>> >
>>> >On Thu, Feb 12, 2015 at 9:20 AM, Mattmann, Chris A (3980)
>>> ><chris.a.mattmann@jpl.nasa.gov> wrote:
>>> >
>>> >Yes, I believe you need to install X11. Why don't you try it and report
>>> >back what you find? Thanks.
>>> >
>>> >Sent from my iPhone
>>> >
>>> >On Feb 12, 2015, at 8:28 AM, Jiaxin Ye <jiaxinye@usc.edu> wrote:
>>> >
>>> >
>>> >
>>> >Hi Professor, but can we use Selenium on a Mac?
>>> >
>>> >On Thursday, February 12, 2015, Mattmann, Chris A (3980)
>>> ><chris.a.mattmann@jpl.nasa.gov> wrote:
>>> >
>>> >You need Selenium, Jiaxin, in order to crawl the dynamic pages in the
>>> >polar dataset you have been assigned in my CSCI 572 search engines class.
>>> >
>>> >The instructions for integrating Selenium with Nutch 1.10-trunk
>>> >are here:
>>> >
>>> >https://issues.apache.org/jira/browse/NUTCH-1933
>>> >
>>> >
>>> >Cheers,
>>> >Chris
>>> >
>>> >
>>> >-----Original Message-----
>>> >From: Jiaxin Ye <jiaxinye@usc.edu>
>>> >Reply-To: "dev@nutch.apache.org" <dev@nutch.apache.org>
>>> >Date: Thursday, February 12, 2015 at 12:46 AM
>>> >To: "dev@nutch.apache.org" <dev@nutch.apache.org>
>>> >Subject: Re: Nutch-Selenium in Nutch 1.10
>>> >
>>> >>Well, good choice. I am thinking of switching to Ubuntu now. But why do
>>> >>we need Selenium anyway? Does it just make crawling easier?
>>> >>
>>> >>On Thu, Feb 12, 2015 at 12:25 AM, Shuo Li
>>> >><sli491@usc.edu> wrote:
>>> >>
>>> >>Interestingly, I'm a Mac user too, but I don't want to mess up my laptop,
>>> >>so I'm using Vagrant with Ubuntu Trusty. It has no GUI, but Xvfb can
>>> >>still be installed properly. The problem is that I don't know how to
>>> >>integrate Selenium with Nutch 1.10.
>>> >>
>>> >>On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye
>>> >><jiaxinye@usc.edu> wrote:
>>> >>
>>> >>Hi all,
>>> >>
>>> >>
>>> >>Does anyone know where to find a setup tutorial for Selenium on a Mac?
>>> >>I'm finding it difficult to install Xvfb on the Mac.
>>> >>
>>> >>
>>> >>Best,
>>> >>Jiaxin
>>> >>
>>> >>
>>> >>On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh
>>> >><sapnashs@usc.edu> wrote:
>>> >>
>>> >>Hi Shuo Li,
>>> >>
>>> >>
>>> >>We were facing a similar issue. Prof. Mattmann suggested we look into this
>>> >>patch for Selenium on Nutch 1.10:
>>> >>https://issues.apache.org/jira/browse/NUTCH-1933.
>>> >>
>>> >>
>>> >>Hope this helps!
>>> >>
>>> >>
>>> >>Thanks,
>>> >>Sapna
>>> >>
>>> >>On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li
>>> >><sli491@usc.edu> wrote:
>>> >>
>>> >>Yop,
>>> >>
>>> >>
>>> >>I'm trying to install Selenium in Nutch 1.10. However, this error pops
>>> >>up:
>>> >>
>>> >>
>>> >>error: package org.apache.nutch.storage does not exist
>>> >>
>>> >>
>>> >>
>>> >>I can only find this package in Nutch 2.x. Is there a way to use Selenium
>>> >>in 1.10?
>>> >>
>>> >>
>>> >>Any advice would be appreciated.
>>> >>
>>> >>
>>> >>Regards,
>>> >>Shuo Li
>>> >>
>>> >>
>>> >>--
>>> >>Graduate Student
>>> >>MS in CS (Data Science)
>>> >>Viterbi School of Engineering
>>> >>University of Southern California
>>> >>
>>> >>
>>> >>Phone:
>>> >>+1 650-307-9848
>>> 
>> 
> 


Re: Nutch-Selenium in Nutch 1.10

Posted by Jaydeep Bagrecha <ba...@usc.edu>.
Thanks, Jiaxin!

I repeated the entire installation procedure and I think I have installed it correctly (ant runtime reported BUILD SUCCESSFUL, and the Selenium jar files are in the runtime/local/lib folder).
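Seeing the jar files in runtime/local/lib is a good sign, but it does not prove they contain the classes the patched HttpWebClient imports. A minimal Java sketch that opens each jar in a directory and reports which Selenium class entries are absent; `SeleniumJarCheck` and `missingEntries` are hypothetical helper names, not part of Nutch or the NUTCH-1933 patch. The class list comes from the imports in the javac errors quoted in this thread, plus the assumption that `WebDriverWait` lives in `org.openqa.selenium.support.ui`, as it does in Selenium 2.x. The directory defaults to runtime/local/lib as described above; pass another path if your build places the jars elsewhere.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical diagnostic helper; not part of Nutch or the NUTCH-1933 patch.
public class SeleniumJarCheck {

    // Class files referenced by the patched HttpWebClient (per the javac errors in this thread).
    static final String[] REQUIRED = {
        "org/openqa/selenium/By.class",
        "org/openqa/selenium/WebDriver.class",
        "org/openqa/selenium/firefox/FirefoxDriver.class",
        "org/openqa/selenium/firefox/FirefoxProfile.class",
        "org/openqa/selenium/support/ui/WebDriverWait.class"
    };

    /** Returns the required class entries that no *.jar file directly inside libDir contains. */
    static List<String> missingEntries(Path libDir) throws IOException {
        List<String> missing = new ArrayList<>(Arrays.asList(REQUIRED));
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jarFile = new JarFile(jar.toFile())) {
                    // Drop every entry this jar provides.
                    missing.removeIf(entry -> jarFile.getEntry(entry) != null);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        Path libDir = Paths.get(args.length > 0 ? args[0] : "runtime/local/lib");
        List<String> missing = missingEntries(libDir);
        if (missing.isEmpty()) {
            System.out.println("All required Selenium classes found in " + libDir);
        } else {
            System.out.println("Not found in any jar in " + libDir + ": " + missing);
        }
    }
}
```

An empty result only shows that the classes are present; it says nothing about whether the Selenium release and the installed Firefox work together.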

When I started crawling, the Mozilla Firefox browser popped up twice, but the crawl statistics showed that no URLs had been fetched. Did anyone else have this problem?

I got the following error while crawling:

org.openqa.selenium.firefox.NotConnectedException: Unable to connect to host 127.0.0.1 on port 7055 after 45000 ms. Firefox console output:
h changes to installed add-ons
1424295898279	addons.xpi-utils	DEBUG	Updating add-on states
1424295898281	addons.xpi-utils	DEBUG	Writing add-ons list
1424295898291	addons.manager	DEBUG	Registering shutdown blocker for XPIProvider
1424295898292	addons.manager	DEBUG	Registering shutdown blocker for LightweightThemeManager
1424295898295	addons.manager	DEBUG	Registering shutdown blocker for OpenH264Provider
1424295898296	addons.manager	DEBUG	Registering shutdown blocker for PluginProvider
1424295898775	DeferredSave.extensions.json	DEBUG	Starting timer
1424295898800	DeferredSave.extensions.json	DEBUG	Starting write
1424295898858	addons.manager	DEBUG	shutdown
1424295898859	addons.manager	DEBUG	Calling shutdown blocker for XPIProvider
1424295898859	addons.xpi	DEBUG	shutdown
1424295898860	addons.xpi-utils	DEBUG	shutdown
1424295898861	addons.manager	DEBUG	Calling shutdown blocker for LightweightThemeManager
1424295898862	addons.manager	DEBUG	Calling shutdown blocker for OpenH264Provider
1424295898864	addons.manager	DEBUG	Calling shutdown blocker for PluginProvider
1424295899016	DeferredSave.extensions.json	DEBUG	Write succeeded
1424295899016	addons.xpi-utils	DEBUG	XPI Database saved, setting schema version preference to 16
1424295899017	addons.xpi	DEBUG	Notifying XPI shutdown observers
1424295899025	addons.manager	DEBUG	Async provider shutdown done
1424295900455	addons.manager	DEBUG	Loaded provider scope for resource://gre/modules/addons/XPIProvider.jsm: ["XPIProvider"]
1424295900459	addons.manager	DEBUG	Loaded provider scope for resource://gre/modules/LightweightThemeManager.jsm: ["LightweightThemeManager"]
1424295900468	addons.xpi	DEBUG	startup
1424295900470	addons.xpi	INFO	Mapping fxdriver@googlecode.com to /var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/fxdriver@googlecode.com
1424295900471	addons.xpi	DEBUG	Ignoring file entry whose name is not a valid add-on ID: /var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/webdriver-staging
1424295900472	addons.xpi	INFO	Mapping {972ce4c6-7e08-4474-a285-3208198ce6fd} to /Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}
1424295900473	addons.xpi	DEBUG	Skipping unavailable install location app-system-share
1424295900475	addons.xpi	DEBUG	checkForChanges
1424295900476	addons.xpi	DEBUG	Loaded add-on state from prefs: {"app-profile":{"fxdriver@googlecode.com":{"d":"/var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/fxdriver@googlecode.com","e":false,"v":"2.42.2","st":1424295897000,"mt":1424295897000}},"app-global":{"{972ce4c6-7e08-4474-a285-3208198ce6fd}":{"d":"/Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}","e":true,"v":"35.0.1","st":1423704245000,"mt":1423704244000}}}
1424295900480	addons.xpi	DEBUG	getModTime: Recursive scan of {972ce4c6-7e08-4474-a285-3208198ce6fd}
1424295900483	addons.xpi	DEBUG	getInstallState changed: false, state: {"app-profile":{"fxdriver@googlecode.com":{"d":"/var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/fxdriver@googlecode.com","e":false,"v":"2.42.2","st":1424295897000,"mt":1424295897000}},"app-global":{"{972ce4c6-7e08-4474-a285-3208198ce6fd}":{"d":"/Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}","e":true,"v":"35.0.1","st":1423704245000,"mt":1423704244000}}}
1424295900488	addons.xpi	DEBUG	No changes found
1424295900502	addons.manager	DEBUG	Registering shutdown blocker for XPIProvider
1424295900504	addons.manager	DEBUG	Registering shutdown blocker for LightweightThemeManager
1424295900507	addons.manager	DEBUG	Registering shutdown blocker for OpenH264Provider
1424295900508	addons.manager	DEBUG	Registering shutdown blocker for PluginProvider
*** Blocklist::_preloadBlocklistFile: blocklist is disabled
1424295903113	addons.manager	DEBUG	Registering shutdown blocker for <unnamed-provider>

	at org.openqa.selenium.firefox.internal.NewProfileExtensionConnection.start(NewProfileExtensionConnection.java:118)
	at org.openqa.selenium.firefox.FirefoxDriver.startClient(FirefoxDriver.java:246)
	at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:114)
	at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:191)
	at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:186)
	at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:182)
	at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:95)
	at org.apache.nutch.protocol.selenium.HttpWebClient.getHtmlPage(HttpWebClient.java:53)
	at org.apache.nutch.protocol.selenium.HttpResponse.readPlainContent(HttpResponse.java:199)
	at org.apache.nutch.protocol.selenium.HttpResponse.<init>(HttpResponse.java:161)
	at org.apache.nutch.protocol.selenium.Http.getResponse(Http.java:56)
	at org.apache.nutch.protocol.http.api.HttpRobotRulesParser.getRobotRulesSet(HttpRobotRulesParser.java:101)
	at org.apache.nutch.protocol.RobotRulesParser.getRobotRulesSet(RobotRulesParser.java:151)
	at org.apache.nutch.protocol.http.api.HttpBase.getRobotRules(HttpBase.java:492)
	at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:722)
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
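The exception above says the Selenium client waited 45000 ms for the FirefoxDriver extension to accept a connection on 127.0.0.1:7055. While the Firefox window is still open during a crawl, a small probe can show whether anything is listening on that port at all. A minimal Java sketch; `PortProbe` and `isListening` are hypothetical names and the 2-second timeout is an arbitrary choice. The log also shows the driver extension (fxdriver@googlecode.com) at version 2.42.2 running inside Firefox 35.0.1; checking whether that Selenium release supports that Firefox version may be worthwhile, though that is a guess, not a confirmed diagnosis.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic helper; not part of Nutch or Selenium.
public class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, host unreachable, or timed out.
            return false;
        }
    }

    public static void main(String[] args) {
        // 7055 is the port named in the NotConnectedException above.
        boolean up = isListening("127.0.0.1", 7055, 2000);
        System.out.println("127.0.0.1:7055 is " + (up ? "" : "not ") + "accepting connections");
    }
}
```

If nothing is listening while the browser window is open, the driver extension presumably never started inside that Firefox instance, which would point at the browser/driver pairing rather than at the crawl configuration.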

> On Feb 17, 2015, at 11:21 PM, Jiaxin Ye <ji...@usc.edu> wrote:
> 
> Hi,
> 
> When you applied the patch, did you see any failures? Not a single failure is acceptable. My guess is that something is wrong with ivy.xml. I suggest checking out ALL files in Nutch again (discarding your local changes) and then trying again.
> 
> Best,
> Jiaxin
> 
> On Tuesday, February 17, 2015, Jaydeep Bagrecha <bagrecha@usc.edu> wrote:
> Hi all,
> I am trying to install and build Selenium with Nutch 1.10 on Mac OS X Yosemite.
> 
> After downloading the Selenium patch (https://issues.apache.org/jira/browse/NUTCH-1933), I get the following errors when running the “ant runtime” command (as Jiaxin described below). Any suggestions on how to avoid them?
> 
>  error: package org.openqa.selenium does not exist
>     [javac] import org.openqa.selenium.By;
>     [javac]                           ^
>  error: package org.openqa.selenium does not exist
>     [javac] import org.openqa.selenium.WebDriver;
>     [javac]                           ^
>  error: package org.openqa.selenium.firefox does not exist
>     [javac] import org.openqa.selenium.firefox.FirefoxDriver;
>     [javac]                                   ^
>  error: package org.openqa.selenium.firefox does not exist
>     [javac] import org.openqa.selenium.firefox.FirefoxProfile;
> error: cannot find symbol
>     [javac]   public static ThreadLocal<WebDriver> threadWebDriver = new ThreadLocal<WebDriver>() {
>     [javac]                             ^
>     [javac]   symbol:   class WebDriver
>     [javac]   location: class HttpWebClient
>  error: cannot find symbol
>     [javac]     protected WebDriver initialValue()
>     [javac]               ^
>     [javac]   symbol: class WebDriver
>  error: cannot find symbol
>     [javac]       FirefoxProfile profile = new FirefoxProfile();
>     [javac]       ^
>     [javac]   symbol: class FirefoxProfile
> error: cannot find symbol
>     [javac]       WebDriver driver = new FirefoxDriver(profile);
>     [javac]                              ^
>     [javac]   symbol: class FirefoxDriver
>  error: cannot find symbol
>     [javac]       driver = new FirefoxDriver();
>     [javac]                    ^
>     [javac]   symbol:   class FirefoxDriver
>     [javac]   location: class HttpWebClient
> 
>  error: cannot find symbol
>     [javac]       new WebDriverWait(driver, 3);
>     [javac]           ^
>     [javac]   symbol:   class WebDriverWait
>     [javac]   location: class HttpWebClient
> 
>  error: cannot find symbol
>     [javac]       String innerHtml = driver.findElement(By.tagName("body")).getAttribute("innerHTML");
>     [javac]                                             ^
>     [javac]   symbol:   variable By
>     [javac]   location: class HttpWebClient
> 
> Thanks,
> Jaydeep
> 
>> On Feb 12, 2015, at 11:37 PM, Jiaxin Ye <jiaxinye@usc.edu> wrote:
>> 
>> Sure. I will do it once I confirm it works...
>> 
>> On Thursday, February 12, 2015, Mattmann, Chris A (3980) <chris.a.mattmann@jpl.nasa.gov> wrote:
>> This is great, Jiaxin, can you please make a wiki page on the Nutch
>> wiki that has this information?
>> 


Re: Nutch-Selenium in Nutch 1.10

Posted by Jiaxin Ye <ji...@usc.edu>.
Hi,

When you applied the patch, did you see any failures? Not a single failure
is acceptable. My guess is that something is wrong with ivy.xml. I suggest
checking out ALL files in Nutch again (discarding your local changes) and
then trying again.

Best,
Jiaxin
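Whether the patch applied cleanly can be checked mechanically: when patch cannot apply a hunk, it reports the failure and, by default, saves the rejected hunk to a .rej file next to the target file. A minimal Java sketch that lists every such file under the Nutch checkout; `RejectFinder` and `findRejects` are hypothetical helper names, and the sketch assumes patch was run with its default reject-file behavior.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical diagnostic helper for checking the result of "patch -p0".
public class RejectFinder {

    /** Lists every *.rej file under root, where patch saves hunks it could not apply. */
    static List<Path> findRejects(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(".rej"))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> rejects = findRejects(root);
        if (rejects.isEmpty()) {
            System.out.println("No .rej files found under " + root);
        } else {
            System.out.println("Rejected hunks saved in:");
            rejects.forEach(p -> System.out.println("  " + p));
        }
    }
}
```

Run it from the Nutch directory after applying the patch; any file it lists marks a change (for example to ivy.xml or build.xml) that did not apply and needs fixing before running ant runtime.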

On Tuesday, February 17, 2015, Jaydeep Bagrecha <ba...@usc.edu> wrote:

> Hi all,
> I am trying to install and build Selenium with Nutch 1.10 on Mac OS X Yosemite.
>
> After downloading the Selenium patch (https://issues.apache.org/jira/browse/NUTCH-1933),
> I get the following errors when running the “ant runtime” command (as Jiaxin
> described below). Any suggestions on how to avoid them?
>
>  error: package org.openqa.selenium does not exist
>     [javac] import org.openqa.selenium.By;
>     [javac]                           ^
>  error: package org.openqa.selenium does not exist
>     [javac] import org.openqa.selenium.WebDriver;
>     [javac]                           ^
>  error: package org.openqa.selenium.firefox does not exist
>     [javac] import org.openqa.selenium.firefox.FirefoxDriver;
>     [javac]                                   ^
>  error: package org.openqa.selenium.firefox does not exist
>     [javac] import org.openqa.selenium.firefox.FirefoxProfile;
> error: cannot find symbol
>     [javac]   public static ThreadLocal<WebDriver> threadWebDriver = new ThreadLocal<WebDriver>() {
>     [javac]                             ^
>     [javac]   symbol:   class WebDriver
>     [javac]   location: class HttpWebClient
>  error: cannot find symbol
>     [javac]     protected WebDriver initialValue()
>     [javac]               ^
>     [javac]   symbol: class WebDriver
>  error: cannot find symbol
>     [javac]       FirefoxProfile profile = new FirefoxProfile();
>     [javac]       ^
>     [javac]   symbol: class FirefoxProfile
> error: cannot find symbol
>     [javac]       WebDriver driver = new FirefoxDriver(profile);
>     [javac]                              ^
>     [javac]   symbol: class FirefoxDriver
>  error: cannot find symbol
>     [javac]       driver = new FirefoxDriver();
>     [javac]                    ^
>     [javac]   symbol:   class FirefoxDriver
>     [javac]   location: class HttpWebClient
>
>  error: cannot find symbol
>     [javac]       new WebDriverWait(driver, 3);
>     [javac]           ^
>     [javac]   symbol:   class WebDriverWait
>     [javac]   location: class HttpWebClient
>
>  error: cannot find symbol
>     [javac]       String innerHtml = driver.findElement(By.tagName("body")).getAttribute("innerHTML");
>     [javac]                                             ^
>     [javac]   symbol:   variable By
>     [javac]   location: class HttpWebClient
>
> Thanks,
> Jaydeep
>

Re: Nutch-Selenium in Nutch 1.10

Posted by Jaydeep Bagrecha <ba...@usc.edu>.
Hi all,
I am trying to install and build Selenium with Nutch 1.10 on Mac OS X Yosemite.

After downloading the Selenium patch (https://issues.apache.org/jira/browse/NUTCH-1933), I get the following errors when running the “ant runtime” command (as Jiaxin described below). Any suggestions on how to avoid them?

error: package org.openqa.selenium does not exist
    [javac] import org.openqa.selenium.By;
    [javac]                           ^

error: package org.openqa.selenium does not exist
    [javac] import org.openqa.selenium.WebDriver;
    [javac]                           ^

error: package org.openqa.selenium.firefox does not exist
    [javac] import org.openqa.selenium.firefox.FirefoxDriver;
    [javac]                                   ^

error: package org.openqa.selenium.firefox does not exist
    [javac] import org.openqa.selenium.firefox.FirefoxProfile;

error: cannot find symbol
    [javac]   public static ThreadLocal<WebDriver> threadWebDriver = new ThreadLocal<WebDriver>() {
    [javac]                             ^
    [javac]   symbol:   class WebDriver
    [javac]   location: class HttpWebClient

error: cannot find symbol
    [javac]     protected WebDriver initialValue()
    [javac]               ^
    [javac]   symbol: class WebDriver

error: cannot find symbol
    [javac]       FirefoxProfile profile = new FirefoxProfile();
    [javac]       ^
    [javac]   symbol: class FirefoxProfile

error: cannot find symbol
    [javac]       WebDriver driver = new FirefoxDriver(profile);
    [javac]                              ^
    [javac]   symbol: class FirefoxDriver

error: cannot find symbol
    [javac]       driver = new FirefoxDriver();
    [javac]                    ^
    [javac]   symbol:   class FirefoxDriver
    [javac]   location: class HttpWebClient

error: cannot find symbol
    [javac]       new WebDriverWait(driver, 3);
    [javac]           ^
    [javac]   symbol:   class WebDriverWait
    [javac]   location: class HttpWebClient

error: cannot find symbol
    [javac]       String innerHtml = driver.findElement(By.tagName("body")).getAttribute("innerHTML");
    [javac]                                             ^
    [javac]   symbol:   variable By
    [javac]   location: class HttpWebClient
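Errors of the form "package org.openqa.selenium does not exist" most likely mean the Selenium jars are not on the compile classpath, for example because Ivy did not resolve the patched plugin's dependencies. A quick check is a small probe run with the classpath you expect the build to use. This is a hypothetical helper, not part of Nutch or the NUTCH-1933 patch; the class names come from the imports and symbols in the log above, with WebDriverWait assumed to live in Selenium's usual org.openqa.selenium.support.ui package.

```java
// Hypothetical diagnostic, not part of Nutch or the NUTCH-1933 patch:
// reports which Selenium classes used by the patched HttpWebClient can be
// loaded from the current classpath.
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SeleniumClasspathProbe {

    // Class names taken from the failing imports and symbols in the build log.
    static final List<String> REQUIRED = List.of(
            "org.openqa.selenium.By",
            "org.openqa.selenium.WebDriver",
            "org.openqa.selenium.firefox.FirefoxDriver",
            "org.openqa.selenium.firefox.FirefoxProfile",
            "org.openqa.selenium.support.ui.WebDriverWait");

    // True if the class can be located; it is loaded but not initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, SeleniumClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Maps each class name to whether it was found, keeping the input order.
    static Map<String, Boolean> probe(List<String> classNames) {
        Map<String, Boolean> found = new LinkedHashMap<>();
        for (String name : classNames) {
            found.put(name, isOnClasspath(name));
        }
        return found;
    }

    public static void main(String[] args) {
        probe(REQUIRED).forEach((name, ok) ->
                System.out.println((ok ? "found   " : "MISSING ") + name));
    }
}
```

Any entry reported as MISSING points at a jar that still has to reach the classpath before "ant runtime" can compile the plugin.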

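The log also hints at how the patched HttpWebClient manages browsers: a static ThreadLocal<WebDriver> whose initialValue() builds a FirefoxDriver. The sketch below reproduces only that pattern, with a hypothetical Browser class standing in for WebDriver so it runs without Selenium. It shows that a ThreadLocal like this launches one browser per thread and then reuses it; the log also contains a separate new FirefoxDriver() call, so the patch may create browsers in other places as well.

```java
// Sketch of the ThreadLocal pattern seen in the log. Browser is a hypothetical
// stand-in for org.openqa.selenium.WebDriver so this compiles without Selenium.
import java.util.concurrent.atomic.AtomicInteger;

class PerThreadBrowserDemo {

    // Hypothetical stand-in for a WebDriver; the id identifies each launch.
    static final class Browser {
        final int id;
        Browser(int id) { this.id = id; }
    }

    // Total number of "browsers" launched so far.
    static final AtomicInteger launched = new AtomicInteger();

    // Each thread gets its own Browser the first time it calls get(),
    // like ThreadLocal.initialValue() creating a FirefoxDriver in the patch.
    static final ThreadLocal<Browser> threadBrowser =
            ThreadLocal.withInitial(() -> new Browser(launched.incrementAndGet()));

    // Returns the id of the browser seen by a freshly started thread.
    static int idFromNewThread() {
        int[] seen = new int[1];
        Thread worker = new Thread(() -> seen[0] = threadBrowser.get().id);
        worker.start();
        try {
            worker.join();  // join() also makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        int first = threadBrowser.get().id;
        int again = threadBrowser.get().id;   // same thread: same browser
        int other = idFromNewThread();        // new thread: new browser
        System.out.println(first + " " + again + " " + other + " launched=" + launched.get());
    }
}
```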
Thanks,
Jaydeep

> On Feb 12, 2015, at 11:37 PM, Jiaxin Ye <ji...@usc.edu> wrote:
> 
> Sure. I will do it once I confirm it works...
> 
> On Thursday, February 12, 2015, Mattmann, Chris A (3980) <chris.a.mattmann@jpl.nasa.gov> wrote:
> This is great, Jiaxin, can you please make a wiki page on the Nutch
> wiki that has this information?
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
> -----Original Message-----
> From: Jiaxin Ye <jiaxinye@usc.edu>
> Reply-To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> Date: Thursday, February 12, 2015 at 9:39 PM
> To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> Subject: Nutch-Selenium in Nutch 1.10
> 
> >Hi Li, Shuo. You are so right. I finished installing and successfully ran
> >Nutch with Selenium and Firefox. I have a question, though: does your
> >Firefox window pop up for every URL we crawl?
> >
> >
> >Hi Prof. Mattmann. I think this is how to install Selenium on a Mac running
> >OS X later than 10.6...
> >
> >
> >1. Download XQuartz (it's a .dmg file) and install it directly.
> >2. Download Nutch 1.10.
> >3. Download the patch and put it in the Nutch project directory.
> >4. patch -p0 < THE_PATCH_NAME
> >5. DO NOT update build.xml and ivy.xml as the Selenium tutorial on
> >GitHub tells you to. The patch already updates those .xml files for
> >us, and it also installs the lib-selenium and protocol-selenium plugins
> >for us (correct me if I am wrong).
> >6. Update the Tika dependency if needed.
> >7. Go to the Nutch project directory and run ant runtime.
> >8. Download Firefox.
> >9. Open a new terminal and type
> >    Xvfb -screen 0 1024x768x24 (I think you can set it smaller if you
> >want...)
> >    There should be some errors after entering the command (at least
> >there were for me). Manually create a /tmp/.X11-unix folder with sudo,
> >then set its mode to 1777. Rerun the command; Xvfb should now work.
> >10. Go to runtime/local in the Nutch directory and run the crawl command.
> >
> >
> >Hope it helps. :)
> >
> >
> >Best,
> >Jiaxin
> >
> >
> >
> >
> >
> >On Thu, Feb 12, 2015 at 1:08 PM, Shuo Li
> ><sli491@usc.edu> wrote:
> >
> >I think I have possibly finished installing.
> >
> >
> >What you need to do:
> >0. Run git status, then git checkout any files you have modified.
> >1. patch -p0 < YOUR_PATCH_FILE
> >2. ant clean jar
> >3. ant runtime
> >
> >
> >Will try crawling using selenium later on. Hope this helped. >_<
> >
> >
> >On Thu, Feb 12, 2015 at 9:20 AM, Mattmann, Chris A (3980)
> ><chris.a.mattmann@jpl.nasa.gov> wrote:
> >
> >Yes, I believe you need to install X11. Why don't you try it and report back
> >what you find? Thanks.
> >
> >Sent from my iPhone
> >
> >On Feb 12, 2015, at 8:28 AM, Jiaxin Ye <jiaxinye@usc.edu> wrote:
> >
> >
> >
> >Hi professor, but can we use Selenium on Mac?
> >
> >On Thursday, February 12, 2015, Mattmann, Chris A (3980)
> ><chris.a.mattmann@jpl.nasa.gov> wrote:
> >
> >You need Selenium, Jiaxin, in order to crawl dynamic pages in the
> >polar dataset you have been assigned in my CSCI 572 search engines class.
> >
> >The instructions for integrating Selenium with Nutch 1.10-trunk
> >are here:
> >
> >https://issues.apache.org/jira/browse/NUTCH-1933
> >
> >
> >Cheers,
> >Chris
> >
> >
> >-----Original Message-----
> >From: Jiaxin Ye <jiaxinye@usc.edu>
> >Reply-To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> >Date: Thursday, February 12, 2015 at 12:46 AM
> >To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> >Subject: Re: Nutch-Selenium in Nutch 1.10
> >
> >>Well, good choice. I am thinking of switching to Ubuntu now. But why
> >>do we need Selenium anyway? Just to make crawling easier?
> >>
> >>On Thu, Feb 12, 2015 at 12:25 AM, Shuo Li
> >><sli491@usc.edu> wrote:
> >>
> >>Interestingly, I'm a Mac user, but I don't want to mess up my laptop, so I'm
> >>using Vagrant with Ubuntu Trusty. It doesn't have a GUI, but Xvfb can still
> >>be installed properly. The issue is that I don't know how to integrate
> >>Selenium with Nutch 1.10.
> >>
> >>On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye
> >><jiaxinye@usc.edu> wrote:
> >>
> >>Hi all,
> >>
> >>
> >>Does anyone here know where to find a setup tutorial for Selenium on Mac?
> >>I find it difficult to install Xvfb on Mac.
> >>
> >>
> >>Best,
> >>Jiaxin
> >>
> >>
> >>On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh
> >><sapnashs@usc.edu> wrote:
> >>
> >>Hi Shuo Li,
> >>
> >>
> >>We were facing a similar issue. Prof. Mattmann suggested we look into this
> >>patch for Selenium on Nutch 1.10:
> >>https://issues.apache.org/jira/browse/NUTCH-1933.
> >>
> >>
> >>Hope this helps!
> >>
> >>
> >>Thanks,
> >>Sapna
> >>
> >>On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li
> >><sli491@usc.edu> wrote:
> >>
> >>Yop,
> >>
> >>
> >>I'm trying to install Selenium in Nutch 1.10. However, this error pops
> >>up:
> >>
> >>
> >>error: package org.apache.nutch.storage does not exist
> >>
> >>
> >>
> >>I can only find this package in Nutch 2.x. Is there a way to use Selenium
> >>in 1.10?
> >>
> >>
> >>Any advice would be appreciated.
> >>
> >>
> >>Regards,
> >>Shuo Li


Re: Nutch-Selenium in Nutch 1.10

Posted by Jiaxin Ye <ji...@usc.edu>.
Sure. I will do it once I confirm it works...

On Thursday, February 12, 2015, Mattmann, Chris A (3980) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> This is great, Jiaxin, can you please make a wiki page on the Nutch
> wiki that has this information?
>
> -----Original Message-----
> From: Jiaxin Ye <jiaxinye@usc.edu>
> Reply-To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> Date: Thursday, February 12, 2015 at 9:39 PM
> To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> Subject: Nutch-Selenium in Nutch 1.10
>
> >Hi Li, Shuo. You are so right. I finished installing and successfully ran
> >Nutch with Selenium and Firefox. I have a question, though: does your
> >Firefox window pop up for every URL we crawl?
> >
> >
> >Hi Prof. Mattmann. I think this is how to install Selenium on a Mac running
> >OS X later than 10.6...
> >
> >
> >1. Download XQuartz (it's a .dmg file) and install it directly.
> >2. Download Nutch 1.10.
> >3. Download the patch and put it in the Nutch project directory.
> >4. patch -p0 < THE_PATCH_NAME
> >5. DO NOT update build.xml and ivy.xml as the Selenium tutorial on
> >GitHub tells you to. The patch already updates those .xml files for
> >us, and it also installs the lib-selenium and protocol-selenium plugins
> >for us (correct me if I am wrong).
> >6. Update the Tika dependency if needed.
> >7. Go to the Nutch project directory and run ant runtime.
> >8. Download Firefox.
> >9. Open a new terminal and type
> >    Xvfb -screen 0 1024x768x24 (I think you can set it smaller if you
> >want...)
> >    There should be some errors after entering the command (at least
> >there were for me). Manually create a /tmp/.X11-unix folder with sudo,
> >then set its mode to 1777. Rerun the command; Xvfb should now work.
> >10. Go to runtime/local in the Nutch directory and run the crawl command.
> >
> >
> >Hope it helps. :)
> >
> >
> >Best,
> >Jiaxin
> >
> >
> >
> >
> >
> >On Thu, Feb 12, 2015 at 1:08 PM, Shuo Li
> ><sli491@usc.edu> wrote:
> >
> >I think I have possibly finished installing.
> >
> >
> >What you need to do:
> >0. Run git status, then git checkout any files you have modified.
> >1. patch -p0 < YOUR_PATCH_FILE
> >2. ant clean jar
> >3. ant runtime
> >
> >
> >Will try crawling using selenium later on. Hope this helped. >_<
> >
> >
> >On Thu, Feb 12, 2015 at 9:20 AM, Mattmann, Chris A (3980)
> ><chris.a.mattmann@jpl.nasa.gov> wrote:
> >
> >Yes, I believe you need to install X11. Why don't you try it and report back
> >what you find? Thanks.
> >
> >Sent from my iPhone
> >
> >On Feb 12, 2015, at 8:28 AM, Jiaxin Ye <jiaxinye@usc.edu> wrote:
> >
> >
> >
> >Hi professor, but can we use Selenium on Mac?
> >
> >On Thursday, February 12, 2015, Mattmann, Chris A (3980)
> ><chris.a.mattmann@jpl.nasa.gov> wrote:
> >
> >You need Selenium, Jiaxin, in order to crawl dynamic pages in the
> >polar dataset you have been assigned in my CSCI 572 search engines class.
> >
> >The instructions for integrating Selenium with Nutch 1.10-trunk
> >are here:
> >
> >https://issues.apache.org/jira/browse/NUTCH-1933
> >
> >
> >Cheers,
> >Chris
> >
> >
> >-----Original Message-----
> >From: Jiaxin Ye <jiaxinye@usc.edu>
> >Reply-To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> >Date: Thursday, February 12, 2015 at 12:46 AM
> >To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> >Subject: Re: Nutch-Selenium in Nutch 1.10
> >
> >>Well, good choice. I am thinking of switching to Ubuntu now. But why
> >>do we need Selenium anyway? Just to make crawling easier?
> >>
> >>On Thu, Feb 12, 2015 at 12:25 AM, Shuo Li
> >><sli491@usc.edu> wrote:
> >>
> >>Interestingly, I'm a Mac user, but I don't want to mess up my laptop, so I'm
> >>using Vagrant with Ubuntu Trusty. It doesn't have a GUI, but Xvfb can still
> >>be installed properly. The issue is that I don't know how to integrate
> >>Selenium with Nutch 1.10.
> >>
> >>On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye
> >><jiaxinye@usc.edu> wrote:
> >>
> >>Hi all,
> >>
> >>
> >>Does anyone here know where to find a setup tutorial for Selenium on Mac?
> >>I find it difficult to install Xvfb on Mac.
> >>
> >>
> >>Best,
> >>Jiaxin
> >>
> >>
> >>On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh
> >><sapnashs@usc.edu> wrote:
> >>
> >>Hi Shuo Li,
> >>
> >>
> >>We were facing a similar issue. Prof. Mattmann suggested we look into this
> >>patch for Selenium on Nutch 1.10:
> >>https://issues.apache.org/jira/browse/NUTCH-1933.
> >>
> >>
> >>Hope this helps!
> >>
> >>
> >>Thanks,
> >>Sapna
> >>
> >>On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li
> >><sli491@usc.edu> wrote:
> >>
> >>Yop,
> >>
> >>
> >>I'm trying to install Selenium in Nutch 1.10. However, this error pops
> >>up:
> >>
> >>
> >>error: package org.apache.nutch.storage does not exist
> >>
> >>
> >>
> >>I can only find this package in Nutch 2.x. Is there a way to use Selenium
> >>in 1.10?
> >>
> >>
> >>Any advice would be appreciated.
> >>
> >>
> >>Regards,
> >>Shuo Li

Re: Nutch-Selenium in Nutch 1.10

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
This is great, Jiaxin, can you please make a wiki page on the Nutch
wiki that has this information?

-----Original Message-----
From: Jiaxin Ye <ji...@usc.edu>
Reply-To: "dev@nutch.apache.org" <de...@nutch.apache.org>
Date: Thursday, February 12, 2015 at 9:39 PM
To: "dev@nutch.apache.org" <de...@nutch.apache.org>
Subject: Nutch-Selenium in Nutch 1.10

>Hi Li, Shuo. You are so right. I finished installing and successfully ran
>Nutch with Selenium and Firefox. I have a question, though: does your
>Firefox window pop up for every URL we crawl?
>
>
>Hi Prof. Mattmann. I think this is how to install Selenium on a Mac running
>OS X later than 10.6...
>
>
>1. Download XQuartz (it's a .dmg file) and install it directly.
>2. Download Nutch 1.10.
>3. Download the patch and put it in the Nutch project directory.
>4. patch -p0 < THE_PATCH_NAME
>5. DO NOT update build.xml and ivy.xml as the Selenium tutorial on
>GitHub tells you to. The patch already updates those .xml files for
>us, and it also installs the lib-selenium and protocol-selenium plugins
>for us (correct me if I am wrong).
>6. Update the Tika dependency if needed.
>7. Go to the Nutch project directory and run ant runtime.
>8. Download Firefox.
>9. Open a new terminal and type
>    Xvfb -screen 0 1024x768x24 (I think you can set it smaller if you
>want...)
>    There should be some errors after entering the command (at least
>there were for me). Manually create a /tmp/.X11-unix folder with sudo,
>then set its mode to 1777. Rerun the command; Xvfb should now work.
>10. Go to runtime/local in the Nutch directory and run the crawl command.
>
>
>Hope it helps. :)
>
>
>Best,
>Jiaxin
>
>
>
>
>
>On Thu, Feb 12, 2015 at 1:08 PM, Shuo Li
><sli491@usc.edu> wrote:
>
>I think I have possibly finished installing.
>
>
>What you need to do:
>0. Run git status, then git checkout any files you have modified.
>1. patch -p0 < YOUR_PATCH_FILE
>2. ant clean jar
>3. ant runtime
>
>
>Will try crawling using selenium later on. Hope this helped. >_<
>
>
>On Thu, Feb 12, 2015 at 9:20 AM, Mattmann, Chris A (3980)
><chris.a.mattmann@jpl.nasa.gov> wrote:
>
>Yes, I believe you need to install X11. Why don't you try it and report back
>what you find? Thanks.
>
>Sent from my iPhone
>
>On Feb 12, 2015, at 8:28 AM, Jiaxin Ye <jiaxinye@usc.edu> wrote:
>
>
>
>Hi professor, but can we use Selenium on Mac?
>
>On Thursday, February 12, 2015, Mattmann, Chris A (3980)
><chris.a.mattmann@jpl.nasa.gov> wrote:
>
>You need Selenium, Jiaxin, in order to crawl dynamic pages in the
>polar dataset you have been assigned in my CSCI 572 search engines class.
>
>The instructions for integrating Selenium with Nutch 1.10-trunk
>are here:
>
>https://issues.apache.org/jira/browse/NUTCH-1933
>
>
>Cheers,
>Chris
>
>
>-----Original Message-----
>From: Jiaxin Ye <ji...@usc.edu>
>Reply-To: "dev@nutch.apache.org" <de...@nutch.apache.org>
>Date: Thursday, February 12, 2015 at 12:46 AM
>To: "dev@nutch.apache.org" <de...@nutch.apache.org>
>Subject: Re: Nutch-Selenium in Nutch 1.10
>
>>Well, good choice. I am thinking of switching to Ubuntu now. But why
>>do we need Selenium anyway? Just to make crawling easier?
>>
>>On Thu, Feb 12, 2015 at 12:25 AM, Shuo Li
>><sl...@usc.edu> wrote:
>>
>>Interestingly, I'm a Mac user, but I don't want to mess up my laptop, so I'm
>>using Vagrant with Ubuntu Trusty. It doesn't have a GUI, but Xvfb can still
>>be installed properly. The issue is that I don't know how to integrate
>>Selenium with Nutch 1.10.
>>
>>On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye
>><ji...@usc.edu> wrote:
>>
>>Hi all,
>>
>>
>>Does anyone here know where to find a setup tutorial for Selenium on Mac?
>>I find it difficult to install Xvfb on Mac.
>>
>>
>>Best,
>>Jiaxin
>>
>>
>>On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh
>><sa...@usc.edu> wrote:
>>
>>Hi Shuo Li,
>>
>>
>>We were facing a similar issue. Prof. Mattmann suggested we look into this
>>patch for Selenium on Nutch 1.10:
>>https://issues.apache.org/jira/browse/NUTCH-1933.
>>
>>
>>Hope this helps!
>>
>>
>>Thanks,
>>Sapna
>>
>>On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li
>><sl...@usc.edu> wrote:
>>
>>Yop,
>>
>>
>>I'm trying to install Selenium in Nutch 1.10. However, this error pops
>>up:
>>
>>
>>error: package org.apache.nutch.storage does not exist
>>
>>
>>
>>I can only find this package in Nutch 2.x. Is there a way to use Selenium
>>in 1.10?
>>
>>
>>Any advice would be appreciated.
>>
>>
>>Regards,
>>Shuo Li


Nutch-Selenium in Nutch 1.10

Posted by Jiaxin Ye <ji...@usc.edu>.
Hi Li, Shuo. You are so right. I finished installing and successfully ran
Nutch with Selenium and Firefox. I have a question, though: does your
Firefox window pop up for every URL we crawl?

Hi Prof. Mattmann. I think this is how to install Selenium on a Mac running
OS X later than 10.6...

1. Download XQuartz (it's a .dmg file) and install it directly.
2. Download Nutch 1.10.
3. Download the patch and put it in the Nutch project directory.
4. patch -p0 < THE_PATCH_NAME
5. DO NOT update build.xml and ivy.xml as the Selenium tutorial on GitHub
tells you to. The patch already updates those .xml files for us, and it
also installs the lib-selenium and protocol-selenium plugins for us
(correct me if I am wrong).
6. Update the Tika dependency if needed.
7. Go to the Nutch project directory and run ant runtime.
8. Download Firefox.
9. Open a new terminal and type
    Xvfb -screen 0 1024x768x24 (I think you can set it smaller if you
want...)
    There should be some errors after entering the command (at least there
were for me). Manually create a /tmp/.X11-unix folder with sudo, then set
its mode to 1777. Rerun the command; Xvfb should now work.
10. Go to runtime/local in the Nutch directory and run the crawl command.
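Steps 9 and 10 can be sketched in Java roughly as follows, assuming the Xvfb binary is on the PATH (XQuartz typically installs it under /opt/X11/bin) and that display :99 is free; the display number and the 1024x768x24 geometry are illustrative choices, not values Nutch or the patch requires. X11 programs such as Firefox find the virtual screen through the DISPLAY environment variable, so the crawl process presumably needs DISPLAY pointing at the same display. The /tmp/.X11-unix workaround from step 9 is left to the shell, since it needs sudo and the sticky bit of mode 1777.

```java
// Sketch only: assumes the Xvfb binary is on the PATH and display :99 is free.
// The display number and screen geometry are illustrative, not required values.
import java.io.IOException;
import java.util.List;

class VirtualDisplay {

    // Arguments for "Xvfb :<display> -screen 0 <W>x<H>x<D>".
    static List<String> xvfbCommand(int display, int width, int height, int depth) {
        return List.of("Xvfb", ":" + display, "-screen", "0",
                width + "x" + height + "x" + depth);
    }

    // Points a child process (e.g. the crawl) at the virtual display via DISPLAY.
    static ProcessBuilder withDisplay(ProcessBuilder child, int display) {
        child.environment().put("DISPLAY", ":" + display);
        return child;
    }

    // Starts Xvfb in the background; the caller should destroy() it when done.
    static Process startXvfb(int display) throws IOException {
        return new ProcessBuilder(xvfbCommand(display, 1024, 768, 24))
                .inheritIO()
                .start();
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", xvfbCommand(99, 1024, 768, 24)));
    }
}
```

The crawl itself would then be launched through withDisplay(...) with whatever crawl command and arguments your setup uses; those are not shown here.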

Hope it helps. :)

Best,
Jiaxin



On Thu, Feb 12, 2015 at 1:08 PM, Shuo Li <sli491@usc.edu> wrote:

> I think I have possibly finished installing.
>
> What you need to do:
> 0. Run git status, then git checkout any files you have modified.
> 1. patch -p0 < YOUR_PATCH_FILE
> 2. ant clean jar
> 3. ant runtime
>
> Will try crawling using selenium later on. Hope this helped. >_<
>
> On Thu, Feb 12, 2015 at 9:20 AM, Mattmann, Chris A (3980) <
> chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> Yes, I believe you need to install X11. Why don't you try it and report
>> back what you find? Thanks.
>>
>> Sent from my iPhone
>>
>> On Feb 12, 2015, at 8:28 AM, Jiaxin Ye <jiaxinye@usc.edu> wrote:
>>
>>  Hi professor, but can we use Selenium on Mac?
>>
>> On Thursday, February 12, 2015, Mattmann, Chris A (3980) <
>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>
>>> You need Selenium, Jiaxin, in order to crawl dynamic pages in the
>>> polar dataset you have been assigned in my CSCI 572 search engines class.
>>>
>>> The instructions for integrating Selenium with Nutch 1.10-trunk
>>> are here:
>>>
>>> https://issues.apache.org/jira/browse/NUTCH-1933
>>>
>>>
>>> Cheers,
>>> Chris
>>>
>>>
>>> -----Original Message-----
>>> From: Jiaxin Ye <ji...@usc.edu>
>>> Reply-To: "dev@nutch.apache.org" <de...@nutch.apache.org>
>>> Date: Thursday, February 12, 2015 at 12:46 AM
>>> To: "dev@nutch.apache.org" <de...@nutch.apache.org>
>>> Subject: Re: Nutch-Selenium in Nutch 1.10
>>>
>>> >Well, good choice. I am thinking of switching to Ubuntu now. But why
>>> >do we need Selenium anyway? Just to make crawling easier?
>>> >
>>> >On Thu, Feb 12, 2015 at 12:25 AM, Shuo Li
>>> ><sl...@usc.edu> wrote:
>>> >
>>> >Interestingly, I'm a Mac user, but I don't want to mess up my laptop, so I'm
>>> >using Vagrant with Ubuntu Trusty. It doesn't have a GUI, but Xvfb can still
>>> >be installed properly. The issue is that I don't know how to integrate
>>> >Selenium with Nutch 1.10.
>>> >
>>> >On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye
>>> ><ji...@usc.edu> wrote:
>>> >
>>> >Hi all,
>>> >
>>> >
>>> >Does anyone here know where to find a setup tutorial for Selenium on Mac?
>>> >I find it difficult to install Xvfb on Mac.
>>> >
>>> >
>>> >Best,
>>> >Jiaxin
>>> >
>>> >
>>> >On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh
>>> ><sa...@usc.edu> wrote:
>>> >
>>> >Hi Shuo Li,
>>> >
>>> >
>>> >We were facing a similar issue. Prof. Mattmann suggested we look into this
>>> >patch for Selenium on Nutch 1.10:
>>> >https://issues.apache.org/jira/browse/NUTCH-1933.
>>> >
>>> >
>>> >Hope this helps!
>>> >
>>> >
>>> >Thanks,
>>> >Sapna
>>> >
>>> >On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li
>>> ><sl...@usc.edu> wrote:
>>> >
>>> >Yop,
>>> >
>>> >
>>> >I'm trying to install Selenium in Nutch 1.10. However, this error pops
>>> >up:
>>> >
>>> >
>>> >error: package org.apache.nutch.storage does not exist
>>> >
>>> >
>>> >
>>> >I can only find this package in Nutch 2.x. Is there a way to use Selenium
>>> >in 1.10?
>>> >
>>> >
>>> >Any advice would be appreciated.
>>> >
>>> >
>>> >Regards,
>>> >Shuo Li

Re: Nutch-Selenium in Nutch 1.10

Posted by Shuo Li <sl...@usc.edu>.
I think I have possibly finished installing.

What you need to do:
0. Run git status, then git checkout any files you have modified.
1. patch -p0 < YOUR_PATCH_FILE
2. ant clean jar
3. ant runtime

Will try crawling using selenium later on. Hope this helped. >_<
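The build steps above could also be scripted. The helper below is a hypothetical dry-run sketch, not something shipped with Nutch: it only prints the commands unless run() is called, feeds the patch on standard input the way "patch -p0 < YOUR_PATCH_FILE" does, and deliberately leaves reverting modified files (the git checkout part of step 0) to you, since that discards local changes. YOUR_PATCH_FILE is the same placeholder used in step 1.

```java
// Hypothetical dry-run helper mirroring the steps above; nothing is executed
// unless run() is called. The patch path is a placeholder.
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class NutchSeleniumBuild {

    // Ordered build steps, all run from the Nutch checkout directory.
    static List<ProcessBuilder> steps(File nutchHome, File patchFile) {
        List<ProcessBuilder> steps = new ArrayList<>();
        steps.add(new ProcessBuilder("git", "status"));
        // Equivalent of "patch -p0 < FILE": the patch is fed on stdin.
        steps.add(new ProcessBuilder("patch", "-p0").redirectInput(patchFile));
        steps.add(new ProcessBuilder("ant", "clean", "jar"));
        steps.add(new ProcessBuilder("ant", "runtime"));
        for (ProcessBuilder step : steps) {
            step.directory(nutchHome);
        }
        return steps;
    }

    // Renders a step as a shell-like string for a dry run.
    static String describe(ProcessBuilder step) {
        String cmd = String.join(" ", step.command());
        File input = step.redirectInput().file();
        return input == null ? cmd : cmd + " < " + input.getName();
    }

    // Runs the steps in order and returns the first non-zero exit code, or 0.
    static int run(List<ProcessBuilder> steps) throws IOException, InterruptedException {
        for (ProcessBuilder step : steps) {
            int code = step.redirectOutput(ProcessBuilder.Redirect.INHERIT)
                    .redirectError(ProcessBuilder.Redirect.INHERIT)
                    .start()
                    .waitFor();
            if (code != 0) {
                return code;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        for (ProcessBuilder step : steps(new File("."), new File("YOUR_PATCH_FILE"))) {
            System.out.println(describe(step));
        }
    }
}
```

Stopping at the first non-zero exit code means a failed patch never gets as far as the ant build.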

On Thu, Feb 12, 2015 at 9:20 AM, Mattmann, Chris A (3980) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Yes, I believe you need to install X11. Why don't you try it and report
> back what you find? Thanks.
>
> Sent from my iPhone
>
> On Feb 12, 2015, at 8:28 AM, Jiaxin Ye <ji...@usc.edu> wrote:
>
>  Hi professor, but can we use Selenium on Mac?
>
> On Thursday, February 12, 2015, Mattmann, Chris A (3980) <
> chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> You need Selenium, Jiaxin, in order to crawl dynamic pages in the
>> polar dataset you have been assigned in my CSCI 572 search engines class.
>>
>> The instructions for integrating Selenium with Nutch 1.10-trunk
>> are here:
>>
>> https://issues.apache.org/jira/browse/NUTCH-1933
>>
>>
>> Cheers,
>> Chris
>>
>>
>> -----Original Message-----
>> From: Jiaxin Ye <ji...@usc.edu>
>> Reply-To: "dev@nutch.apache.org" <de...@nutch.apache.org>
>> Date: Thursday, February 12, 2015 at 12:46 AM
>> To: "dev@nutch.apache.org" <de...@nutch.apache.org>
>> Subject: Re: Nutch-Selenium in Nutch 1.10
>>
>> >Well, good choice. I am thinking of switching to Ubuntu now. But why
>> >do we need Selenium anyway? Just to make crawling easier?
>> >
>> >On Thu, Feb 12, 2015 at 12:25 AM, Shuo Li
>> ><sl...@usc.edu> wrote:
>> >
>> >Interestingly, I'm a Mac user, but I don't want to mess up my laptop, so I'm
>> >using Vagrant with Ubuntu Trusty. It doesn't have a GUI, but Xvfb can still
>> >be installed properly. The issue is that I don't know how to integrate
>> >Selenium with Nutch 1.10.
>> >
>> >On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye
>> ><ji...@usc.edu> wrote:
>> >
>> >Hi all,
>> >
>> >
>> >Does anyone here know where to find a setup tutorial for Selenium on Mac?
>> >I find it difficult to install Xvfb on Mac.
>> >
>> >
>> >Best,
>> >Jiaxin
>> >
>> >
>> >On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh
>> ><sa...@usc.edu> wrote:
>> >
>> >Hi Shuo Li,
>> >
>> >
>> >We were facing a similar issue. Prof. Mattmann suggested we look into this
>> >patch for Selenium on Nutch 1.10:
>> >https://issues.apache.org/jira/browse/NUTCH-1933.
>> >
>> >
>> >Hope this helps!
>> >
>> >
>> >Thanks,
>> >Sapna
>> >
>> >On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li
>> ><sl...@usc.edu> wrote:
>> >
>> >Yop,
>> >
>> >
>> >I'm trying to install Selenium in Nutch 1.10. However, this error pops
>> >up:
>> >
>> >
>> >error: package org.apache.nutch.storage does not exist
>> >
>> >
>> >
>> >I can only find this package in Nutch 2.x. Is there a way to use Selenium
>> >in 1.10?
>> >
>> >
>> >Any advice would be appreciated.
>> >
>> >
>> >Regards,
>> >Shuo Li
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >--
>> >Graduate Student
>> >MS in CS (Data Science)
>> >Viterbi School of Engineering
>> >University of Southern California
>> >
>> >
>> >Phone:
>> >+1 650-307-9848 <tel:%2B1%20650-307-9848>
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>>
>>

Re: Nutch-Selenium in Nutch 1.10

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Yes, I believe you need to install X11. Why don't you try it and report back what you find? Thanks.

Sent from my iPhone

On Feb 12, 2015, at 8:28 AM, Jiaxin Ye <ji...@usc.edu> wrote:

Hi professor, but can we use Selenium on Mac?


Re: Nutch-Selenium in Nutch 1.10

Posted by Jiaxin Ye <ji...@usc.edu>.
Hi professor, but can we use Selenium on Mac?

On Thursday, February 12, 2015, Mattmann, Chris A (3980) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> You need Selenium Jiaxin, in order to crawl dynamic pages in the
> polar dataset you have been assigned in my CSCI 572 search engines class.
>
> The instructions for integrating Selenium with Nutch 1.10-trunk
> are here:
>
> https://issues.apache.org/jira/browse/NUTCH-1933
>
>
> Cheers,
> Chris

Re: Nutch-Selenium in Nutch 1.10

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
You need Selenium, Jiaxin, in order to crawl the dynamic pages in the
polar dataset you have been assigned in my CSCI 572 search engines class.

The instructions for integrating Selenium with Nutch 1.10-trunk
are here: 

https://issues.apache.org/jira/browse/NUTCH-1933


Cheers,
Chris


-----Original Message-----
From: Jiaxin Ye <ji...@usc.edu>
Reply-To: "dev@nutch.apache.org" <de...@nutch.apache.org>
Date: Thursday, February 12, 2015 at 12:46 AM
To: "dev@nutch.apache.org" <de...@nutch.apache.org>
Subject: Re: Nutch-Selenium in Nutch 1.10

>Well, good choice. I am thinking changing to ubuntu now. The thing is why
>do we need Selenium anyway? Just easier to perform crawling?

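A minimal sketch for the integration step: in Nutch 1.x, plugins are enabled through the `plugin.includes` property in `conf/nutch-site.xml`, a regular expression over plugin ids. It assumes the NUTCH-1933 patch supplies a plugin with the id `protocol-selenium` and that your current value uses `protocol-http`; check both against the patch and your own configuration (the example value below is illustrative, not Nutch's shipped default). The sketch swaps the protocol plugin and reports which ids the new expression would select, treating the expression as a whole-string match, which is an assumption about how Nutch applies it.

```java
import java.util.List;
import java.util.regex.Pattern;

public class PluginIncludes {

    // Swap one plugin id for another inside a plugin.includes expression.
    // Plain text replacement: every occurrence of oldId is replaced.
    static String swapPlugin(String includes, String oldId, String newId) {
        if (!includes.contains(oldId)) {
            throw new IllegalArgumentException(oldId + " not found in plugin.includes");
        }
        return includes.replace(oldId, newId);
    }

    // Would this plugin id be selected? Assumes a whole-string regex match.
    static boolean isIncluded(String includes, String pluginId) {
        return Pattern.matches(includes, pluginId);
    }

    public static void main(String[] args) {
        // Illustrative value only; copy the real one from your own configuration.
        String current = "protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic";
        String updated = swapPlugin(current, "protocol-http", "protocol-selenium");
        System.out.println(updated);
        for (String id : List.of("protocol-selenium", "protocol-http", "parse-html")) {
            System.out.println(id + " -> " + isIncluded(updated, id));
        }
    }
}
```

The plain text replacement would also rewrite ids that merely contain `protocol-http` (such as `protocol-httpclient`), so review the printed value before pasting it into `nutch-site.xml`.
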
Re: Nutch-Selenium in Nutch 1.10

Posted by Jiaxin Ye <ji...@usc.edu>.
Well, good choice. I am thinking of switching to Ubuntu now. The thing is, why
do we need Selenium anyway? Does it just make crawling easier?

On Thu, Feb 12, 2015 at 12:25 AM, Shuo Li <sl...@usc.edu> wrote:

> Interestingly, I'm a mac user but I don't want to screw my laptop so I'm
> using vagrant with Ubuntu Trusty. It doesn't have GUI but Xvfb can still be
> installed properly. The issue would be I don't know how to integrate
> Selenium with Nutch 1.10.

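On why Selenium is needed at all: the reason given in the thread is crawling dynamic pages. A page built by JavaScript often comes back from a plain HTTP fetch as a nearly empty shell plus script tags, and only a real browser (driven by Selenium) runs those scripts. The Java sketch below is a crude, hypothetical heuristic, not part of Nutch or NUTCH-1933: it strips scripts, styles and tags with regular expressions and flags pages that contain scripts but little visible text. The threshold `minTextChars` is an arbitrary parameter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DynamicPageHeuristic {

    private static final Pattern SCRIPT_BLOCK = Pattern.compile("(?is)<script\\b[^>]*>.*?</script>");
    private static final Pattern STYLE_BLOCK = Pattern.compile("(?is)<style\\b[^>]*>.*?</style>");
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern SCRIPT_OPEN = Pattern.compile("(?i)<script\\b");

    // Length of the text left after removing script/style blocks and all tags,
    // with whitespace collapsed (crude, regex based).
    static int visibleTextLength(String html) {
        String s = SCRIPT_BLOCK.matcher(html).replaceAll(" ");
        s = STYLE_BLOCK.matcher(s).replaceAll(" ");
        s = TAG.matcher(s).replaceAll(" ");
        return s.replaceAll("\\s+", " ").trim().length();
    }

    // Number of opening <script> tags in the raw HTML.
    static int scriptCount(String html) {
        Matcher m = SCRIPT_OPEN.matcher(html);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Hypothetical rule: at least one script, but very little visible text.
    static boolean probablyNeedsBrowser(String html, int minTextChars) {
        return scriptCount(html) > 0 && visibleTextLength(html) < minTextChars;
    }

    public static void main(String[] args) {
        String shell = "<html><body><div id=\"app\"></div><script src=\"app.js\"></script></body></html>";
        String article = "<html><body><p>A static page whose text is already in the HTML.</p></body></html>";
        System.out.println(probablyNeedsBrowser(shell, 200));   // true
        System.out.println(probablyNeedsBrowser(article, 200)); // false
    }
}
```

Regex-based tag stripping is only good enough for rough triage; an HTML parser would be more reliable for anything beyond that.
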
Re: Nutch-Selenium in Nutch 1.10

Posted by Shuo Li <sl...@usc.edu>.
Interestingly, I'm a Mac user, but I don't want to screw up my laptop, so I'm
using Vagrant with Ubuntu Trusty. It doesn't have a GUI, but Xvfb can still be
installed properly. The issue is that I don't know how to integrate
Selenium with Nutch 1.10.

On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye <ji...@usc.edu> wrote:

> Hi all,
>
> Anyone here knows where to find the setup tutorial for Selenium on Mac ??
> I find it difficult to install Xvfb on mac.
>
> Best,
> Jiaxin

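For a headless setup like the Vagrant/Ubuntu Trusty one described above: a browser launched by Selenium needs an X display, and the usual approach on a machine without a GUI is to start the Xvfb virtual framebuffer server and point `DISPLAY` at it. A minimal Java sketch, assuming Xvfb is installed and on the `PATH`; display `:99` and the 1024x768x24 screen are arbitrary choices. On a Mac, installing an X11 server (XQuartz is the usual one) is the alternative suggested in the thread; this sketch only checks whether `DISPLAY` is already set and does not configure X11 itself.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class XvfbSetup {

    // Command line for a virtual framebuffer X server; display number and geometry are arbitrary.
    static List<String> xvfbCommand(int display, int width, int height, int depth) {
        return List.of("Xvfb", ":" + display, "-screen", "0", width + "x" + height + "x" + depth);
    }

    // Copy of an environment with DISPLAY pointing at the given display number,
    // e.g. for the process that runs Nutch and, through it, the browser.
    static Map<String, String> withDisplay(Map<String, String> base, int display) {
        Map<String, String> env = new HashMap<>(base);
        env.put("DISPLAY", ":" + display);
        return env;
    }

    // True if DISPLAY is set to a non-empty value; does not check that the display works.
    static boolean hasDisplay(Map<String, String> env) {
        String d = env.get("DISPLAY");
        return d != null && !d.trim().isEmpty();
    }

    public static void main(String[] args) {
        int display = 99;
        if (hasDisplay(System.getenv())) {
            System.out.println("DISPLAY already set: " + System.getenv("DISPLAY"));
            return;
        }
        System.out.println("Would start: " + String.join(" ", xvfbCommand(display, 1024, 768, 24)));
        System.out.println("and run Nutch with DISPLAY=" + withDisplay(Map.of(), display).get("DISPLAY"));
        // To actually launch it (requires Xvfb to be installed):
        // Process xvfb = new ProcessBuilder(xvfbCommand(display, 1024, 768, 24)).inheritIO().start();
    }
}
```

The shell equivalent is starting `Xvfb :99 -screen 0 1024x768x24 &` and then running Nutch with `DISPLAY=:99` in its environment.
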
Re: Nutch-Selenium in Nutch 1.10

Posted by Jiaxin Ye <ji...@usc.edu>.
Hi all,

Does anyone here know where to find a setup tutorial for Selenium on Mac? I
find it difficult to install Xvfb on Mac.

Best,
Jiaxin

On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh <sa...@usc.edu> wrote:

> Hi Shuo Li,
>
> We were facing a similar issue. Prof. Mattman suggested we look into this
> patch for Selenium on Nutch 1.10 :
> https://issues.apache.org/jira/browse/NUTCH-1933.
>
> Hope this helps!
>
> Thanks,
> Sapna

Re: Nutch-Selenium in Nutch 1.10

Posted by Sapnashri Suresh <sa...@usc.edu>.
Hi Shuo Li,

We were facing a similar issue. Prof. Mattmann suggested we look into this
patch for Selenium on Nutch 1.10:
https://issues.apache.org/jira/browse/NUTCH-1933.

Hope this helps!

Thanks,
Sapna

On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li <sl...@usc.edu> wrote:

> Yop,
>
> I'm trying to install selenium in Nutch 1.10. However, this error pops out:
>
> *error: package org.apache.nutch.storage does not exist*
>
> I can only find this package in Nutch 2.x. Is there a way to use Selenium
> in 1.10?
>
> Any advice would be appreciated.
>
> Regards,
> Shuo Li
>



-- 
Graduate Student
MS in CS (Data Science)
Viterbi School of Engineering
University of Southern California

Phone: +1 650-307-9848
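
The original error comes down to the point Shuo makes: `org.apache.nutch.storage` belongs to Nutch 2.x, so plugin code written against 2.x cannot compile inside a 1.x tree, which is why the thread points to the NUTCH-1933 patch for 1.10 instead. A hedged Java sketch for checking which line a given classpath provides, assuming `org.apache.nutch.storage.WebPage` (2.x) and `org.apache.nutch.crawl.CrawlDatum` (1.x) as marker classes:

```java
public class NutchLineProbe {

    // Marker classes (assumption): WebPage belongs to the Gora-based 2.x storage package,
    // CrawlDatum is the 1.x CrawlDb record class.
    static final String NUTCH2_MARKER = "org.apache.nutch.storage.WebPage";
    static final String NUTCH1_MARKER = "org.apache.nutch.crawl.CrawlDatum";

    // Load without initializing; a missing class or missing dependency counts as absent.
    static boolean onClasspath(String className) {
        try {
            Class.forName(className, false, NutchLineProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Turn the two probe results into a short diagnosis.
    static String describe(boolean hasV2Marker, boolean hasV1Marker) {
        if (hasV2Marker && hasV1Marker) return "both 1.x and 2.x classes found: check for mixed jars";
        if (hasV2Marker) return "Nutch 2.x (org.apache.nutch.storage is available)";
        if (hasV1Marker) return "Nutch 1.x: code importing org.apache.nutch.storage will not compile";
        return "no Nutch classes found on this classpath";
    }

    public static void main(String[] args) {
        System.out.println(describe(onClasspath(NUTCH2_MARKER), onClasspath(NUTCH1_MARKER)));
    }
}
```

Run it with the same classpath you use to compile plugins, so that it sees the same Nutch jars the compiler does.
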