Posted to user@nutch.apache.org by Cement Xianyu <ce...@gmail.com> on 2006/04/30 17:57:25 UTC

Startscript in windows

Hi

Because I want to use Nutch on my ThinkPad under Windows,
I read the original start script and converted it into a Windows batch
script.
There are two files.
The first file ensures that delayed environment variable expansion is enabled.

The files can also be found on my blog: http://dwangel.3322.org/2006/04/29/44/
(in Chinese :P)

I hope these will be helpful. Maybe they can be included in the package.

The contents follow.

nutch.bat
@cmd /V:on /c %~dp0nutch1.bat %*

nutch1.bat
@echo on
rem *****************************************************************
rem * A script to launch nutch on Windows 2000/XP System.
rem *
rem * Written by Cement Xianyu
rem * ( cement.xianyu@gmail.com blog: http://dwangel.3322.org)
rem *
rem * Because delayed environment variable expansion is used, cmd /V:on
rem * must be used to run this script.
rem *****************************************************************
if "%OS%"=="Windows_NT" @setlocal
if "%OS%"=="WINNT" @setlocal

if "%1" == "" goto :msg
goto :begin
:msg
rem no quotes here: cmd's echo would print them literally
echo Usage: nutch COMMAND
echo where COMMAND is one of:
echo   crawl             one-step crawler for intranets
echo   readdb            read / dump crawl db
echo   readlinkdb        read / dump link db
echo   inject            inject new urls into the database
echo   generate          generate new segments to fetch
echo   fetch             fetch a segment's pages
echo   parse             parse a segment's pages
echo   segread           read / dump segment data
echo   updatedb          update crawl db from segments after fetching
echo   invertlinks       create a linkdb from parsed segments
echo   index             run the indexer on parsed segments and linkdb
echo   merge             merge several segment indexes
echo   dedup             remove duplicates from a set of segment indexes
echo   plugin            load a plugin and run one of its classes main()
echo   server            run a search server
echo  or
echo   CLASSNAME         run the class named CLASSNAME
echo Most commands print help when invoked w/o parameters.
pause
goto :end

:begin
rem %~dp0 expands to the drive and directory of the current script under NT
set DEFAULT_NUTCH_HOME=%~dp0..
rem set DEFAULT_NUTCH_HOME=..

if "%NUTCH_HOME%"=="" set NUTCH_HOME=%DEFAULT_NUTCH_HOME%
rem clear the helper variable (set VAR="" would store two literal quotes)
set DEFAULT_NUTCH_HOME=

echo %NUTCH_HOME%

rem set _USE_CLASSPATH=yes

if "%CLASSPATH%"=="" (set CLASSPATH=%JAVA_HOME%\lib\tools.jar) else set CLASSPATH=%CLASSPATH%;%JAVA_HOME%\lib\tools.jar
set CLASSPATH=%CLASSPATH%;%NUTCH_HOME%\conf;
echo %CLASSPATH%
echo before other

rem for developers, add plugins, job & test code to CLASSPATH
if exist %NUTCH_HOME%\build\plugins set CLASSPATH=%CLASSPATH%;%NUTCH_HOME%\build

for /R %NUTCH_HOME%\build %%i in (nutch*.job) do set CLASSPATH=!CLASSPATH!;%%i
if exist %NUTCH_HOME%\build\test\classes set CLASSPATH=%CLASSPATH%;%NUTCH_HOME%\build\test\classes

rem for releases, add Nutch job to CLASSPATH
for /R %NUTCH_HOME% %%i in (nutch*.job) do set CLASSPATH=!CLASSPATH!;%%i
rem add plugins to classpath
if exist %NUTCH_HOME%\plugins set CLASSPATH=%CLASSPATH%;%NUTCH_HOME%
rem add libs to CLASSPATH
for /R %NUTCH_HOME%\lib %%f in (*.jar) do set CLASSPATH=!CLASSPATH!;%%f


echo %CLASSPATH%

rem translate command
if "%1"=="crawl" set CLASS=org.apache.nutch.crawl.Crawl
if "%1"=="inject" set   CLASS=org.apache.nutch.crawl.Injector
if "%1"=="generate" set   CLASS=org.apache.nutch.crawl.Generator
if "%1"=="fetch" set   CLASS=org.apache.nutch.fetcher.Fetcher
if "%1"=="parse" set   CLASS=org.apache.nutch.parse.ParseSegment
if "%1"=="readdb" set   CLASS=org.apache.nutch.crawl.CrawlDbReader
if "%1"=="readlinkdb" set   CLASS=org.apache.nutch.crawl.LinkDbReader
if "%1"=="segread" set   CLASS=org.apache.nutch.segment.SegmentReader
if "%1"=="updatedb" set   CLASS=org.apache.nutch.crawl.CrawlDb
if "%1"=="invertlinks" set   CLASS=org.apache.nutch.crawl.LinkDb
if "%1"=="index" set   CLASS=org.apache.nutch.indexer.Indexer
if "%1"=="dedup" set   CLASS=org.apache.nutch.indexer.DeleteDuplicates
if "%1"=="merge" set   CLASS=org.apache.nutch.indexer.IndexMerger
if "%1"=="plugin" set   CLASS=org.apache.nutch.plugin.PluginRepository
rem single quotes are not quoting characters in cmd, so none are used here
if "%1"=="server" set   CLASS=org.apache.nutch.searcher.DistributedSearch$Server
if "%CLASS%"=="" set CLASS=%1

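rem %* still contains COMMAND and is not changed by shift, so collect
rem the remaining arguments by hand before starting Java
set NUTCH_ARGS=
:args
shift
if "%~1"=="" goto :run
set NUTCH_ARGS=%NUTCH_ARGS% %1
goto :args
:run
rem quote the paths so that directories containing spaces still work
"%JAVA_HOME%\bin\java" -cp "%CLASSPATH%" %CLASS% %NUTCH_ARGS%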


if "%OS%"=="Windows_NT" @endlocal
if "%OS%"=="WINNT" @endlocal

:end
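
The "translate command" step at the end of nutch1.bat can also be sketched in Python, which makes the intended argument handling easier to see. This is only an illustration and not part of the posted script: the names COMMAND_CLASSES and build_java_command are hypothetical, and the sketch assumes that the Java class should receive only the arguments that follow COMMAND.

```python
# Hypothetical sketch mirroring the "translate command" table in nutch1.bat.
COMMAND_CLASSES = {
    "crawl": "org.apache.nutch.crawl.Crawl",
    "inject": "org.apache.nutch.crawl.Injector",
    "generate": "org.apache.nutch.crawl.Generator",
    "fetch": "org.apache.nutch.fetcher.Fetcher",
    "parse": "org.apache.nutch.parse.ParseSegment",
    "readdb": "org.apache.nutch.crawl.CrawlDbReader",
    "readlinkdb": "org.apache.nutch.crawl.LinkDbReader",
    "segread": "org.apache.nutch.segment.SegmentReader",
    "updatedb": "org.apache.nutch.crawl.CrawlDb",
    "invertlinks": "org.apache.nutch.crawl.LinkDb",
    "index": "org.apache.nutch.indexer.Indexer",
    "dedup": "org.apache.nutch.indexer.DeleteDuplicates",
    "merge": "org.apache.nutch.indexer.IndexMerger",
    "plugin": "org.apache.nutch.plugin.PluginRepository",
    "server": "org.apache.nutch.searcher.DistributedSearch$Server",
}


def build_java_command(argv, java_home, classpath):
    """Build the Java argument list from ['COMMAND', arg1, arg2, ...]."""
    if not argv:
        raise ValueError("Usage: nutch COMMAND")
    command, rest = argv[0], argv[1:]
    # Anything that is not a known command is treated as a class name,
    # as the batch file does with "set CLASS=%1".
    class_name = COMMAND_CLASSES.get(command, command)
    java = java_home + "\\bin\\java"
    # Assumption: only the arguments after COMMAND are passed to the class.
    return [java, "-cp", classpath, class_name] + rest


if __name__ == "__main__":
    print(build_java_command(["crawl", "urls", "-depth", "3"],
                             r"C:\jdk", r"C:\nutch\conf"))
```

Keeping the table in one dictionary means that a new command needs a single new entry, and the fallback to a plain class name comes from the dictionary lookup's default value.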

Re: Startscript in windows

Posted by Nutch Newbie <nu...@gmail.com>.
AJ,

Did you update the script to reflect the new changes in 0.8? If not, I can
update it. However, I am getting a "class not found" error when I try to
run nutch crawl or nutch inject, even though I did point it to the current
classes in 0.8. Any suggestions?

Thanks
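
One way to narrow down a "class not found" error like this is to check which classpath entries actually contain the class. The following Python sketch is only an illustration under stated assumptions: find_class is a hypothetical helper, the classpath is the ";"-separated string that the batch file echoes, and a class a.b.C is assumed to be stored as a/b/C.class either in a directory or at the root of a zip-format archive.

```python
import os
import zipfile


def find_class(classpath, class_name, sep=";"):
    """Return the classpath entries that contain the given class."""
    resource = class_name.replace(".", "/") + ".class"
    hits = []
    for entry in classpath.split(sep):
        if not entry:
            continue  # skip empty entries produced by ";;"
        if os.path.isdir(entry):
            # Directory entry: look for a/b/C.class below it.
            if os.path.isfile(os.path.join(entry, *resource.split("/"))):
                hits.append(entry)
        elif os.path.isfile(entry) and zipfile.is_zipfile(entry):
            # Archive entry: look for a/b/C.class at the archive root.
            with zipfile.ZipFile(entry) as archive:
                if resource in archive.namelist():
                    hits.append(entry)
    return hits


if __name__ == "__main__":
    cp = os.environ.get("CLASSPATH", "")
    print(find_class(cp, "org.apache.nutch.crawl.Crawl"))
```

An empty result suggests that the classpath, rather than the class name, is the problem.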

On 4/30/06, ArentJan Banck <aj...@planet.nl> wrote:
> I also wrote a Windows batch file, and created a Jira case for this, see
> http://issues.apache.org/jira/browse/NUTCH-82
>
> -Arent-Jan

Re: Startscript in windows

Posted by ArentJan Banck <aj...@planet.nl>.
I also wrote a Windows batch file, and created a Jira case for this, see 
http://issues.apache.org/jira/browse/NUTCH-82

-Arent-Jan
