Posted to dev@nutch.apache.org by "AJ Banck (JIRA)" <ji...@apache.org> on 2005/08/15 20:22:54 UTC

[jira] Created: (NUTCH-82) Nutch Commands should run on Windows without external tools

Nutch Commands should run on Windows without external tools
-----------------------------------------------------------

         Key: NUTCH-82
         URL: http://issues.apache.org/jira/browse/NUTCH-82
     Project: Nutch
        Type: New Feature
 Environment: Windows 2000
    Reporter: AJ Banck


Currently there is only a shell script to run the Nutch commands. This should be platform independent.
The best options would be Ant tools, or scripts generated by a template tool to avoid duplication.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (NUTCH-82) Nutch Commands should run on Windows without external tools

Posted by "Fuad Efendi (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-82?page=comments#action_12332310 ] 

Fuad Efendi commented on NUTCH-82:
----------------------------------

The conf folder should come before nutch.jar on the classpath:

set NUTCH_CLASSPATH=%NUTCH_HOME%;%NUTCH_HOME%/conf;%NUTCH_HOME%/nutch.jar
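A rough Java equivalent of that line, sketching what a pure-Java launcher might do. It assumes the same three entries. It uses File.pathSeparator (';' on Windows, ':' elsewhere) so that one code path serves both platforms. It keeps conf ahead of nutch.jar because the class loader searches classpath entries in order, so a same-named resource in conf/ is found before any copy inside the jar. The class name NutchClasspath is hypothetical.

```java
import java.io.File;

public class NutchClasspath {
    // Hypothetical helper: builds the same classpath as the batch line above,
    // with conf/ listed before nutch.jar so its files are found first.
    static String build(String nutchHome) {
        String[] entries = {
            nutchHome,
            nutchHome + File.separator + "conf",
            nutchHome + File.separator + "nutch.jar"
        };
        StringBuilder sb = new StringBuilder();
        for (String entry : entries) {
            if (sb.length() > 0) {
                sb.append(File.pathSeparator); // ';' on Windows, ':' on Unix
            }
            sb.append(entry);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Reading NUTCH_HOME with System.getenv requires Java 1.5 or later.
        String home = System.getenv("NUTCH_HOME");
        if (home == null) {
            home = ".";
        }
        System.out.println(build(home));
    }
}
```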



[jira] Commented: (NUTCH-82) Nutch Commands should run on Windows without external tools

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-82?page=comments#action_12332543 ] 

Doug Cutting commented on NUTCH-82:
-----------------------------------

I do not think we should have multiple versions of the command-line tools, since that complicates maintenance. A Windows batch file is not portable, and is thus not a good candidate to replace the bash versions. I also don't see that requiring Perl is any better than requiring Cygwin on Windows, and I suspect that even with Perl we would still require Cygwin. So, unless someone objects, I will close this issue.


[jira] Commented: (NUTCH-82) Nutch Commands should run on Windows without external tools

Posted by "Matt Kangas (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-82?page=comments#action_12332660 ] 

Matt Kangas commented on NUTCH-82:
----------------------------------

Another "pure Java" solution is to rewrite the "nutch" bash script in BeanShell (http://www.beanshell.org).

I just took a quick (~1 hr) stab at this. The syntax seems quite agreeable, with many built-in versions of standard Unix commands (cd(), cat(), etc.). However, I quickly hit two barriers:

1) Reading environment variables. System.getenv() works on Java 1.5 but is nonfunctional on Java 1.3 and 1.4. The only workaround on 1.4 is what Ant does: run a native command, read its output, and set system properties.

2) Setting -Xmx and other JVM options. My sense is that this is simply not possible.
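A minimal Java sketch of the workaround described in item 1, for JVMs where System.getenv() is unavailable. It runs a native command (`env` on Unix, `cmd /c set` on Windows), parses the NAME=value lines, and copies them into system properties. This is not Ant's actual implementation. The class name and the `env.` property prefix are assumptions for illustration, and values that span multiple lines are not handled. It avoids Java 5 language features so it matches the older JVMs under discussion.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.util.Enumeration;
import java.util.Properties;

public class NativeEnv {
    // Parses "NAME=value" lines, splitting on the first '=' so values may
    // contain '='. Lines with no name (e.g. blank lines) are skipped.
    static Properties parse(String output) {
        Properties env = new Properties();
        BufferedReader reader = new BufferedReader(new StringReader(output));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                int eq = line.indexOf('=');
                if (eq > 0) {
                    env.setProperty(line.substring(0, eq), line.substring(eq + 1));
                }
            }
        } catch (IOException e) {
            // Reading from a StringReader does not fail in practice.
            throw new IllegalStateException(e.toString());
        }
        return env;
    }

    // Runs the platform's environment-listing command and parses its output.
    static Properties readFromNativeCommand() throws IOException {
        String os = System.getProperty("os.name").toLowerCase();
        String[] cmd = os.startsWith("windows")
                ? new String[] {"cmd", "/c", "set"}
                : new String[] {"env"};
        Process p = Runtime.getRuntime().exec(cmd);
        BufferedReader in = new BufferedReader(new InputStreamReader(p.getInputStream()));
        StringBuffer out = new StringBuffer();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                out.append(line).append('\n');
            }
        } finally {
            in.close();
        }
        return parse(out.toString());
    }

    // Copies each variable into a system property named "env.NAME" (assumed prefix).
    static void exportAsSystemProperties(Properties env) {
        Enumeration names = env.propertyNames();
        while (names.hasMoreElements()) {
            String name = (String) names.nextElement();
            System.setProperty("env." + name, env.getProperty(name));
        }
    }

    public static void main(String[] args) throws IOException {
        exportAsSystemProperties(readFromNativeCommand());
        System.out.println("NUTCH_HOME = " + System.getProperty("env.NUTCH_HOME"));
    }
}
```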

Other than these issues, it would be quite easy to rewrite all of the usage/command/path-building logic as a BeanShell script. Then there could be two *small* scripts (bash and .bat) to handle the stuff that can't be done in Java, plus one BeanShell script for the rest. Does that seem useful?
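To show the kind of usage/command logic that could move out of bash, here is a rough sketch in plain Java rather than BeanShell. The command names and class names in the table are hypothetical placeholders, not Nutch's real command list. Treating an unknown command as a fully qualified class name is an assumption about the desired behavior. JVM options such as -Xmx would still be set by the small bash or .bat wrapper that starts this JVM.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CommandDispatch {
    // Hypothetical command table; the entries are placeholders for illustration.
    private static final Map<String, String> COMMANDS = new LinkedHashMap<String, String>();
    static {
        COMMANDS.put("crawl", "org.example.nutch.Crawl");
        COMMANDS.put("inject", "org.example.nutch.Inject");
    }

    // Maps a command name to a main class; unknown names are assumed to be class names.
    static String resolve(String command) {
        String cls = COMMANDS.get(command);
        return cls != null ? cls : command;
    }

    static String usage() {
        StringBuilder sb = new StringBuilder("Usage: nutch COMMAND\nwhere COMMAND is one of:\n");
        for (String name : COMMANDS.keySet()) {
            sb.append("  ").append(name).append('\n');
        }
        sb.append(" or\n  CLASSNAME  run the class named CLASSNAME\n");
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.print(usage());
            System.exit(1);
        }
        // Pass the remaining arguments through to the selected class's main().
        String[] rest = new String[args.length - 1];
        System.arraycopy(args, 1, rest, 0, rest.length);
        Class<?> mainClass = Class.forName(resolve(args[0]));
        mainClass.getMethod("main", String[].class).invoke(null, (Object) rest);
    }
}
```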

FYI, the core BeanShell interpreter is about 143 KB.


[jira] Commented: (NUTCH-82) Nutch Commands should run on Windows without external tools

Posted by "Nick Jacobsen (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-82?page=comments#action_12332548 ] 

Nick Jacobsen commented on NUTCH-82:
------------------------------------

No offense, but you must not use Windows. While Perl does not come with Windows (just as it does not really come with many Linux distributions), it is easily available, in binary form, for every system I have ever used. For instance, on Windows there is ActiveState's free Perl implementation, ActivePerl. Really, if you want full cross-platform support, why is our command-line tool not written in Java, considering that Java is required to run the software anyway? As for the whole "just install Cygwin" argument, Cygwin requires specific directory structures and is neither a quick nor an easy download.


[jira] Commented: (NUTCH-82) Nutch Commands should run on Windows without external tools

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-82?page=comments#action_12332549 ] 

Doug Cutting commented on NUTCH-82:
-----------------------------------

I do in fact sometimes develop Nutch on windows.

I would be happy if someone supplied a Java replacement for the command-line tools. That would indeed remove a dependency. But I still don't see how requiring Perl simplifies things. I'm also not much of a Perl programmer. Are you willing to maintain Perl translations of all of the scripts in Nutch's bin directory?

Note that the mapred branch has more scripts.  Also note that the mapred branch relies on the 'df' program to portably access the amount of free space on a volume.  Is there a portable Perl alternative?

http://svn.apache.org/viewcvs.cgi/lucene/nutch/branches/mapred/bin/?rev=326780
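For comparison, Java itself later gained a portable answer to the `df` question. java.io.File.getUsableSpace(), along with getFreeSpace() and getTotalSpace(), was added in Java 6. That release came after this discussion, so it was not an option for the JVMs in use here. The sketch below assumes Java 6 or later, and the class name is hypothetical. These methods return 0 rather than throwing when the path does not name a partition, so the sketch checks that the directory exists first.

```java
import java.io.File;

public class FreeSpace {
    // Bytes available to this JVM on the volume holding 'dir' (Java 6+).
    // Returns -1 for a missing directory instead of getUsableSpace()'s 0.
    static long usableBytes(File dir) {
        if (!dir.exists()) {
            return -1;
        }
        return dir.getUsableSpace();
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        long bytes = usableBytes(dir);
        if (bytes < 0) {
            System.err.println(dir + " does not exist");
        } else {
            System.out.println(dir + ": " + (bytes / (1024 * 1024)) + " MB usable");
        }
    }
}
```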



[jira] Updated: (NUTCH-82) Nutch Commands should run on Windows without external tools

Posted by "AJ Banck (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-82?page=all ]

AJ Banck updated NUTCH-82:
--------------------------

    Attachment: nutch.bat

Place nutch.bat in the bin folder.
It allows all Nutch command-line tools to be run on Windows. Tested on Windows 2000 and XP.


[jira] Commented: (NUTCH-82) Nutch Commands should run on Windows without external tools

Posted by "Nick Jacobsen (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-82?page=comments#action_12332551 ] 

Nick Jacobsen commented on NUTCH-82:
------------------------------------

I think the problem here is that a bash script, by definition, is not portable. Yes, you can *emulate* a *sh shell on Windows using Cygwin, and build any required tools (df, etc., which are not available in ALL shell environments), but it is still *emulation*. In general, anything you can do in bash you can do in Perl, but it is not always simple, and it sometimes requires Perl modules, which I agree is not a good thing. I would suggest that someone (I will even look into it myself) write a Java launching tool. I see providing only a bash control script as restricting the use of Nutch to people running *nix systems.

The only problem with Java is that it needs to be compiled, and for something like a control script it is nice to be able to modify it *simply*. Just a guess here, but doesn't Java have a script interpreter built into it?
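Java did not have this in 2005, but Java 6 later added exactly this kind of hook: the javax.script API (JSR 223), which lets a Java program look up a scripting engine by name and evaluate scripts with it. Whether any engine is actually installed depends on the JDK; the bundled JavaScript engine was removed in JDK 15. The sketch below therefore checks for null instead of assuming an engine exists. The class name and the choice of JavaScript are illustrative assumptions.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class ScriptProbe {
    // Returns the engine registered under 'name', or null if none is installed.
    static ScriptEngine find(String name) {
        return new ScriptEngineManager().getEngineByName(name);
    }

    public static void main(String[] args) throws ScriptException {
        // List whatever engines this JDK and the classpath provide.
        for (ScriptEngineFactory f : new ScriptEngineManager().getEngineFactories()) {
            System.out.println("engine: " + f.getEngineName() + " " + f.getNames());
        }
        ScriptEngine js = find("javascript");
        if (js == null) {
            System.out.println("No JavaScript engine available on this JDK.");
        } else {
            // A control script could be evaluated here instead of a literal.
            System.out.println("1 + 2 = " + js.eval("1 + 2"));
        }
    }
}
```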


[jira] Commented: (NUTCH-82) Nutch Commands should run on Windows without external tools

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-82?page=comments#action_12332610 ] 

Doug Cutting commented on NUTCH-82:
-----------------------------------

Ant and Tomcat supply both Unix shell scripts and Windows batch files. Neither uses Perl. I am hesitant to go this two-implementation route, as Nutch's scripting requirements (especially with MapReduce) are greater than those of Ant or Tomcat. Nutch's new scripts manage daemons on remote servers using ssh and rsync, which Cygwin supplies on Windows.



[jira] Updated: (NUTCH-82) Nutch Commands should run on Windows without external tools

Posted by "AJ Banck (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-82?page=all ]

AJ Banck updated NUTCH-82:
--------------------------

    Attachment: nutch.bat

Updated: removed an obsolete comment.


[jira] Commented: (NUTCH-82) Nutch Commands should run on Windows without external tools

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-82?page=comments#action_12332559 ] 

Dawid Weiss commented on NUTCH-82:
----------------------------------

I personally disagree that Perl is a better alternative to Cygwin. Most people familiar with Unix/Windows development will have no problem modifying a bash script, whereas a Perl script... well, Perl is Perl :)

As for a pure Java solution, I agree it would be handy. However, Java is quite a pain to invoke, especially with multiple JVM switches such as -Xmx, so you would probably have to fall back to a 'boot' script at some point anyway. The only pure Java approach that comes to mind is using Ant to spawn a JVM and then writing commons-cli equivalents of the command-line tools... but, as much as I hate platform-dependent scripts, that seems like overkill compared to the bash solution.
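The -Xmx problem can be partly sidestepped in Java itself. A small launcher JVM can start the real work in a child JVM with whatever switches it needs, which is roughly what Ant's forked `<java>` task does. The sketch below uses ProcessBuilder (Java 5+); inheritIO() needs Java 7. The 1000m heap size and the class name are assumptions for illustration. The launcher itself still has to be started somehow, so this reduces the need for a platform-specific boot script but does not remove it.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChildJvmLauncher {
    // Builds a command line for a child JVM that uses the same java
    // executable as the current process, with its own heap setting.
    static List<String> buildCommand(String heapSize, String classpath,
                                     String mainClass, List<String> args) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<String>();
        cmd.add(javaBin);
        cmd.add("-Xmx" + heapSize);
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);
        cmd.addAll(args);
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 2) {
            System.err.println("usage: ChildJvmLauncher CLASSPATH MAINCLASS [ARGS...]");
            System.exit(1);
        }
        List<String> cmd = buildCommand("1000m", args[0], args[1],
                Arrays.asList(args).subList(2, args.length));
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.inheritIO(); // child shares this console's stdin/stdout/stderr (Java 7+)
        System.exit(pb.start().waitFor()); // propagate the child's exit code
    }
}
```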


[jira] Updated: (NUTCH-82) Nutch Commands should run on Windows without external tools

Posted by "Nick Jacobsen (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-82?page=all ]

Nick Jacobsen updated NUTCH-82:
-------------------------------

    Attachment: nutch.pl

Perl version of the control script, meant to work on both Windows and Unix-like operating systems. It has been tested on Windows 2000/XP/2003 Server and OpenBSD.
