Posted to dev@nutch.apache.org by Diaa Abdallah <di...@gmail.com> on 2014/05/17 23:33:58 UTC

Creating Windows bash files for nutch

Hi,
Currently Nutch isn't very friendly to Windows users: it requires Cygwin
to run, and the Hadoop 1.x branch that Nutch bundles causes a lot of
problems because of the "set tmp permission" issue.

What do you think about doing two things:
1. Move to Hadoop 2.4 to support Windows/Linux and the new MapReduce API
2. Create bash scripts to run crawls with

Relevant JIRA Issues:

Re: Creating Windows bash files for nutch

Posted by Julien Nioche <li...@gmail.com>.
Hi Diaa

That could be useful when running in local mode at least, but I am not sure
about the distributed mode. Doesn't Hadoop rely on Cygwin in order to be
used on Windows (at least the Apache distro) anyway?

Julien




On 18 May 2014 20:47, Diaa Abdallah <di...@gmail.com> wrote:

> I meant writing batch/cmd scripts for Windows that don't require Cygwin.
>
> I was thinking of writing those scripts but wanted to check if people
> think it's a good idea.
>
>
> On Sunday, May 18, 2014, Julien Nioche <li...@gmail.com>
> wrote:
>
>> Hi
>>
>>
>>> Currently Nutch isn't very friendly to Windows users: it requires
>>> Cygwin to run, and the Hadoop 1.x branch that Nutch bundles causes a lot
>>> of problems because of the "set tmp permission" issue.
>>>
>>> What do you think about doing two things:
>>> 1. Move to Hadoop 2.4 to support Windows/Linux and the new MapReduce API
>>>
>>
>> It already works on Linux. I am pretty sure there is already a JIRA for
>> the port to the new MapReduce API. As for Windows, feel free to contribute
>> an alternative set of scripts if you want to.
>>
>>
>>> 2. Create bash scripts to run crawls with
>>>
>>
>> what's wrong with src/bin/crawl.sh?
>>
>> Julien
>>
>>
>>
>>> Relevant JIRA Issues:
>>>
>>>
>>
>>
>


-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

Re: Creating Windows bash files for nutch

Posted by Diaa Abdallah <di...@gmail.com>.
I meant writing batch/cmd scripts for Windows that don't require Cygwin.

I was thinking of writing those scripts but wanted to check if people think
it's a good idea.

On Sunday, May 18, 2014, Julien Nioche <li...@gmail.com>
wrote:

> Hi
>
>
>> Currently Nutch isn't very friendly to Windows users: it requires
>> Cygwin to run, and the Hadoop 1.x branch that Nutch bundles causes a lot
>> of problems because of the "set tmp permission" issue.
>>
>> What do you think about doing two things:
>> 1. Move to Hadoop 2.4 to support Windows/Linux and the new MapReduce API
>>
>
> It already works on Linux. I am pretty sure there is already a JIRA for the
> port to the new MapReduce API. As for Windows, feel free to contribute an
> alternative set of scripts if you want to.
>
>
>> 2. Create bash scripts to run crawls with
>>
>
> what's wrong with src/bin/crawl.sh?
>
> Julien
>
>
>
>> Relevant JIRA Issues:
>>
>>
>
>

Re: Creating Windows bash files for nutch

Posted by Julien Nioche <li...@gmail.com>.
Hi


> Currently Nutch isn't very friendly to Windows users: it requires Cygwin
> to run, and the Hadoop 1.x branch that Nutch bundles causes a lot of
> problems because of the "set tmp permission" issue.
>
> What do you think about doing two things:
> 1. Move to Hadoop 2.4 to support Windows/Linux and the new MapReduce API
>

It already works on Linux. I am pretty sure there is already a JIRA for the
port to the new MapReduce API. As for Windows, feel free to contribute an
alternative set of scripts if you want to.


> 2. Create bash scripts to run crawls with
>

what's wrong with src/bin/crawl.sh?

Julien



> Relevant JIRA Issues:
>
>

