Posted to commits@nutch.apache.org by ma...@apache.org on 2015/04/10 06:58:06 UTC

svn commit: r1672539 - in /nutch/trunk: CHANGES.txt docker/ docker/Dockerfile docker/README.md

Author: mattmann
Date: Fri Apr 10 04:58:05 2015
New Revision: 1672539

URL: http://svn.apache.org/r1672539
Log:
Contribute Dockerfile for Nutch 1.x from Michael Joyce via mattmann.

Added:
    nutch/trunk/docker/
    nutch/trunk/docker/Dockerfile
    nutch/trunk/docker/README.md
Modified:
    nutch/trunk/CHANGES.txt

Modified: nutch/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt?rev=1672539&r1=1672538&r2=1672539&view=diff
==============================================================================
--- nutch/trunk/CHANGES.txt (original)
+++ nutch/trunk/CHANGES.txt Fri Apr 10 04:58:05 2015
@@ -2,6 +2,8 @@ Nutch Change Log
  
 Nutch Current Development 1.10-SNAPSHOT
 
+* NUTCH-1972 Dockerfile for Nutch 1.x (Michael Joyce via mattmann)
+
 * NUTCH-1771 Indexer fails if a segment is corrupted or incomplete (Diaa, Chong Li via snagel)
 
 * NUTCH-1975 New configuration for CommonCrawlDataDumper tool (Giuseppe Totaro via mattmann)

Added: nutch/trunk/docker/Dockerfile
URL: http://svn.apache.org/viewvc/nutch/trunk/docker/Dockerfile?rev=1672539&view=auto
==============================================================================
--- nutch/trunk/docker/Dockerfile (added)
+++ nutch/trunk/docker/Dockerfile Fri Apr 10 04:58:05 2015
@@ -0,0 +1,40 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+FROM ubuntu:14.04
+MAINTAINER Michael Joyce <jo...@apache.org>
+
+WORKDIR /root/
+
+# Install the package that provides add-apt-repository, needed for adding repositories
+RUN apt-get update && apt-get install -y software-properties-common
+
+# Add the repository that we'll pull java down from.
+RUN add-apt-repository -y ppa:webupd8team/java && apt-get update && apt-get upgrade -y
+
+# Get Oracle Java 1.7 installed
+RUN echo oracle-java7-installer shared/accepted-oracle-license-v1-1 select true | /usr/bin/debconf-set-selections && apt-get install -y oracle-java7-installer oracle-java7-set-default
+
+# Install various dependencies
+RUN apt-get install -y ant openssh-server vim telnet subversion rsync curl build-essential
+
+# Set up JAVA_HOME
+RUN echo 'export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")' >> $HOME/.bashrc
+
+# Check out and build the Nutch trunk
+RUN svn checkout https://svn.apache.org/repos/asf/nutch/trunk/ nutch_source && cd nutch_source && ant
+
+# Convenience symlink to Nutch runtime local
+RUN ln -s nutch_source/runtime/local $HOME/nutch

Added: nutch/trunk/docker/README.md
URL: http://svn.apache.org/viewvc/nutch/trunk/docker/README.md?rev=1672539&view=auto
==============================================================================
--- nutch/trunk/docker/README.md (added)
+++ nutch/trunk/docker/README.md Fri Apr 10 04:58:05 2015
@@ -0,0 +1,58 @@
+# Nutch Dockerfile #
+
+Get up and running quickly with Nutch on Docker.
+
+## What is Nutch?
+
+![Nutch logo](https://wiki.apache.org/nutch/FrontPage?action=AttachFile&do=get&target=nutch_logo_medium.gif "Nutch")
+
+Apache Nutch is a highly extensible and scalable open source web crawler software project.
+
+Nutch can run on a single machine, but it gains much of its strength from running in a Hadoop cluster.
+
+## Docker Image
+
+The current configuration of this image consists of the following components:
+
+* Nutch 1.x
+
+## Base Image
+
+* [ubuntu:14.04](https://registry.hub.docker.com/_/ubuntu/)
+
+## Tips
+
+You may need to alias docker to "docker --tls" if you see errors such as:
+
+```
+2015/04/07 09:19:56 Post http://192.168.59.103:2376/v1.14/containers/create?name=NutchContainer: malformed HTTP response "\x15\x03\x01\x00\x02\x02\x16"
+```
+
+The easiest way to do this is to define the alias in your shell:
+
+    alias docker="docker --tls"
+
+## Installation
+
+1. Install [Docker](https://www.docker.com/).
+
+2. Build from files in this directory:
+
+	$(boot2docker shellinit | grep export)
+	docker build -t apache/nutch .
+
+## Usage
+
+Start Docker:
+
+	boot2docker up
+	$(boot2docker shellinit | grep export)
+
+Start a container from the image and attach to it:
+
+    docker run -t -i -d --name nutchcontainer apache/nutch /bin/bash
+    docker attach --sig-proxy=false nutchcontainer
+
+Nutch is located in ~/nutch and is almost ready to run.
+You will need to set seed URLs and update the configuration with your crawler's Agent Name.
+For additional "getting started" information, check out the [Nutch Tutorial](https://wiki.apache.org/nutch/NutchTutorial).
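
Note (not part of the commit): the README's last step, setting seed URLs and the crawler's agent name, could look roughly like the sketch below when run inside the container. It is a minimal illustration under stated assumptions: the `urls/seed.txt` location, the example seed URL, and the `MyNutchCrawler` value are placeholders chosen here, and the script overwrites `conf/nutch-site.xml` instead of merging into it. `http.agent.name` is the Nutch property that holds the agent name.

```shell
#!/bin/sh
# Sketch only: paths and values below are assumptions, not part of the commit.
set -eu

# ~/nutch is the convenience symlink created by the Dockerfile.
NUTCH_HOME="${NUTCH_HOME:-$HOME/nutch}"
# Hypothetical agent name; replace it with your own crawler's name.
AGENT_NAME="${AGENT_NAME:-MyNutchCrawler}"

mkdir -p "$NUTCH_HOME/urls" "$NUTCH_HOME/conf"

# Seed list: one URL per line (example URL, replace with your own).
printf '%s\n' 'http://nutch.apache.org/' > "$NUTCH_HOME/urls/seed.txt"

# Overwrites conf/nutch-site.xml; merge by hand if it already holds overrides.
cat > "$NUTCH_HOME/conf/nutch-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>$AGENT_NAME</value>
  </property>
</configuration>
EOF
```

Both `NUTCH_HOME` and `AGENT_NAME` can be overridden from the environment, so the same script can be pointed at a different checkout. How to start a crawl from there depends on the Nutch version; the Nutch Tutorial linked in the README covers that part.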