Posted to dev@nutch.apache.org by "Vincent Couturier (JIRA)" <ji...@apache.org> on 2009/12/14 18:15:18 UTC
[jira] Commented: (NUTCH-427) protocol-smb: plugin protocol
implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl
Microsoft Windows Shares remotely using the CIFS/SMB protocol
implementation.
[ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790244#action_12790244 ]
Vincent Couturier commented on NUTCH-427:
-----------------------------------------
The last attached zip does not contain Ilquiz Latypov's changes, so it has to be patched with protocol-smb-diff.txt. I will try to attach a patched version, but it would be easier if Ilquiz could attach his updated version.
> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implementation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: NUTCH-427
> URL: https://issues.apache.org/jira/browse/NUTCH-427
> Project: Nutch
> Issue Type: New Feature
> Components: fetcher
> Affects Versions: 0.8.1, 0.9.0, 1.0.0
> Environment: JAVA - OS independent
> Reporter: Armel Nene
> Priority: Minor
> Attachments: protocol-smb-diff.txt, protocol-smb.zip, protocol-smb.zip, protocol-smb.zip
>
>
> Title: protocol-smb - Nutch protocol plugin for crawling Microsoft Windows shares
> Author: Armel T. Nene
> Update: Vadim Bauer
> Email: armel.nene NOSPAM-AT-NOSPAM idna-solutions.com, V a d i m B a u e r <AT> g m x . d e
> A. Introduction
> The protocol-smb plugin allows you to crawl Microsoft Windows shares. It implements
> the CIFS/SMB protocol, which is commonly used on Microsoft operating systems. The plugin replicates
> the behaviour of the protocol-file plugin over the CIFS/SMB protocol. It uses the jCIFS library and
> supports all of the jCIFS properties.
> You can find more information on the following site: http://jcifs.samba.org/
> The smb URL syntax for crawling is: smb://xxxxx (e.g. smb://server/share).
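To make the smb://server/share syntax concrete, here is a minimal sketch that splits such a URL into server, share and remaining path. SmbUrlParts is a hypothetical helper written for illustration only; it is not part of protocol-smb or jCIFS. It uses java.net.URI rather than java.net.URL, because URI parsing does not need a registered "smb" protocol handler (see Known Issue 1).

```java
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Hypothetical helper (not part of protocol-smb or jCIFS) that splits
 * smb://server/share/path into its parts using java.net.URI, which does
 * not require an "smb" URL protocol handler to be installed.
 */
public class SmbUrlParts {
    final String server;
    final String share; // empty if the URL names only the server
    final String path;  // remainder after the share, may be empty

    SmbUrlParts(String server, String share, String path) {
        this.server = server;
        this.share = share;
        this.path = path;
    }

    static SmbUrlParts parse(String url) throws URISyntaxException {
        URI uri = new URI(url);
        if (!"smb".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("not an smb URL: " + url);
        }
        // For smb://server/share/dir/file, getPath() is "/share/dir/file"
        String rawPath = uri.getPath() == null ? "" : uri.getPath();
        String trimmed = rawPath.startsWith("/") ? rawPath.substring(1) : rawPath;
        int slash = trimmed.indexOf('/');
        String share = slash < 0 ? trimmed : trimmed.substring(0, slash);
        String rest = slash < 0 ? "" : trimmed.substring(slash + 1);
        return new SmbUrlParts(uri.getHost(), share, rest);
    }

    public static void main(String[] args) throws URISyntaxException {
        SmbUrlParts p = parse("smb://server/share/docs/report.txt");
        // prints: server | share | docs/report.txt
        System.out.println(p.server + " | " + p.share + " | " + p.path);
    }
}
```

Note that java.net.URI rejects unencoded spaces, so share or file names containing spaces would need to be percent-encoded before being passed to this sketch.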
>
> B. Installation
> 1) Binaries only: The protocol-smb files can be found in the ../plugins directory.
> Copy the "protocol-smb" folder to the NUTCHHOME/build/plugins directory.
> Put the "smb.properties" file in the NUTCHHOME/conf directory.
> Configure the properties in the "smb.properties" file.
> Enable the plugin by updating the "nutch-site.xml" file in the NUTCHHOME/conf directory,
> e.g. <property>
>        <name>plugin.includes</name>
>        <value>protocol-smb|other plugins...</value>
>        <description></description>
>      </property>
> 2) Source code: The protocol-smb sources can be found in the ../src directory.
> Always refer to the Nutch wiki for detailed instructions on building Nutch. In short:
> Copy the 'protocol-smb' folder to NUTCHHOME/src/plugin
> Update the build.xml in NUTCHHOME/src/plugin to include the plugin
> Update the NUTCHHOME/default.properties file to include the plugin
> Run ant to build
> Copy the 'smb.properties' file to NUTCHHOME/conf and configure the properties
> Enable the plugin by updating the nutch-site.xml file
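Both installation routes end by adding protocol-smb to plugin.includes. In Nutch this value is a regular expression over plugin ids, which is why entries are joined with "|". The following sketch checks whether a given value selects a plugin. It assumes, as Nutch's plugin repository does, that a plugin is included only when its whole id matches the expression. PluginIncludesCheck is a hypothetical class, and the other plugin ids in the example value are only illustrative.

```java
import java.util.regex.Pattern;

/**
 * Sketch: does a plugin.includes value select a given plugin id?
 * Assumption: a plugin is included when its whole id matches the
 * plugin.includes regular expression (full match, not a substring).
 */
public class PluginIncludesCheck {
    static boolean isIncluded(String pluginIncludes, String pluginId) {
        return Pattern.compile(pluginIncludes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // protocol-smb added as one more "|" alternative; other ids are examples
        String includes = "protocol-smb|protocol-http|urlfilter-regex|parse-(text|html)";
        System.out.println(isIncluded(includes, "protocol-smb"));  // true
        System.out.println(isIncluded(includes, "protocol-file")); // false
    }
}
```

Because the value is a regular expression, whitespace is significant. A space after "|" becomes part of the next alternative, so "protocol-http| protocol-smb" would not match the id protocol-smb.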
> C. Known Issues
> 1) MalformedURLException: unknown protocol: smb
> The SMB URL protocol handler has not been installed successfully.
> In short, the jCIFS jar must be loaded by the System class loader.
> Workaround: a) As a short-term solution, install the jCIFS jar
> found in the protocol-smb folder in
> JDKHOME/jre/lib/ext and/or JREHOME/lib/ext.
> b) If the exception is still thrown after step a),
> set the system property by passing the following argument
> to the JVM:
> -Djava.protocol.handler.pkgs=jcifs
> c) You can also set the property in code. For example, if
> you start crawling with org.apache.nutch.crawl.Crawl, add the
> following at the beginning of its main() method. This has the same effect as b):
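> public static void main(String[] args) throws Exception {
>     // Look for the smb handler in the jcifs package (resolved as jcifs.smb.Handler);
>     // this must run before the first smb:// URL is created.
>     System.setProperty("java.protocol.handler.pkgs", "jcifs");
>     // Under a SecurityManager, the policy file must also grant
>     // java.util.PropertyPermission "java.protocol.handler.pkgs", "read,write".
>     // ... rest of Crawl.main() unchanged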
> See also the jCIFS FAQ: http://jcifs.samba.org/src/docs/faq.html
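The standard java.net.URL behaviour behind this issue can be reproduced without jCIFS. Without a handler for "smb", the URL constructor throws MalformedURLException with the message "unknown protocol: smb". When java.protocol.handler.pkgs is set, the JVM tries the class <package>.<protocol>.Handler for each "|"-separated package, which is why "jcifs" leads to jcifs.smb.Handler. SmbHandlerCheck and its methods are hypothetical and only illustrate the lookup rule. They do not load or register any handler.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

/**
 * Illustration of Known Issue 1 using only the JDK:
 * - new URL("smb://...") fails when no "smb" handler is available;
 * - java.protocol.handler.pkgs maps each package prefix to the class
 *   <prefix>.<protocol>.Handler.
 */
public class SmbHandlerCheck {

    /** Returns the exception message, or null if the URL could be created. */
    static String tryCreateSmbUrl(String spec) {
        try {
            new URL(spec);
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    /** Handler class names the JVM would try for the given property value. */
    static List<String> candidateHandlerClasses(String handlerPkgs, String protocol) {
        List<String> names = new ArrayList<>();
        for (String pkg : handlerPkgs.split("\\|")) {
            String prefix = pkg.trim();
            if (!prefix.isEmpty()) {
                names.add(prefix + "." + protocol + ".Handler");
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Without jCIFS registered, this prints the "unknown protocol: smb" message
        System.out.println(tryCreateSmbUrl("smb://server/share/"));
        System.out.println(candidateHandlerClasses("jcifs", "smb")); // [jcifs.smb.Handler]
    }
}
```

If the property is set but the error persists, the usual cause is the one described above: the class jcifs.smb.Handler cannot be found by the class loader that java.net.URL consults.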
> 2) FATAL smb.SMB - Could not read content of protocol: smb://xxxxxx
> This problem usually occurs if the following properties are not set correctly in
> the "smb.properties" file:
> - username
> - password
> - domain
> Also refer to the following resources for more information on the list of
> available properties and how to set them:
> http://jcifs.samba.org/src/docs/api/overview-summary.html#scp
> See also the jCIFS FAQ: http://jcifs.samba.org/src/docs/faq.html
> N.B. All properties should be set in the "smb.properties" file, which accepts
> every supported jCIFS property.
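A quick way to catch this misconfiguration before a crawl is to load the properties file and report empty or missing credential keys. The following is a minimal sketch that assumes smb.properties uses the standard jCIFS client property names jcifs.smb.client.username, jcifs.smb.client.password and jcifs.smb.client.domain. Check the plugin's own smb.properties for the exact keys it expects. SmbPropertiesCheck is a hypothetical class, and it reads an inline sample rather than the real file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/**
 * Sketch: lists credential properties that are missing or empty.
 * Assumption: smb.properties uses the standard jCIFS client keys below.
 */
public class SmbPropertiesCheck {
    static final String[] CREDENTIAL_KEYS = {
        "jcifs.smb.client.username",
        "jcifs.smb.client.password",
        "jcifs.smb.client.domain"
    };

    static List<String> missingCredentials(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : CREDENTIAL_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key); // absent or blank counts as not set
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Inline sample instead of reading NUTCHHOME/conf/smb.properties
        String sample = "jcifs.smb.client.username=crawler\n"
                      + "jcifs.smb.client.domain=\n";
        Properties props = new Properties();
        props.load(new StringReader(sample));
        // prints: [jcifs.smb.client.password, jcifs.smb.client.domain]
        System.out.println(missingCredentials(props));
    }
}
```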
>
> 3) Only tested on Windows XP and Windows Server 2003. Please report test
> results on other operating systems.