Posted to dev@nutch.apache.org by "Armel Nene (JIRA)" <ji...@apache.org> on 2007/01/05 15:44:27 UTC

[jira] Created: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implementation.

protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implementation.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                 Key: NUTCH-427
                 URL: https://issues.apache.org/jira/browse/NUTCH-427
             Project: Nutch
          Issue Type: New Feature
          Components: fetcher
    Affects Versions: 0.8.1
         Environment: JAVA - OS independent
            Reporter: Armel Nene
            Priority: Critical


Title:    protocol-smb - Nutch protocol plugin for crawling Microsoft Windows shares
Author:   Armel T. Nene
Email:    armel.nene NOSPAM-AT-NOSPAM idna-solutions.com

A.  Introduction

    The protocol-smb plugin allows Nutch to crawl Microsoft Windows shares. It implements
    the CIFS/SMB protocol, which is commonly used by Microsoft operating systems. The plugin
    replicates the behaviour of the protocol-file plugin over CIFS/SMB. It uses the JCIFS
    library and supports all of the JCIFS configuration properties.
    More information is available at: http://jcifs.samba.org/
    SMB URLs have the form smb://xxxxx (e.g. smb://server/share).
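
    The following Java sketch illustrates that URL syntax by splitting an smb:// URL into server, share and path. It is a hypothetical helper: SmbUrlParts is not part of protocol-smb or JCIFS. It uses java.net.URI rather than java.net.URL because URI parses any scheme without a registered protocol handler, and a missing handler is the cause of known issue 1 below.

```java
import java.net.URI;

/**
 * Hypothetical helper (not part of protocol-smb or JCIFS): splits an
 * smb://server/share/path URL into its server, share and path parts.
 */
final class SmbUrlParts {
    final String server;
    final String share; // empty when the URL names only a server
    final String path;  // everything after the share, without a leading '/'

    private SmbUrlParts(String server, String share, String path) {
        this.server = server;
        this.share = share;
        this.path = path;
    }

    /** Parses with java.net.URI; throws IllegalArgumentException for non-smb URLs. */
    static SmbUrlParts parse(String url) {
        URI uri = URI.create(url); // URI, unlike URL, needs no protocol handler
        if (!"smb".equalsIgnoreCase(uri.getScheme()) || uri.getHost() == null) {
            throw new IllegalArgumentException("not an smb://server/... URL: " + url);
        }
        String rawPath = uri.getPath() == null ? "" : uri.getPath();
        // Drop the leading '/' and split off the first segment, which names the share.
        String rest = rawPath.startsWith("/") ? rawPath.substring(1) : rawPath;
        int slash = rest.indexOf('/');
        String share = slash < 0 ? rest : rest.substring(0, slash);
        String path = slash < 0 ? "" : rest.substring(slash + 1);
        return new SmbUrlParts(uri.getHost(), share, path);
    }
}
```

    For example, SmbUrlParts.parse("smb://server/share/docs/report.doc") yields server "server", share "share" and path "docs/report.doc".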
    
B.  Installation

    1) Binaries only:   Copy the "protocol-smb" folder to the NUTCHHOME/build/plugins directory.
                        Put the "smb.properties" file in the NUTCHHOME/conf directory.
                        Configure the properties in the "smb.properties" file.
                        Enable the plugin by updating the "nutch-site.xml" file in the NUTCHHOME/conf directory.

    2)  Source code:    Always refer to the Nutch wiki for detailed instructions on building Nutch. In short:
                        Copy the 'protocol-smb' folder to NUTCHHOME/src/plugin.
                        Update the build.xml in NUTCHHOME/src/plugin to include the plugin.
                        Update the NUTCHHOME/default.properties file to include the plugin.
                        Run ant to build.
                        Copy the 'smb.properties' file to NUTCHHOME/conf and configure the properties.
                        Enable the plugin by updating the nutch-site.xml file.
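
    The Java sketch below is one way to confirm that the files ended up where the steps above put them. InstallCheck is a hypothetical name. The two paths come from the binary installation steps. The plugin.xml entry assumes the plugin folder carries its descriptor at the top level, as Nutch plugin folders do. For a source build, the check assumes ant writes the plugin to the same build/plugins folder. It does not check whether nutch-site.xml actually enables the plugin.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical post-install check; not part of Nutch or protocol-smb. */
final class InstallCheck {
    // Paths relative to NUTCHHOME. The plugin.xml entry assumes the
    // descriptor sits at the top of the protocol-smb plugin folder.
    static final String[] EXPECTED = {
        "build/plugins/protocol-smb/plugin.xml",
        "conf/smb.properties"
    };

    /** Returns the expected paths that do not exist under nutchHome. */
    static List<String> missing(Path nutchHome) {
        List<String> result = new ArrayList<>();
        for (String relative : EXPECTED) {
            if (!Files.exists(nutchHome.resolve(relative))) {
                result.add(relative);
            }
        }
        return result;
    }
}
```

    InstallCheck.missing(Paths.get("/path/to/nutch")) returns an empty list when both paths exist.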

C.  Known Issues

    1) MalformedURLException: unknown protocol: smb

       The SMB URL protocol handler has not been installed successfully.
       In short, the JCIFS jar must be loaded by the system class loader.

       Workaround: a) As a short-term solution, install the JCIFS jar
                      found in the protocol-smb folder into
                      JDKHOME/jre/lib/ext and/or JREHOME/lib/ext.

                   b) If the exception is still thrown after completing step a),
                      set the system property by passing the following argument
                      to the JVM:

                      -Djava.protocol.handler.pkgs=jcifs

       See also the JCIFS FAQ: http://jcifs.samba.org/src/docs/faq.html

    2) FATAL smb.SMB - Could not read content of protocol: smb://xxxxxx

       This problem usually occurs if the following properties are not set correctly in
       the "smb.properties" file:

       - username
       - password
       - domain

       For more information on the available properties and how to set them, see:

       http://jcifs.samba.org/src/docs/api/overview-summary.html#scp
       See also the JCIFS FAQ: http://jcifs.samba.org/src/docs/faq.html

       N.B. All properties should be set in the "smb.properties" file. Any supported
            JCIFS property can be set there.

    3) Only tested on Windows XP and Windows Server 2003. Please report test
       results on other operating systems. It should also run on other operating
       systems without changes.
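
    For known issue 2, a small pre-flight check can catch missing credentials before a crawl starts. This Java sketch assumes the file uses the standard JCIFS client property names (jcifs.smb.client.username, jcifs.smb.client.password, jcifs.smb.client.domain). If the plugin reads different keys, adjust REQUIRED_KEYS. SmbPropertiesCheck is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical pre-flight check for the credentials in smb.properties. */
final class SmbPropertiesCheck {
    // Assumption: the file uses the standard JCIFS client property names.
    static final String[] REQUIRED_KEYS = {
        "jcifs.smb.client.username",
        "jcifs.smb.client.password",
        "jcifs.smb.client.domain"
    };

    /** Returns the required keys that are absent or blank, in declaration order. */
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

    Load the file with java.util.Properties.load, for example from Files.newBufferedReader(Paths.get("conf/smb.properties")), and pass the result to missingKeys. An empty list means all three entries are present and non-blank. The check cannot tell whether the values are correct.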

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

RE: [jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implementation.

Posted by "Armel T. Nene" <ar...@idna-solutions.com>.
Andrzej

This feature is not critical; that was a mistake on my part. After further testing, we found that this version is not yet stable enough. We are working on a stable version, which we will upload as soon as it is done.

Armel

-----Original Message-----
From: Andrzej Bialecki (JIRA) [mailto:jira@apache.org] 
Sent: 07 March 2007 22:29
To: nutch-dev@lucene.apache.org
Subject: [jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implementation.


     [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  updated NUTCH-427:
------------------------------------

    Priority: Major  (was: Critical)

New features are not critical. This plugin uses an LGPL library, which cannot be included in the Nutch repository.

> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implementation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>         Attachments: protocol-smb.zip



[jira] Commented: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implementation.

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462513 ] 

Andrzej Bialecki  commented on NUTCH-427:
-----------------------------------------

JCIFS is licensed under the LGPL, so it cannot be included in the Nutch distribution. As a consequence, we could add this plugin, but it would not be part of the regular build ...

> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implementation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>            Priority: Critical
>         Attachments: protocol-smb.zip


        

[jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implementation.

Posted by "Vadim Bauer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vadim Bauer updated NUTCH-427:
------------------------------

          Description: 
Title:    protocol-smb - Nutch protocol plugin for crawling Microsoft Windows shares
Author:   Armel T. Nene 
Update:   Vadim Bauer
Email:    armel.nene NOSPAM-AT-NOSPAM idna-solutions.com, V a d i m B a u e r <AT> g m x . d e

A.  Introduction

    The protocol-smb plugin allows Nutch to crawl Microsoft Windows shares. It implements
    the CIFS/SMB protocol, which is commonly used by Microsoft operating systems. The plugin
    replicates the behaviour of the protocol-file plugin over CIFS/SMB. It uses the JCIFS
    library and supports all of the JCIFS configuration properties.
    More information is available at: http://jcifs.samba.org/
    For crawling, SMB URLs have the form smb://xxxxx (e.g. smb://server/share).
    
B.  Installation

    1) Binaries only:   The protocol-smb files can be found in the ../plugins directory.
                        Copy the "protocol-smb" folder to the NUTCHHOME/build/plugins directory.
                        Put the "smb.properties" file in the NUTCHHOME/conf directory.
                        Configure the properties in the "smb.properties" file.
                        Enable the plugin by updating the "nutch-site.xml" file in the NUTCHHOME/conf
                        directory, e.g.:

                        <property>
                          <name>plugin.includes</name>
                          <value>protocol-smb|other plugins...</value>
                          <description></description>
                        </property>

    2)  Source code:    The protocol-smb sources can be found in the ../src directory.
                        Always refer to the Nutch wiki for detailed instructions on building Nutch. In short:
                        Copy the 'protocol-smb' folder to NUTCHHOME/src/plugin.
                        Update the build.xml in NUTCHHOME/src/plugin to include the plugin.
                        Update the NUTCHHOME/default.properties file to include the plugin.
                        Run ant to build.
                        Copy the 'smb.properties' file to NUTCHHOME/conf and configure the properties.
                        Enable the plugin by updating the nutch-site.xml file.
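
    In nutch-site.xml, plugin.includes is a regular expression, so the whole plugin list is written as one pattern. The Java sketch below tests whether a given value enables a plugin id. It assumes that Nutch compiles the value with java.util.regex and requires the entire plugin id to match. PluginIncludesCheck is a hypothetical name, and the values in the note after the code are illustrative only.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: does a plugin.includes value enable the given plugin id? */
final class PluginIncludesCheck {
    static boolean enables(String pluginIncludes, String pluginId) {
        // Assumed behaviour: the whole plugin id must match the regular expression.
        return Pattern.compile(pluginIncludes).matcher(pluginId).matches();
    }
}
```

    With this check, "protocol-smb|parse-(text|html)" enables both protocol-smb and parse-html. By contrast, "protocol-smb| parse-html" does not enable parse-html, because the space after '|' becomes part of that alternative.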

C.  Known Issues

    1) MalformedURLException: unknown protocol: smb

       The SMB URL protocol handler has not been installed successfully.
       In short, the JCIFS jar must be loaded by the system class loader.

       Workaround: a) As a short-term solution, install the JCIFS jar
                      found in the protocol-smb folder into
                      JDKHOME/jre/lib/ext and/or JREHOME/lib/ext.

                   b) If the exception is still thrown after completing step a),
                      set the system property by passing the following argument
                      to the JVM:

                      -Djava.protocol.handler.pkgs=jcifs

                   c) You can also set the property in code. For example, if you
                      start crawling with org.apache.nutch.crawl.Crawl, add the
                      following two lines at the start of its main method. This has
                      the same effect as b):

                      public static void main(String[] args) throws Exception {
                          System.setProperty("java.protocol.handler.pkgs", "jcifs");
                          // This object only describes a permission and grants nothing by
                          // itself; under a SecurityManager, grant it in the policy file.
                          new java.util.PropertyPermission("java.protocol.handler.pkgs", "read,write");
                          // ... rest of main() unchanged
                      }

       See also the JCIFS FAQ: http://jcifs.samba.org/src/docs/faq.html

    2) FATAL smb.SMB - Could not read content of protocol: smb://xxxxxx

       This problem usually occurs if the following properties are not set correctly in
       the "smb.properties" file:

       - username
       - password
       - domain

       For more information on the available properties and how to set them, see:

       http://jcifs.samba.org/src/docs/api/overview-summary.html#scp
       See also the JCIFS FAQ: http://jcifs.samba.org/src/docs/faq.html

       N.B. All properties should be set in the "smb.properties" file. Any supported
            JCIFS property can be set there.

    3) Only tested on Windows XP and Windows Server 2003. Please report test
       results on other operating systems.
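
    Workarounds b) and c) for known issue 1 both rely on the JDK's protocol-handler lookup. For each package prefix listed in java.protocol.handler.pkgs, separated by '|', java.net.URL tries to load a class named <prefix>.<protocol>.Handler. The value jcifs therefore points it at jcifs.smb.Handler. The Java sketch below shows that naming rule. It also shows a variant of c) that appends jcifs to any existing value instead of overwriting it. HandlerPackages and its methods are hypothetical names. Call appendJcifs() early, before the first smb:// URL is constructed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers around the java.protocol.handler.pkgs system property. */
final class HandlerPackages {
    static final String PROPERTY = "java.protocol.handler.pkgs";

    /** Handler class names derived from a '|'-separated package list, e.g. jcifs.smb.Handler. */
    static List<String> candidateHandlers(String packages, String protocol) {
        List<String> names = new ArrayList<>();
        for (String prefix : packages.split("\\|")) {
            String trimmed = prefix.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed + "." + protocol + ".Handler");
            }
        }
        return names;
    }

    /** Like -Djava.protocol.handler.pkgs=jcifs, but keeps packages already listed. */
    static void appendJcifs() {
        String current = System.getProperty(PROPERTY, "");
        for (String prefix : current.split("\\|")) {
            if (prefix.trim().equals("jcifs")) {
                return; // already registered
            }
        }
        System.setProperty(PROPERTY, current.trim().isEmpty() ? "jcifs" : current + "|jcifs");
    }
}
```

    The property only tells the JDK where to look. If jcifs.smb.Handler cannot be loaded by the system class loader, new java.net.URL("smb://server/share/") still fails with MalformedURLException, as workaround a) explains.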


    Affects Version/s: 1.0.0
                       0.9.0

This update fixes some issues I had when trying to use the old version with Nutch 1.0-dev.

> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implementation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>         Attachments: protocol-smb.zip, protocol-smb.zip



[jira] Commented: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implementation.

Posted by "Vadim Bauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12497851 ] 

Vadim Bauer commented on NUTCH-427:
-----------------------------------

There is an error in the plugin.xml file:

The plugin id should be protocol-smb, not protocol-file.

  <?xml version="1.0" encoding="UTF-8"?>
  <!--
      Document   : plugin.xml
      Created on : 03 January 2007, 10:41
      Author     : Armel T. Nene
      Description:
          This file is used by Nutch to configure the SMB protocol.
  -->
  <plugin id="protocol-smb" name="SMB Protocol Plug-in" version="1.0.0" provider-name="iDNA Solutions LTD">
    <runtime>
      <library name="protocol-smb.jar">
        <export name="*"/>
      </library>
      <library name="jcifs-1.2.12.jar"/>
    </runtime>
    <requires>
      <import plugin="nutch-extensionpoints"/>
    </requires>
    <extension id="org.apache.nutch.protocol.smb" name="SMBProtocol" point="org.apache.nutch.protocol.Protocol">
      <implementation id="org.apache.nutch.protocol.smb.SMB" class="org.apache.nutch.protocol.smb.SMB">
        <parameter name="protocolName" value="SMB"/>
      </implementation>
    </extension>
  </plugin>
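
Nutch selects plugins through plugin.includes by plugin id. A plugin.xml copied from protocol-file that still says id="protocol-file" is therefore registered under the wrong name. One quick way to catch this is to read the id attribute back with the JDK's built-in DOM parser, as in the Java sketch below. PluginIdCheck is a hypothetical name.

```java
import java.io.IOException;
import java.io.Reader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical sanity check: reads the id attribute of a Nutch plugin.xml. */
final class PluginIdCheck {
    static String pluginId(Reader pluginXml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(pluginXml))
                    .getDocumentElement();
            if (!"plugin".equals(root.getTagName())) {
                throw new IllegalArgumentException(
                        "root element is <" + root.getTagName() + ">, expected <plugin>");
            }
            return root.getAttribute("id"); // empty string if the attribute is absent
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse plugin.xml", e);
        }
    }
}
```

For the corrected file above, PluginIdCheck.pluginId(...) should return "protocol-smb". Anything else suggests the descriptor was copied without updating its id.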

> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implementation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>         Attachments: protocol-smb.zip
>
>
> Title:    protocol-smb - Nutch protocol plugin for crawling Microsoft Windows shares
> Author:   Armel T. Nene
> Email:    armel.nene NOSPAM-AT-NOSPAM idna-solutions.com
> A.  Introduction
>     The protocol-smb plugins allows you to crawl Microsoft Windows shares. It implements
>     the CIFS/SMB protocol which is commonly used on Microsoft OS. The plugin replicate the
>     behaviour of the protocol-file over CIFS/SMB protocol. This plugin uses the JCifs library and also
>     support all the properties from the JCifs library.
>     You can find more information on the following site: http://jcifs.samba.org/
>     The smb protocol syntax is as follow: smb://xxxxx (i.e. smb://server/share) .
>     
> B.  Installation
>     1) Binaries only:   Copy the "protocol-smb" to NUTCHHOME/build/plugins directory.
>                         Put the "smb.properties" file in the NUTCHHOME/conf directory.
>                         Configure the properties in "smb.properties" file
>                         Enable the plugin by updating "nutch-site.xml" file found in NUTCHHOME/conf directory
>     2)  Source code:    Always refer to the Nutch wiki for detailed instructions on building Nutch. In short:
>                         Copy the 'protocol-smb' folder to NUTCHHOME/src/plugin.
>                         Update the build.xml in NUTCHHOME/src/plugin to include the plugin.
>                         Update the NUTCHHOME/default.properties file to include the plugin.
>                         Run ant to build.
>                         Copy the 'smb.properties' file to NUTCHHOME/conf, and configure the properties.
>                         Enable the plugin by updating the nutch-site.xml file.
> C.  Known Issues
>     1) MalformedURLException: unknown protocol: smb
>        The SMB URL protocol handler has not been installed successfully.
>        In short, the jCIFS jar must be loaded by the system class loader.
>        Workaround: a) A short-term solution is to install the jCIFS jar
>                       found in the protocol-smb folder into
>                       JDKHOME/jre/lib/ext and/or JREHOME/lib/ext.
>                    b) If the exception is still thrown after completing step a),
>                       set the system property by passing the following argument
>                       to the JVM:
>                       -Djava.protocol.handler.pkgs=jcifs
>        You can also visit the FAQ page: http://jcifs.samba.org/src/docs/faq.html
>     2) FATAL smb.SMB - Could not read content of protocol: smb://xxxxxx
>        This problem usually occurs if the following properties are not set correctly in
>        the "smb.properties" file:
>        - username
>        - password
>        - domain
>        Also refer to the following resource for more information on the list of
>        available properties and how to set them:
>        http://jcifs.samba.org/src/docs/api/overview-summary.html#scp
>        You can also visit the FAQ page: http://jcifs.samba.org/src/docs/faq.html
>        N.B. All properties should be set in the "smb.properties" file. You can set
>             all supported jCIFS properties in the "smb.properties" file.
>      
>     3) Only tested on Windows XP and Windows Server 2003. Please report test results
>        on other operating systems. It is expected to run on other operating systems without changes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.

Posted by "Ilguiz Latypov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ilguiz Latypov updated NUTCH-427:
---------------------------------

    Attachment: protocol-smb-diff.txt

> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>            Priority: Minor
>         Attachments: protocol-smb-diff.txt, protocol-smb.zip, protocol-smb.zip, protocol-smb.zip
>
>
> Title:    protocol-smb - Nutch protocol plugin for crawling Microsoft Windows shares
> Author:   Armel T. Nene 
> Update:   Vadim Bauer
> Email:    armel.nene NOSPAM-AT-NOSPAM idna-solutions.com, V a d i m B a u e r <AT> g m x . d e
> A.  Introduction
>     The protocol-smb plugin allows you to crawl Microsoft Windows shares. It implements
>     the CIFS/SMB protocol, which is commonly used on Microsoft operating systems. The plugin
>     replicates the behaviour of protocol-file over the CIFS/SMB protocol. It uses the jCIFS
>     library and supports all the properties of the jCIFS library.
>     You can find more information on the following site: http://jcifs.samba.org/
>     The SMB URL syntax for crawling is as follows: smb://xxxxx (e.g. smb://server/share).
>     
> B.  Installation
>     1) Binaries only:   The protocol-smb files can be found in the ../plugins directory.
>                         Copy the "protocol-smb" folder to the NUTCHHOME/build/plugins directory.
>                         Put the "smb.properties" file in the NUTCHHOME/conf directory.
>                         Configure the properties in the "smb.properties" file.
>                         Enable the plugin by updating the "nutch-site.xml" file found in the NUTCHHOME/conf directory,
>                         e.g. <property>
>                                <name>plugin.includes</name>
>                                <value>protocol-smb|other plugins...</value>
>                                <description>
>                                </description>
>                              </property>
>     2)  Source code:    The protocol-smb sources can be found in the ../src directory.
>                         Always refer to the Nutch wiki for detailed instructions on building Nutch. In short:
>                         Copy the 'protocol-smb' folder to NUTCHHOME/src/plugin.
>                         Update the build.xml in NUTCHHOME/src/plugin to include the plugin.
>                         Update the NUTCHHOME/default.properties file to include the plugin.
>                         Run ant to build.
>                         Copy the 'smb.properties' file to NUTCHHOME/conf, and configure the properties.
>                         Enable the plugin by updating the nutch-site.xml file.
> C.  Known Issues
>     1) MalformedURLException: unknown protocol: smb
>        The SMB URL protocol handler has not been installed successfully.
>        In short, the jCIFS jar must be loaded by the system class loader.
>        Workaround: a) A short-term solution is to install the jCIFS jar
>                       found in the protocol-smb folder into
>                       JDKHOME/jre/lib/ext and/or JREHOME/lib/ext.
>                    b) If the exception is still thrown after completing step a),
>                       set the system property by passing the following argument
>                       to the JVM:
>                       -Djava.protocol.handler.pkgs=jcifs
>                    c) You can also set the property in code, for example if
>                       you start crawling with org.apache.nutch.crawl.Crawl.
>                       Add the following two lines at the start of its main()
>                       method; this has the same effect as b):
>                       public static void main(String args[]) throws Exception {
>                           System.setProperty("java.protocol.handler.pkgs", "jcifs");
>                           // A PropertyPermission object only names a permission; a security policy must grant it.
>                           new java.util.PropertyPermission("java.protocol.handler.pkgs", "read,write");
>                           // ...and so on
>                       }
>        You can also visit the FAQ page: http://jcifs.samba.org/src/docs/faq.html
>     2) FATAL smb.SMB - Could not read content of protocol: smb://xxxxxx
>        This problem usually occurs if the following properties are not set correctly in
>        the "smb.properties" file:
>        - username
>        - password
>        - domain
>        Also refer to the following resource for more information on the list of
>        available properties and how to set them:
>        http://jcifs.samba.org/src/docs/api/overview-summary.html#scp
>        You can also visit the FAQ page: http://jcifs.samba.org/src/docs/faq.html
>        N.B. All properties should be set in the "smb.properties" file. You can set
>             all supported jCIFS properties in the "smb.properties" file.
>      
>     3) Only tested on Windows XP and Windows Server 2003. Please report test results
>        on other operating systems.



[jira] Issue Comment Edited: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.

Posted by "Ilguiz Latypov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645933#action_12645933 ] 

ilatypov edited comment on NUTCH-427 at 11/7/08 5:47 PM:
---------------------------------------------------------------

Fixed reading of SMB files, updated to jcifs 1.3.0, enhanced the smoke
test app.

Tried but could not activate the Apache logging.


      was (Author: ilatypov):
    
Fixed reading of SMB files, updated to jcifs 1.3.0, enhanced the smoke
test app.

Tried but could not activate the Apache logging.

Files ../protocol-smb/src/plugin/protocol-smb/lib/jcifs-1.2.13.jar and ./lib/jcifs-1.2.13.jar differ
Files ../protocol-smb/src/plugin/protocol-smb/lib/jcifs-1.3.0.jar and ./lib/jcifs-1.3.0.jar differ
diff -urN ../protocol-smb/src/plugin/protocol-smb/plugin.xml ./plugin.xml
--- ../protocol-smb/src/plugin/protocol-smb/plugin.xml	2007-05-25 22:31:38.000000000 -0400
+++ ./plugin.xml	2008-11-07 14:28:38.184388000 -0500
@@ -13,7 +13,7 @@
     <library name="protocol-smb.jar">
       <export name="*" />
     </library>
-    <library name="jcifs-1.2.13.jar" />
+    <library name="jcifs-1.3.0.jar" />
   </runtime>
   <requires>
     <import plugin="nutch-extensionpoints" />
@@ -23,4 +23,4 @@
       <parameter name="protocolName" value="smb" />
     </implementation>
   </extension>
-</plugin>
\ No newline at end of file
+</plugin>
Files ../protocol-smb/src/plugin/protocol-smb/src/h.pdf and ./src/h.pdf differ
diff -urN ../protocol-smb/src/plugin/protocol-smb/src/java/org/apache/nutch/protocol/smb/SMB.java ./src/java/org/apache/nutch/protocol/smb/SMB.java
--- ../protocol-smb/src/plugin/protocol-smb/src/java/org/apache/nutch/protocol/smb/SMB.java	2007-05-25 22:28:20.000000000 -0400
+++ ./src/java/org/apache/nutch/protocol/smb/SMB.java	2008-11-07 14:38:38.583156000 -0500
@@ -51,8 +51,8 @@
      */
     public ProtocolOutput getProtocolOutput(Text url, CrawlDatum datum) {
         String urlString = url.toString();
-        System.out.println("SMB Protocol: Now using the SMB protocol");
-        System.out.println("SMB Protocol will interrogate the following URL: "+urlString);
+        // System.out.println("SMB Protocol: Now using the SMB protocol");
+        // System.out.println("SMB Protocol will interrogate the following URL: "+urlString);
         try{
             URL u = new URL(urlString);
             int redirects = 0;
diff -urN ../protocol-smb/src/plugin/protocol-smb/src/java/org/apache/nutch/protocol/smb/SMBResponse.java ./src/java/org/apache/nutch/protocol/smb/SMBResponse.java
--- ../protocol-smb/src/plugin/protocol-smb/src/java/org/apache/nutch/protocol/smb/SMBResponse.java	2007-01-05 10:34:38.000000000 -0500
+++ ./src/java/org/apache/nutch/protocol/smb/SMBResponse.java	2008-11-07 20:14:01.791149000 -0500
@@ -13,8 +13,12 @@
 import jcifs.smb.SmbException;
 import jcifs.smb.SmbFile;
 import jcifs.smb.SmbFileInputStream;
+
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
+
+import org.apache.commons.lang.StringEscapeUtils;
+
 import org.apache.hadoop.conf.Configuration;
 import org.apache.nutch.crawl.CrawlDatum;
 import org.apache.nutch.metadata.Metadata;
@@ -137,7 +141,13 @@
      */
     private void getDirAsHttpResponse(SmbFile f) throws SmbException {
         String path = f.getName();
-        this.content = list2html(f.listFiles(), path, "/".equals(path) ? false : true);
+        SmbFile[] fileList = {};
+        try {
+            fileList = f.listFiles();
+        } catch (SmbException e) {
+            // leave fileList empty
+        }
+        this.content = list2html(fileList, path, "/".equals(path) ? false : true);
         
         // set headers
         headers.set(Response.CONTENT_TYPE, "text/html");
@@ -160,19 +170,19 @@
         
         // capture content
         int len = (int)size;
-        byte[] bytes = new byte[len];
-        java.io.InputStream is = new SmbFileInputStream(f);
+        this.content = new byte[len];
+        java.io.InputStream is = f.getInputStream();
         int offset = 0;
         int n = 0;
         while(offset < size && (n = is.read(this.content, offset, len-offset)) >= 0){
             offset += n;
+            LOG.debug("Read " + n + " bytes from " + f.getName());
         }
         
         if(offset < len){
             if(LOG.isWarnEnabled())
                 LOG.warn("not enough bytes read from file: "+f.getCanonicalPath());
         }
-        
         is.close();
         
         // set headers
@@ -201,12 +211,27 @@
         SmbFile f;
         for(int i=0; i<smbFile.length; i++){
             f = smbFile[i];
-            String name = f.getName();
+            if (!f.exists()) {
+                LOG.info("SMB file: " + f.getName() + " does not exist");
+                continue;
+            }
+            
+            if (!f.canRead()) {
+                LOG.info("Cannot read SMB file: " + f.getName());
+                continue;
+            }
+
+            String htmlName = StringEscapeUtils.escapeHtml(f.getName());
             String time = String.valueOf(f.lastModified());
-            if(f.isDirectory()){
-                x.append("<a href='"+name+"'>"+name+"/</a>\t");
+            if (f.isDirectory()) {
+                x.append("<a href=\"" + htmlName + "\">" + htmlName + "/</a>\t");
                 x.append(time+"\t-\n");
-            } else{} // ignore any other
+            } else if (f.isFile()) {
+                x.append("<a href=\"" + htmlName + "\">" + htmlName + "</a>\t");
+                x.append(time+"\t-\n");
+            } else {
+                // ignore any other
+            }
         }
         x.append("</pre></body></html>\n");
         return new String(x).getBytes();
Files ../protocol-smb/src/plugin/protocol-smb/src/java/org/apache/nutch/protocol/smb/test/SMBTest.class and ./src/java/org/apache/nutch/protocol/smb/test/SMBTest.class differ
diff -urN ../protocol-smb/src/plugin/protocol-smb/src/java/org/apache/nutch/protocol/smb/test/SMBTest.java ./src/java/org/apache/nutch/protocol/smb/test/SMBTest.java
--- ../protocol-smb/src/plugin/protocol-smb/src/java/org/apache/nutch/protocol/smb/test/SMBTest.java	2007-01-05 15:07:44.000000000 -0500
+++ ./src/java/org/apache/nutch/protocol/smb/test/SMBTest.java	2008-11-07 13:57:25.937700000 -0500
@@ -13,6 +13,10 @@
 import java.net.MalformedURLException;
 import java.net.URL;
 import java.util.Properties;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+
 import jcifs.smb.SmbException;
 import jcifs.smb.SmbFile;
 
@@ -26,16 +30,22 @@
     public SMBTest() {
     }
     
+    public static String USAGE = "Usage: SMBTest smb.properties \"smb://\" SERVER \"/\" SHARE \"/\" (FILE OUTPUT_FILE|DIR)";
+
     /**
      * @param args the command line arguments
      */
     public static void main(String[] args) throws MalformedURLException, SmbException, FileNotFoundException, IOException {
    
+        if (args.length < 2) {
+            System.out.println(USAGE);
+            System.exit(1);
+        }
         Properties prop = new Properties();
-        prop.load(new FileInputStream("smb.properties"));
+        prop.load(new FileInputStream(args[0]));
         jcifs.Config.setProperties(prop);
 //        URL url = new URL("smb://xxxxxxxx");
-        SmbFile file = new SmbFile("smb://xxxxxxx");
+        SmbFile file = new SmbFile(args[1]);
         System.out.println("Does the file exist on the share: "+file.exists());
 //        System.out.println("here is the protocol used: "+url.getProtocol());
 //        System.out.println("here is the host used: "+url.getHost());
@@ -46,7 +56,26 @@
                 System.out.println(file.getName());
             }
         }else if(file.isFile()){
-                System.out.println("Here is the name of the file: "+file.getName());
+                if (args.length != 3) {
+                    System.err.println(USAGE);
+                    System.exit(1);
+                }
+                System.out.println("Name " + file.getName() + ", length " + file.getContentLength());
+                System.out.println("Saving to " + args[2] + "...");
+                java.io.InputStream is = file.getInputStream();
+                java.io.FileOutputStream fos = new java.io.FileOutputStream(args[2]);
+                int numread;
+                byte b[] = new byte[4096];
+                do {
+                    numread = is.read(b);
+                    if (numread < 0) {
+                        break;
+                    }
+                    fos.write(b, 0, numread);
+                } while(true);
+                is.close();
+                fos.close();
+                System.out.println("Done.");
             }
     }
     
diff -urN ../protocol-smb/src/plugin/protocol-smb/src/log4j.properties ./src/log4j.properties
--- ../protocol-smb/src/plugin/protocol-smb/src/log4j.properties	1969-12-31 19:00:00.000000000 -0500
+++ ./src/log4j.properties	2008-11-07 13:53:14.549708000 -0500
@@ -0,0 +1,33 @@
+# RootLogger - DailyRollingFileAppender
+# log4j.rootLogger=DEBUG,DRFA
+log4j.rootLogger=DEBUG,stdout
+
+# Logging Threshold
+log4j.threshhold=ALL
+
+#
+# stdout
+# Add *stdout* to rootlogger above if you want to use this 
+#
+
+log4j.appender.stdout=org.apache.log4j.ConsoleAppender
+log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
+log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
+
+#
+# plain layout used for commandline tools to output to console
+#
+log4j.appender.cmdstdout=org.apache.log4j.ConsoleAppender
+log4j.appender.cmdstdout.layout=org.apache.log4j.PatternLayout
+log4j.appender.cmdstdout.layout.ConversionPattern=%m%n
+
+
+# Rolling file
+log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
+log4j.appender.DRFA.File=log.txt
+log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
+log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
+log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
+
+
+
diff -urN ../protocol-smb/src/plugin/protocol-smb/src/test.sh ./src/test.sh
--- ../protocol-smb/src/plugin/protocol-smb/src/test.sh	1969-12-31 19:00:00.000000000 -0500
+++ ./src/test.sh	2008-11-07 20:32:08.480629000 -0500
@@ -0,0 +1,10 @@
+#! /bin/sh -ex
+export PATH=$PATH:/c/Program\ Files/Java/jdk1.6.0_07/bin
+libdir="../../../../lib/"
+CP="../lib/jcifs-1.3.0.jar;${libdir}/log4j-1.2.15.jar;${libdir}/commons-logging-1.0.4.jar;${libdir}/commons-logging-api-1.0.4.jar;java;."
+DEFS=("-Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.Log4JCategoryLog" "-Dlog4j.configuration=log4j.properties")
+# DEFS=()
+MAIN="org/apache/nutch/protocol/smb/test/SMBTest"
+javac -cp "$CP" "java/${MAIN}.java"
+java -cp "$CP" "${DEFS[@]}" "$MAIN" "${libdir}../conf/smb.properties" "smb://SERVER/SHARE/DIR/FILE" OUTPUT_FILE
+
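For readers following the SMBResponse change in the diff above: before the patch, the method allocated a local `bytes` array but read into `this.content`, which that method never allocated; the patch allocates `this.content` itself. The standalone sketch here shows the same read-until-full loop on a plain `InputStream`, plus a simplified HTML escape standing in for `StringEscapeUtils.escapeHtml` (the commons-lang method escapes more characters). `ReadFullySketch` is a hypothetical class using only the JDK; it is not part of protocol-smb.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical standalone sketch; not part of protocol-smb. */
public class ReadFullySketch {

    // Same loop shape as the patched SMBResponse: fill buf from the start and stop
    // when it is full or read() reports end of stream (-1).
    // Returns the byte count; a value below buf.length means the stream ended early.
    public static int readFully(InputStream is, byte[] buf) throws IOException {
        int offset = 0;
        int n;
        while (offset < buf.length && (n = is.read(buf, offset, buf.length - offset)) >= 0) {
            offset += n;
        }
        return offset;
    }

    // Simplified stand-in for StringEscapeUtils.escapeHtml: escapes only & < > ".
    public static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello smb".getBytes("US-ASCII");
        byte[] content = new byte[data.length];   // allocate the buffer that is read into
        int read = readFully(new ByteArrayInputStream(data), content);
        System.out.println(read + " bytes: " + new String(content, "US-ASCII")); // 9 bytes: hello smb
        String name = escapeHtml("R&D \"draft\".txt");
        System.out.println("<a href=\"" + name + "\">" + name + "</a>");
    }
}
```

A short stream simply yields a count below the buffer length, which is the condition the patched code reports with its "not enough bytes read" warning.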

  
> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>            Priority: Minor
>         Attachments: protocol-smb-diff.txt, protocol-smb.zip, protocol-smb.zip, protocol-smb.zip



[jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.

Posted by "Vadim Bauer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vadim Bauer updated NUTCH-427:
------------------------------

    Attachment: protocol-smb.zip

This is an update to the previous version. Check the included readme.txt.

Title:    protocol-smb - Nutch protocol plugin for crawling Microsoft Windows shares
Author:   Armel T. Nene 
Update:   Vadim Bauer
Email:    armel.nene NOSPAM-AT-NOSPAM idna-solutions.com, V a d i m B a u e r <AT> g m x . d e

A.  Introduction

    The protocol-smb plugin allows you to crawl Microsoft Windows shares. It implements
    the CIFS/SMB protocol, which is commonly used on Microsoft operating systems. The plugin
    replicates the behaviour of protocol-file over the CIFS/SMB protocol. It uses the jCIFS
    library and supports all the properties of the jCIFS library.
    You can find more information on the following site: http://jcifs.samba.org/
    The SMB URL syntax for crawling is as follows: smb://xxxxx (e.g. smb://server/share).
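To make the smb://server/share syntax concrete, here is a minimal sketch that splits an SMB URL into server, share and path using only `java.net.URI`, which accepts any scheme and so needs no jCIFS handler. `SmbUrlParts` is a hypothetical helper, not part of the plugin. Two caveats: jCIFS documents that URLs naming a server, share or directory must end with '/' (e.g. smb://server/share/), and `new URI(...)` rejects unescaped spaces, so a share such as "my share" has to be written as `my%20share` here.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper (not part of protocol-smb): splits smb://server/share/path. */
public class SmbUrlParts {
    public final String server; // null if URI cannot parse the host (e.g. names with underscores)
    public final String share;  // "" when the URL names only the server
    public final String path;   // path inside the share, "" for the share root

    public SmbUrlParts(String server, String share, String path) {
        this.server = server;
        this.share = share;
        this.path = path;
    }

    public static SmbUrlParts parse(String url) throws URISyntaxException {
        URI uri = new URI(url);
        if (!"smb".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("not an smb URL: " + url);
        }
        String p = uri.getPath();             // e.g. "/share/dir/file.txt"
        String rest = (p == null || p.length() <= 1) ? "" : p.substring(1);
        int slash = rest.indexOf('/');
        if (slash < 0) {
            return new SmbUrlParts(uri.getHost(), rest, "");
        }
        return new SmbUrlParts(uri.getHost(), rest.substring(0, slash), rest.substring(slash + 1));
    }

    public static void main(String[] args) throws URISyntaxException {
        SmbUrlParts parts = parse("smb://server/share/dir/file.txt");
        // prints: server=server share=share path=dir/file.txt
        System.out.println("server=" + parts.server + " share=" + parts.share + " path=" + parts.path);
    }
}
```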
    
B.  Installation

    1) Binaries only:   The protocol-smb files can be found in the ../plugins directory.
                        Copy the "protocol-smb" folder to the NUTCHHOME/build/plugins directory.
                        Put the "smb.properties" file in the NUTCHHOME/conf directory.
                        Configure the properties in the "smb.properties" file.
                        Enable the plugin by updating the "nutch-site.xml" file found in the NUTCHHOME/conf directory,
                        e.g. <property>
                               <name>plugin.includes</name>
                               <value>protocol-smb|other plugins...</value>
                               <description>
                               </description>
                             </property>

    2)  Source code:    The protocol-smb sources can be found in the ../src directory.
                        Always refer to the Nutch wiki for detailed instructions on building Nutch. In short:
                        Copy the 'protocol-smb' folder to NUTCHHOME/src/plugin.
                        Update the build.xml in NUTCHHOME/src/plugin to include the plugin.
                        Update the NUTCHHOME/default.properties file to include the plugin.
                        Run ant to build.
                        Copy the 'smb.properties' file to NUTCHHOME/conf, and configure the properties.
                        Enable the plugin by updating the nutch-site.xml file.
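Because the plugin.includes value in the nutch-site.xml example is a regular expression, stray spaces around '|' matter. This is a minimal sketch, assuming (as the property description in nutch-default.xml suggests) that a plugin is enabled only when the expression matches its whole plugin id. `PluginIncludesCheck` is a hypothetical helper, and the plugin lists in it are illustrations only, not a recommended configuration.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: which plugin ids would a plugin.includes value enable? */
public class PluginIncludesCheck {

    // Assumption: Nutch requires a whole-id match, hence matches() rather than find().
    public static boolean isIncluded(String includesRegex, String pluginId) {
        return Pattern.compile(includesRegex).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        String good = "protocol-(http|smb)|urlfilter-regex|parse-(text|html)";
        String spaced = "protocol-smb| parse-html";   // second alternative is " parse-html"
        System.out.println(isIncluded(good, "protocol-smb"));  // true
        System.out.println(isIncluded(good, "parse-html"));    // true
        System.out.println(isIncluded(spaced, "parse-html"));  // false: leading space in the regex
    }
}
```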

C.  Known Issues

    1) MalformedURLException: unknown protocol: smb

       The SMB URL protocol handler has not been installed successfully.
       In short, the jCIFS jar must be loaded by the system class loader.

       Workaround: a) A short-term solution is to install the jCIFS jar
                      found in the protocol-smb folder into
                      JDKHOME/jre/lib/ext and/or JREHOME/lib/ext.

                   b) If the exception is still thrown after completing step a),
                      set the system property by passing the following argument
                      to the JVM:

                      -Djava.protocol.handler.pkgs=jcifs

                   c) You can also set the property in code, for example if
                      you start crawling with org.apache.nutch.crawl.Crawl.
                      Add the following two lines at the start of its main()
                      method; this has the same effect as b):

                      public static void main(String args[]) throws Exception {
                          System.setProperty("java.protocol.handler.pkgs", "jcifs");
                          // A PropertyPermission object only names a permission; a security policy must grant it.
                          new java.util.PropertyPermission("java.protocol.handler.pkgs", "read,write");
                          // ...and so on
                      }

       You can also visit the FAQ page: http://jcifs.samba.org/src/docs/faq.html

    2) FATAL smb.SMB - Could not read content of protocol: smb://xxxxxx

       This problem usually occurs if the following properties are not set correctly in
       the "smb.properties" file:

       - username
       - password
       - domain

       Also refer to the following resource for more information on the list of
       available properties and how to set them:

       http://jcifs.samba.org/src/docs/api/overview-summary.html#scp
       You can also visit the FAQ page: http://jcifs.samba.org/src/docs/faq.html

       N.B. All properties should be set in the "smb.properties" file. You can set
            all supported jCIFS properties in the "smb.properties" file.
     
    3) Only tested on Windows XP and Windows Server 2003. Please report test results
       on other operating systems.
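Known issues 1) and 2) can be diagnosed with a few JDK-only checks. This is a minimal sketch; `SmbSetupChecks` and its methods are hypothetical helpers, not part of protocol-smb. For issue 1), `java.protocol.handler.pkgs` is a '|'-separated list of package prefixes, and for each prefix the JDK tries to load `<prefix>.<protocol>.Handler`, so the value `jcifs` makes it look for `jcifs.smb.Handler`; appending rather than overwriting keeps any handler packages already configured. (jCIFS 1.x also provides `jcifs.Config.registerSmbURLHandler()` for the same registration; it is not used here.) For issue 2), the sketch assumes smb.properties uses the full jCIFS property names, since the SMBTest smoke test passes the loaded `Properties` straight to `jcifs.Config.setProperties()`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical diagnostics for Known Issues 1) and 2); not part of protocol-smb. */
public class SmbSetupChecks {
    static final String PKGS = "java.protocol.handler.pkgs";

    // Issue 1: for each prefix in PKGS, the JDK tries to load <prefix>.<protocol>.Handler.
    public static String handlerClassName(String prefix, String protocol) {
        return prefix + "." + protocol + ".Handler";
    }

    // Issue 1: append a prefix to the '|'-separated list instead of overwriting it.
    public static void addHandlerPackage(String prefix) {
        String current = System.getProperty(PKGS);
        if (current == null || current.trim().length() == 0) {
            System.setProperty(PKGS, prefix);
            return;
        }
        for (String p : current.split("\\|")) {
            if (p.trim().equals(prefix)) {
                return; // already registered
            }
        }
        System.setProperty(PKGS, current + "|" + prefix);
    }

    // Issue 1: true if a handler is found; otherwise new URL(...) throws
    // MalformedURLException ("unknown protocol: ...").
    public static boolean isProtocolSupported(String protocol) {
        try {
            new URL(protocol + "://server/share/");
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    // Issue 2: credential keys (assumed full jCIFS names) that are missing or blank.
    static final String[] CREDENTIAL_KEYS = {
        "jcifs.smb.client.username", "jcifs.smb.client.password", "jcifs.smb.client.domain"
    };

    public static List<String> missingCredentialKeys(Properties props) {
        List<String> missing = new ArrayList<String>();
        for (String key : CREDENTIAL_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().length() == 0) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        addHandlerPackage("jcifs"); // run before the first smb:// URL is created
        System.out.println(PKGS + "=" + System.getProperty(PKGS));
        System.out.println("looks for " + handlerClassName("jcifs", "smb"));
        System.out.println("smb supported: " + isProtocolSupported("smb"));

        Properties props = new Properties();
        // Stand-in for loading conf/smb.properties; the values are placeholders.
        props.load(new StringReader("jcifs.smb.client.username=crawler\n"
                + "jcifs.smb.client.password=secret\n"));
        System.out.println("missing: " + missingCredentialKeys(props));
    }
}
```

Without the jCIFS jar on the system class path, "smb supported" stays false even after the property is set, which matches the explanation in issue 1).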

> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>         Attachments: protocol-smb.zip, protocol-smb.zip
>
>
> Title:    protocol-smb - Nutch protocol plugin for crawling Microsoft Windows shares
> Author:   Armel T. Nene
> Email:    armel.nene NOSPAM-AT-NOSPAM idna-solutions.com
> A.  Introduction
>     The protocol-smb plugins allows you to crawl Microsoft Windows shares. It implements
>     the CIFS/SMB protocol which is commonly used on Microsoft OS. The plugin replicate the
>     behaviour of the protocol-file over CIFS/SMB protocol. This plugin uses the JCifs library and also
>     support all the properties from the JCifs library.
>     You can find more information on the following site: http://jcifs.samba.org/
>     The smb protocol syntax is as follow: smb://xxxxx (i.e. smb://server/share) .
>     
> B.  Installation
>     1) Binaries only:   Copy the "protocol-smb" to NUTCHHOME/build/plugins directory.
>                         Put the "smb.properties" file in the NUTCHHOME/conf directory.
>                         Configure the properties in "smb.properties" file
>                         Enable the plugin by updating "nutch-site.xml" file found in NUTCHHOME/conf directory
>     2)  Source code:    Always refer to the Nutch wiki for detailed instructions on building Nutch.  In short:
>                         Copy the 'protocol-smb' folder to NUTCHHOME/src/plugin
>                         Update the build.xml in NUTCHHOME/src/plugin to include plugin
>                         Update the NUTCHHOME/default.properties file to include plugin
>                         run ant to build
>                         Copy the 'smb.properties' file to NUTCHHOME/conf, and configure the properties
>                         Enable the plugin by updating the nutch-site.xml file
> C: Known Issues
>     1) URLMalformedException: unkown protocol: smb
>        The SMB URL protocol handler is not being successfully installed. 
>        In short, the jCIFS jar must be loaded by the System class loader.
>        Workaround: a) a short term solutions will be to installed the JCIFS jar 
>                       library found in protocol-smb folder in 
>                       JDKHOME/jre/lib/ext and (or) JREHOME/lib/ext
>                    b) After completing step a), if the exeception is still thrown
>                       set the System properties by passing the following arguments
>                       to the JVM: 
>                       -Djava.protocol.handler.pkgs=jcifs
>        Also you can visit the FAQ page: http://jcifs.samba.org/src/docs/faq.html
>     2) FATAL smb.SMB - Could not read content of protocol: smb://xxxxxx
>        This problem usually occurs if the following properties are not set correctly in
>        the "smb.properties" file:
>        - username
>        - password
>        - domain
>        For the list of available properties and how to set them, see:
>        http://jcifs.samba.org/src/docs/api/overview-summary.html#scp
>        You can also visit the FAQ page: http://jcifs.samba.org/src/docs/faq.html
>        N.B. All properties should be set in the "smb.properties" file, which accepts
>             every supported jCIFS property.
>      
>     3) Only tested on Windows XP and Windows Server 2003. Please report test
>        results on other operating systems. It is expected to run on other
>        operating systems without changes.
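For known issue 1, it can help to confirm before a crawl whether the JVM can resolve a URL handler for smb at all. This is a minimal sketch that uses only the standard library. It tries to build a URL for the given scheme and reports whether a MalformedURLException ("unknown protocol") is thrown. The class and method names are hypothetical. If the jCIFS jar is visible to the system class loader and java.protocol.handler.pkgs=jcifs is set, handlerInstalled("smb") should return true; otherwise it returns false.

```java
import java.net.MalformedURLException;
import java.net.URI;

/** Hypothetical diagnostic: can the JVM resolve a URL handler for a scheme? */
public class UrlHandlerCheck {
    static boolean handlerInstalled(String scheme) {
        try {
            // toURL() looks up a URLStreamHandler for the scheme and throws
            // MalformedURLException ("unknown protocol: ...") if none is found.
            URI.create(scheme + "://server/share/").toURL();
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.protocol.handler.pkgs = "
                + System.getProperty("java.protocol.handler.pkgs"));
        System.out.println("smb handler installed: " + handlerInstalled("smb"));
    }
}
```

Run it with the same classpath and -D options as the crawl so that it checks the same environment.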

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  updated NUTCH-427:
------------------------------------

    Priority: Minor  (was: Major)

> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>            Priority: Minor
>         Attachments: protocol-smb.zip, protocol-smb.zip
>
>
> Title:    protocol-smb - Nutch protocol plugin for crawling Microsoft Windows shares
> Author:   Armel T. Nene 
> Update:   Vadim Bauer
> Email:    armel.nene NOSPAM-AT-NOSPAM idna-solutions.com, V a d i m B a u e r <AT> g m x . d e
> A.  Introduction
>     The protocol-smb plugin allows you to crawl Microsoft Windows shares. It implements
>     the CIFS/SMB protocol, which is commonly used on Microsoft operating systems. The plugin
>     replicates the behaviour of protocol-file over the CIFS/SMB protocol. It uses the jCIFS
>     library and supports all of the jCIFS library's properties.
>     You can find more information on the following site: http://jcifs.samba.org/
>     The smb URL syntax for crawling is: smb://xxxxx (e.g. smb://server/share).
>     
> B.  Installation
>     1) Binaries only:   The protocol-smb files can be found in the ../plugins directory.
>                         Copy the "protocol-smb" folder to the NUTCHHOME/build/plugins directory.
>                         Put the "smb.properties" file in the NUTCHHOME/conf directory.
>                         Configure the properties in the "smb.properties" file.
>                         Enable the plugin by updating the "nutch-site.xml" file in the NUTCHHOME/conf directory,
>                         e.g.:
>                         <property>
>                           <name>plugin.includes</name>
>                           <value>protocol-smb|other plugins...</value>
>                           <description>
>                           </description>
>                         </property>
>     2) Source code:     The protocol-smb sources can be found in the ../src directory.
>                         Always refer to the Nutch wiki for detailed instructions on building Nutch. In short:
>                         Copy the 'protocol-smb' folder to NUTCHHOME/src/plugin.
>                         Update the build.xml in NUTCHHOME/src/plugin to include the plugin.
>                         Update the NUTCHHOME/default.properties file to include the plugin.
>                         Run ant to build.
>                         Copy the 'smb.properties' file to NUTCHHOME/conf and configure the properties.
>                         Enable the plugin by updating the nutch-site.xml file.
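In Nutch, the plugin.includes value is a regular expression over plugin names. The '|' therefore separates alternatives, and any stray whitespace becomes part of a pattern. This is a minimal sketch for sanity-checking a candidate value before you edit nutch-site.xml. It assumes that Nutch matches the whole plugin name against the expression, as java.util.regex Matcher.matches() does. The class name and the sample values are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical check: does a plugin.includes value select a given plugin? */
public class PluginIncludesCheck {
    static boolean includes(String pluginIncludes, String pluginId) {
        // Whole-string match: the plugin name must match one alternative exactly.
        return Pattern.compile(pluginIncludes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        String value = "protocol-(http|smb)|parse-(text|html)|index-basic";
        System.out.println(includes(value, "protocol-smb"));  // true
        System.out.println(includes(value, "protocol-file")); // false
        // A space after '|' becomes part of the next alternative (" parse-text"):
        System.out.println(includes("protocol-smb| parse-text", "parse-text")); // false
    }
}
```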
> C.  Known Issues
>     1) java.net.MalformedURLException: unknown protocol: smb
>        The SMB URL protocol handler has not been installed successfully.
>        In short, the jCIFS jar must be loaded by the system class loader.
>        Workaround: a) As a short-term solution, install the jCIFS jar
>                       found in the protocol-smb folder in
>                       JDKHOME/jre/lib/ext and/or JREHOME/lib/ext.
>                    b) If the exception is still thrown after completing step a),
>                       set the system property by passing the following argument
>                       to the JVM:
>                       -Djava.protocol.handler.pkgs=jcifs
>                    c) You can also set the property in code. For example, if you
>                       start crawling with org.apache.nutch.crawl.Crawl, add the
>                       following line at the start of its main() method. This has
>                       the same effect as b), provided it runs before the first
>                       smb:// URL is created:
>                       public static void main(String args[]) throws Exception {
>                           // Equivalent to -Djava.protocol.handler.pkgs=jcifs
>                           System.setProperty("java.protocol.handler.pkgs", "jcifs");
>                           // ... rest of the existing main() body
>                       }
>                       Constructing a java.util.PropertyPermission(
>                       "java.protocol.handler.pkgs", "read,write") object in code
>                       grants nothing by itself. If a SecurityManager is active,
>                       grant that permission in the Java policy file instead.
>        You can also visit the FAQ page: http://jcifs.samba.org/src/docs/faq.html
>     2) FATAL smb.SMB - Could not read content of protocol: smb://xxxxxx
>        This problem usually occurs if the following properties are not set correctly in
>        the "smb.properties" file:
>        - username
>        - password
>        - domain
>        For the list of available properties and how to set them, see:
>        http://jcifs.samba.org/src/docs/api/overview-summary.html#scp
>        You can also visit the FAQ page: http://jcifs.samba.org/src/docs/faq.html
>        N.B. All properties should be set in the "smb.properties" file, which accepts
>             every supported jCIFS property.
>      
>     3) Only tested on Windows XP and Windows Server 2003. Please report test
>        results on other operating systems.
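For known issue 2, a quick way to rule out missing credentials is to load smb.properties and list which of the required keys are absent or empty. This is a minimal sketch. The key names below are the standard jCIFS client properties (jcifs.smb.client.username, jcifs.smb.client.password, jcifs.smb.client.domain). It is an assumption that the plugin's smb.properties uses exactly these names, so adjust them to match the file shipped with protocol-smb. The class name and the default path conf/smb.properties used in main() are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical check: which credential properties are missing from smb.properties? */
public class SmbPropertiesCheck {
    // Assumed key names (standard jCIFS client properties).
    static final String[] REQUIRED = {
        "jcifs.smb.client.username",
        "jcifs.smb.client.password",
        "jcifs.smb.client.domain"
    };

    /** Returns the required keys that are absent or blank, in REQUIRED order. */
    static List<String> missingKeys(Reader in) throws IOException {
        Properties props = new Properties();
        props.load(in);
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "conf/smb.properties";
        try (Reader in = Files.newBufferedReader(Paths.get(path))) {
            List<String> missing = missingKeys(in);
            System.out.println(missing.isEmpty() ? "all credentials set" : "missing: " + missing);
        }
    }
}
```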



[jira] Commented: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.

Posted by "Vincent Couturier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790244#action_12790244 ] 

Vincent Couturier commented on NUTCH-427:
-----------------------------------------

The last attached zip does not contain Ilguiz Latypov's changes, so it has to be patched with protocol-smb-diff.txt. I will try to upload a patched version, but it would be easier if Ilguiz could upload his updated version.

> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>            Priority: Minor
>         Attachments: protocol-smb-diff.txt, protocol-smb.zip, protocol-smb.zip, protocol-smb.zip


[jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.

Posted by "Armel Nene (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Armel Nene updated NUTCH-427:
-----------------------------

    Attachment: protocol-smb.zip

> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>            Priority: Critical
>         Attachments: protocol-smb.zip

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.

Posted by "Ilguiz Latypov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ilguiz Latypov updated NUTCH-427:
---------------------------------

    Attachment: protocol-smb.zip

> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>            Priority: Minor
>         Attachments: protocol-smb-diff.txt, protocol-smb.zip, protocol-smb.zip, protocol-smb.zip


[jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.

Posted by "Ilguiz Latypov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ilguiz Latypov updated NUTCH-427:
---------------------------------

    Attachment:     (was: protocol-smb-diff.txt)

> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>            Priority: Minor
>         Attachments: protocol-smb.zip, protocol-smb.zip


[jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.

Posted by "Ilguiz Latypov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ilguiz Latypov updated NUTCH-427:
---------------------------------

    Attachment:     (was: protocol-smb.zip)

> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>            Priority: Minor
>         Attachments: protocol-smb.zip, protocol-smb.zip
>
>
> Title:    protocol-smb - Nutch protocol plugin for crawling Microsoft Windows shares
> Author:   Armel T. Nene 
> Update:   Vadim Bauer
> Email:    armel.nene NOSPAM-AT-NOSPAM idna-solutions.com, V a d i m B a u e r <AT> g m x . d e
> A.  Introduction
>     The protocol-smb plugins allows you to crawl Microsoft Windows shares. It implements
>     the CIFS/SMB protocol which is commonly used on Microsoft OS. The plugin replicate the
>     behaviour of the protocol-file over CIFS/SMB protocol. This plugin uses the JCifs library and also
>     support all the properties from the JCifs library.
>     You can find more information on the following site: http://jcifs.samba.org/
>     The smb protocol syntax for crawling is as follow: smb://xxxxx (i.e. smb://server/share).
>     
> B.  Installation
>     1) Binaries only:   The protocol-smb files can be found in the ../plugins directory.
> 				Copy the "protocol-smb" to NUTCHHOME/build/plugins directory.
>                         Put the "smb.properties" file in the NUTCHHOME/conf directory.
>                         Configure the properties in "smb.properties" file
>                         Enable the plugin by updating "nutch-site.xml" file found in NUTCHHOME/conf directory
> 				e.g. <property>
>     				     	<name>plugin.includes</name>
>     				     	<value>protocol-smb| other plugins...</value>
>     				     	<description>
>  	 			     	</description>
>  	 			     </property>
>     2)  Source code:    The protocol-smb sources can be found in the ../src directory.
> 				Always refer to the Nutch wiki for detailed instructions on building Nutch.  In short:
>                         Copy the 'protocol-smb' folder to NUTCHHOME/src/plugin
>                         Update the build.xml in NUTCHHOME/src/plugin to include plugin
>                         Update the NUTCHHOME/default.properties file to include plugin
>                         run ant to build
>                         Copy the 'smb.properties' file to NUTCHHOME/conf, and configure the properties
>                         Enable the plugin by updating the nutch-site.xml file
> C: Known Issues
>     1) MalformedURLException: unknown protocol: smb
>        The SMB URL protocol handler has not been installed successfully.
>        In short, the jCIFS jar must be loaded by the system class loader.
>        Workaround: a) As a short-term solution, install the jCIFS jar
>                       found in the protocol-smb folder in
>                       JDKHOME/jre/lib/ext and/or JREHOME/lib/ext.
>                    b) If the exception is still thrown after completing step a),
>                       set the system property by passing the following argument
>                       to the JVM:
>                       -Djava.protocol.handler.pkgs=jcifs
>                    c) You can also set the property in code. For example, if
>                       you start crawling with org.apache.nutch.crawl.Crawl,
>                       add the following line at the start of main(). This has
>                       the same effect as b):
>                       public static void main(String[] args) throws Exception {
>                           // Must run before the first smb:// URL is created.
>                           System.setProperty("java.protocol.handler.pkgs", "jcifs");
>                           // ... rest of Crawl.main() unchanged
>                       }
>        Also see the jCIFS FAQ: http://jcifs.samba.org/src/docs/faq.html
>     2) FATAL smb.SMB - Could not read content of protocol: smb://xxxxxx
>        This problem usually occurs if the following properties are not set correctly in
>        the "smb.properties" file:
>        - username
>        - password
>        - domain
>        Also refer to the following resources for more information on the list of
>        available properties and how to set them:
>        http://jcifs.samba.org/src/docs/api/overview-summary.html#scp
>        Also you can visit the FAQ page: http://jcifs.samba.org/src/docs/faq.html
>        N.B. All properties should be set in the "smb.properties" file; it accepts
>             every property supported by jCIFS.
>      
>     3) Only tested on Windows XP and Windows Server 2003. Please report any test
>        results on other operating systems.
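Both known issues can be checked before starting a crawl. Below is a minimal sketch of a hypothetical SmbPreflight class (not part of the plugin). It appends jcifs to java.protocol.handler.pkgs instead of overwriting other handler packages, tests whether the JVM can currently build an smb:// URL, and reports missing credential keys. The key names jcifs.smb.client.username, jcifs.smb.client.password and jcifs.smb.client.domain are the standard jCIFS properties; whether "smb.properties" uses exactly these keys is an assumption. jCIFS 1.x also provides jcifs.Config.registerSmbURLHandler(), which appends jcifs to the same property. In either case the handler class (jcifs.smb.Handler) must be visible to the system class loader, which is why workaround a) is needed. On Java 9 and later, where lib/ext no longer exists, that means putting the jar on the JVM class path.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-crawl checks for known issues 1 and 2; not part of protocol-smb.
public final class SmbPreflight {
    static final String HANDLER_PKGS = "java.protocol.handler.pkgs";

    private SmbPreflight() {}

    // Issue 1: append a package to the '|'-separated handler list instead of overwriting it.
    public static String withHandlerPackage(String current, String pkg) {
        if (current == null || current.trim().isEmpty()) {
            return pkg;
        }
        for (String entry : current.split("\\|")) {
            if (entry.trim().equals(pkg)) {
                return current; // already registered
            }
        }
        return current + "|" + pkg;
    }

    // Issue 1: true if this JVM can currently build a java.net.URL for smb://.
    public static boolean smbUrlHandlerAvailable() {
        try {
            URI.create("smb://localhost/").toURL();
            return true;
        } catch (MalformedURLException e) {
            return false; // typically "unknown protocol: smb"
        }
    }

    // Issue 2: names of required keys that are missing or blank.
    public static List<String> missingKeys(Properties props, String... keys) {
        List<String> missing = new ArrayList<>();
        for (String key : keys) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.setProperty(HANDLER_PKGS,
                withHandlerPackage(System.getProperty(HANDLER_PKGS), "jcifs"));
        System.out.println("smb:// handler available: " + smbUrlHandlerAvailable());

        // Hypothetical: in practice, load NUTCHHOME/conf/smb.properties instead.
        Properties smb = new Properties();
        smb.setProperty("jcifs.smb.client.username", "crawler");
        System.out.println("missing: " + missingKeys(smb,
                "jcifs.smb.client.username",
                "jcifs.smb.client.password",
                "jcifs.smb.client.domain"));
    }
}
```

Running main() without the jCIFS jar on the class path should report that no handler is available, which reproduces known issue 1.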

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.

Posted by "Ilguiz Latypov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ilguiz Latypov updated NUTCH-427:
---------------------------------

    Attachment:     (was: protocol-smb.zip)

> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>            Priority: Minor
>         Attachments: protocol-smb-diff.txt, protocol-smb.zip, protocol-smb.zip, protocol-smb.zip


[jira] Commented: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.

Posted by "Joe Hurley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622457#action_12622457 ] 

Joe Hurley commented on NUTCH-427:
----------------------------------

Is there a reason why this plugin only handles directories?  I had to make the following changes to enable file crawling:

in SMBResponse.java:
replace `byte[] byte` with `this.content` on line 163
remove lines 206 and 209

Also, it got stuck in the file-not-found case. After examining the protocol-file code, I moved the else statement in SMB.java (lines 76 and 77) outside of the curly bracket on line 78. After this change, the code continues after encountering a missing file instead of looping forever.

And since then, it seems to work nicely on Windows Vista.  Thanks for the plugin!
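The first fix above amounts to storing the bytes read from the remote file in the response object's field rather than in a local variable. Below is a minimal sketch of that pattern using a hypothetical FileContentResponse class (not the plugin's SMBResponse). It reads from any InputStream; with jCIFS this would be a stream opened on the remote file. The optional size limit is an assumption, loosely modelled on protocol-file's file.content.limit setting.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical response holder: the bytes read are kept in a field so that the
// fetcher can retrieve them later. The caller is responsible for closing the stream.
public class FileContentResponse {
    private byte[] content = new byte[0];

    // Reads up to maxLength bytes (a negative value means no limit) into this.content.
    public void readContent(InputStream in, int maxLength) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            if (maxLength >= 0 && total + n >= maxLength) {
                out.write(buffer, 0, maxLength - total); // keep only what fits
                total = maxLength;
                break;
            }
            out.write(buffer, 0, n);
            total += n;
        }
        this.content = out.toByteArray(); // assign the field, not a local variable
    }

    public byte[] getContent() {
        return content;
    }
}
```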

> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>         Attachments: protocol-smb.zip, protocol-smb.zip


[jira] Commented: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.

Posted by "Ilguiz Latypov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859116#action_12859116 ] 

Ilguiz Latypov commented on NUTCH-427:
--------------------------------------

I hesitate to add a .zip file because (a) it hides the intent of the change and (b) other developers who have already modified their copies would have difficulty merging my change.

I believe the GNU patch tool will apply my suggested change automatically, provided that it is run from the right working directory and, if needed, with the -pX option, where X is the number of leading directory names to strip from the paths in the patch.


> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>            Priority: Minor
>         Attachments: protocol-smb-diff.txt, protocol-smb.zip, protocol-smb.zip, protocol-smb.zip


[jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.

Posted by "Ilguiz Latypov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ilguiz Latypov updated NUTCH-427:
---------------------------------

    Attachment:     (was: protocol-smb-diff.txt)

> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>            Priority: Minor
>         Attachments: protocol-smb.zip, protocol-smb.zip, protocol-smb.zip


[jira] Commented: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.

Posted by "Armel Nene (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462514 ] 

Armel Nene commented on NUTCH-427:
----------------------------------

The best way is to make the plugin available on plugin central, so that
people who need the plugin can download it from there.



> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>            Priority: Critical
>         Attachments: protocol-smb.zip

        

[jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.

Posted by "Ilguiz Latypov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ilguiz Latypov updated NUTCH-427:
---------------------------------

    Attachment: protocol-smb-diff.txt

> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>            Priority: Minor
>         Attachments: protocol-smb-diff.txt, protocol-smb.zip, protocol-smb.zip, protocol-smb.zip
>
>
> Title:    protocol-smb - Nutch protocol plugin for crawling Microsoft Windows shares
> Author:   Armel T. Nene 
> Update:   Vadim Bauer
> Email:    armel.nene NOSPAM-AT-NOSPAM idna-solutions.com, V a d i m B a u e r <AT> g m x . d e
> A.  Introduction
>     The protocol-smb plugin allows you to crawl Microsoft Windows shares. It implements
>     the CIFS/SMB protocol, which is commonly used on Microsoft operating systems. The plugin
>     replicates the behaviour of the protocol-file plugin over CIFS/SMB. It uses the jCIFS
>     library and supports all of the jCIFS configuration properties.
>     More information on jCIFS is available at: http://jcifs.samba.org/
>     The URL syntax for crawling is smb://xxxxx (e.g. smb://server/share/). Note that jCIFS
>     expects URLs naming servers, shares and directories to end with a slash.
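To make the smb://server/share/ syntax concrete, here is a minimal Java sketch. It is not part of the plugin, and the class and method names are hypothetical. It splits an SMB URL into server, share and the path inside the share using java.net.URI. Unlike java.net.URL, URI parses any scheme without a registered protocol handler, so this works even on a JVM that shows the "unknown protocol: smb" problem listed under Known Issues.

```java
import java.net.URI;
import java.util.Arrays;

// Hypothetical helper, not part of protocol-smb: splits an SMB URL of the form
// smb://server/share/path into { server, share, path-inside-share }.
class SmbUrlParts {

    static String[] parse(String url) {
        // java.net.URI parses any scheme; unlike java.net.URL it needs no protocol handler.
        URI uri = URI.create(url);
        if (!"smb".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("not an smb URL: " + url);
        }
        // getHost() is null when the authority is not a valid host name (or the URI is opaque).
        String server = uri.getHost();
        if (server == null) {
            throw new IllegalArgumentException("no server in: " + url);
        }
        // Decoded path, e.g. "/share/dir/file.txt" (escapes such as %23 become '#').
        String path = uri.getPath() == null ? "" : uri.getPath();
        String rest = path.startsWith("/") ? path.substring(1) : path;
        int slash = rest.indexOf('/');
        String share = slash < 0 ? rest : rest.substring(0, slash);
        String inShare = slash < 0 ? "" : rest.substring(slash + 1);
        return new String[] { server, share, inShare };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("smb://server/share/dir/file.txt")));
        System.out.println(Arrays.toString(parse("smb://server/share/")));
    }
}
```

Running main prints [server, share, dir/file.txt] and then [server, share, ]. The second case is a share URL with the trailing slash that jCIFS expects.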
>     
> B.  Installation
>     1) Binaries only:   The protocol-smb files can be found in the ../plugins directory.
>                         Copy the "protocol-smb" folder to the NUTCHHOME/build/plugins directory.
>                         Put the "smb.properties" file in the NUTCHHOME/conf directory.
>                         Configure the properties in the "smb.properties" file.
>                         Enable the plugin by updating the "nutch-site.xml" file in the
>                         NUTCHHOME/conf directory, e.g.:
>                           <property>
>                             <name>plugin.includes</name>
>                             <value>protocol-smb|other plugins...</value>
>                             <description></description>
>                           </property>
>     2)  Source code:    The protocol-smb sources can be found in the ../src directory.
>                         Always refer to the Nutch wiki for detailed build instructions. In short:
>                         Copy the 'protocol-smb' folder to NUTCHHOME/src/plugin.
>                         Add protocol-smb to the deploy and clean targets of NUTCHHOME/src/plugin/build.xml.
>                         Update the NUTCHHOME/default.properties file to include the plugin.
>                         Run ant to build.
>                         Copy the 'smb.properties' file to NUTCHHOME/conf and configure the properties.
>                         Enable the plugin by updating the nutch-site.xml file.
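Nutch's default configuration describes plugin.includes as a regular expression naming the plugin directories to include. This means protocol-smb has to be added as one more alternative next to the plugins already in use. The Java sketch below is a hypothetical checker, not Nutch code. It assumes the value is compiled with java.util.regex and must match the whole plugin ID, and it reports which IDs a given value would enable. The sample value and plugin list are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical checker: reports which plugin IDs a plugin.includes value would enable,
// ASSUMING the value is a Java regular expression that must match the whole plugin ID.
class PluginIncludesCheck {

    static List<String> enabled(String includes, List<String> pluginIds) {
        Pattern p = Pattern.compile(includes);
        List<String> result = new ArrayList<>();
        for (String id : pluginIds) {
            if (p.matcher(id).matches()) {   // whole-ID match, not a substring search
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Example value: protocol-smb added next to the HTTP protocol and common parsers.
        String includes = "protocol-(http|smb)|parse-(text|html)|index-basic";
        List<String> ids = List.of("protocol-http", "protocol-smb", "protocol-file",
                                   "parse-html", "index-basic");
        System.out.println(enabled(includes, ids));
    }
}
```

Under this assumption, a stray space matters. In "protocol-smb| parse-html" the second alternative is " parse-html" with a leading space, so parse-html would not be enabled. Keep the value free of spaces around "|".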
> C.  Known Issues
>     1) java.net.MalformedURLException: unknown protocol: smb
>        The SMB URL protocol handler has not been installed successfully.
>        In short, the jCIFS jar must be loaded by the system class loader: java.net.URL
>        does not consult Nutch's plugin class loader when it looks up a protocol handler.
>        Workarounds: a) As a short-term solution, install the jCIFS jar found in the
>                        protocol-smb folder in JDKHOME/jre/lib/ext and/or JREHOME/lib/ext.
>                     b) If the exception is still thrown after completing step a),
>                        set the system property by passing the following argument
>                        to the JVM:
>                        -Djava.protocol.handler.pkgs=jcifs
>                     c) Alternatively, set the property in code. For example, if you
>                        start crawling with org.apache.nutch.crawl.Crawl, add the
>                        following line at the start of main(). It is equivalent to b)
>                        as long as it runs before the first smb:// URL is created:
>                          public static void main(String[] args) throws Exception {
>                            System.setProperty("java.protocol.handler.pkgs", "jcifs");
>                            // ... the existing body of main() follows unchanged
>                          }
>                        If a SecurityManager is installed, the policy file must grant
>                        permission java.util.PropertyPermission "java.protocol.handler.pkgs", "read,write";
>                        constructing a PropertyPermission object in code has no effect.
>        Also see the jCIFS FAQ: http://jcifs.samba.org/src/docs/faq.html
>     2) FATAL smb.SMB - Could not read content of protocol: smb://xxxxxx
>        This problem usually occurs if the following properties are not set correctly in
>        the "smb.properties" file:
>        - username
>        - password
>        - domain
>        For the list of available properties and how to set them, see:
>        http://jcifs.samba.org/src/docs/api/overview-summary.html#scp
>        Also see the jCIFS FAQ: http://jcifs.samba.org/src/docs/faq.html
>        N.B. All properties should be set in the "smb.properties" file, which accepts
>             every property supported by jCIFS.
>      
>     3) Tested only on Windows XP and Windows Server 2003. Please report test results
>        on other operating systems.
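The first two known issues can be checked before starting a crawl. The Java sketch below is a hypothetical helper, not part of protocol-smb or Nutch. It does three things:

- It reproduces the JDK naming rule behind -Djava.protocol.handler.pkgs=jcifs: each package prefix P is tried as the class P.<protocol>.Handler, so jcifs.smb.Handler must be loadable from the system class path.
- It tests whether an smb:// java.net.URL can be created at all.
- It lists blank or absent credential entries.

It ASSUMES smb.properties uses jCIFS's own property names (jcifs.smb.client.username, .password, .domain). If the plugin reads different keys, adjust the list.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check for known issues 1 and 2; not part of protocol-smb.
class SmbPreflight {

    // Issue 1: for each '|'-separated prefix P in java.protocol.handler.pkgs, the JDK
    // tries the class P.<protocol>.Handler (for jCIFS: jcifs.smb.Handler) in addition
    // to its built-in handlers.
    static List<String> handlerCandidates(String pkgs, String protocol) {
        List<String> names = new ArrayList<>();
        if (pkgs == null) {
            return names;
        }
        for (String prefix : pkgs.split("\\|")) {
            String p = prefix.trim();
            if (!p.isEmpty()) {
                names.add(p + "." + protocol + ".Handler");
            }
        }
        return names;
    }

    // Issue 1: true if this JVM can create an smb:// URL, i.e. a handler class was found.
    static boolean smbHandlerInstalled() {
        try {
            new URL("smb://server/share/");
            return true;
        } catch (MalformedURLException e) {
            return false; // "unknown protocol: smb"
        }
    }

    // Issue 2: lists credential keys that are absent or blank.
    // ASSUMPTION: smb.properties uses these jCIFS property names.
    static List<String> missingCredentials(Properties props) {
        String[] expected = {
            "jcifs.smb.client.username",
            "jcifs.smb.client.password",
            "jcifs.smb.client.domain"
        };
        List<String> missing = new ArrayList<>();
        for (String key : expected) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        String pkgs = System.getProperty("java.protocol.handler.pkgs");
        System.out.println("handler candidates: " + handlerCandidates(pkgs, "smb"));
        System.out.println("smb handler installed: " + smbHandlerInstalled());

        // In practice, load NUTCHHOME/conf/smb.properties instead of this inline sample.
        Properties props = new Properties();
        props.load(new StringReader("jcifs.smb.client.username=crawler\n"));
        System.out.println("missing: " + missingCredentials(props));
    }
}
```

With the inline sample, the last line printed is: missing: [jcifs.smb.client.password, jcifs.smb.client.domain].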

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.

Posted by "Ilguiz Latypov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ilguiz Latypov updated NUTCH-427:
---------------------------------

    Attachment:     (was: protocol-smb.zip)

> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>            Priority: Minor
>         Attachments: protocol-smb-diff.txt, protocol-smb.zip, protocol-smb.zip


[jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.

Posted by "Ilguiz Latypov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ilguiz Latypov updated NUTCH-427:
---------------------------------

    Attachment: protocol-smb.zip

> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>            Priority: Minor
>         Attachments: protocol-smb-diff.txt, protocol-smb.zip, protocol-smb.zip, protocol-smb.zip


[jira] Issue Comment Edited: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.

Posted by "Ilguiz Latypov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645933#action_12645933 ] 

ilatypov edited comment on NUTCH-427 at 11/7/08 8:50 PM:
---------------------------------------------------------------

Fixed reading of SMB files, updated to jcifs 1.3.0, enhanced the smoke
test app.  Protected special characters such as apostrophe and hash 
mark with URL encoding.
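The comment mentions protecting characters such as the apostrophe and the hash mark with URL encoding. The patch's actual code is not shown here, so the following is only a rough illustration with a hypothetical helper name. This Java sketch percent-encodes a share-relative path as UTF-8, keeping '/' separators and RFC 3986 unreserved characters literal, so that '#' cannot be read as a fragment delimiter. Whether jCIFS 1.3.0 decodes every escaped character this way has not been verified here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: percent-encodes a share-relative path for use in an smb:// URL.
// '/' separators are kept; every byte outside the RFC 3986 unreserved set is encoded.
class SmbPathEncoder {

    private static final String HEX = "0123456789ABCDEF";

    static boolean unreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
            || c == '-' || c == '.' || c == '_' || c == '~';
    }

    static String encodePath(String path) {
        StringBuilder out = new StringBuilder();
        for (byte b : path.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;  // treat the byte as unsigned
            if (c == '/' || unreserved(c)) {
                out.append((char) c);
            } else {
                out.append('%').append(HEX.charAt(c >> 4)).append(HEX.charAt(c & 0x0F));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Apostrophe, hash mark and spaces are encoded; the slash is preserved.
        System.out.println("smb://server/share/" + encodePath("Bob's files/report #1.doc"));
    }
}
```

This prints smb://server/share/Bob%27s%20files/report%20%231.doc.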

Fixed the infinite retry loop in SMB.java.

Tried but could not activate the Apache logging.


> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>            Priority: Minor
>         Attachments: protocol-smb-diff.txt, protocol-smb.zip, protocol-smb.zip, protocol-smb.zip


[jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  updated NUTCH-427:
------------------------------------

    Priority: Major  (was: Critical)

New features are not critical. This plugin uses an LGPL library, which cannot be included in the Nutch repository.

> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>         Attachments: protocol-smb.zip


[jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.

Posted by "Ilguiz Latypov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ilguiz Latypov updated NUTCH-427:
---------------------------------

    Attachment: protocol-smb-dist.zip

Applied my diff to simplify importing into the Subversion tree.  The build directory should not be imported, and the src/plugin/build.xml file should only add the new protocol-smb deploy and clean targets.

The previous author did not grant the license to ASF.


> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>            Priority: Minor
>         Attachments: protocol-smb-diff.txt, protocol-smb-dist.zip, protocol-smb.zip, protocol-smb.zip


[jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation.

Posted by "Ilguiz Latypov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ilguiz Latypov updated NUTCH-427:
---------------------------------

    Attachment: protocol-smb-diff.txt

> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implementation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>            Priority: Minor
>         Attachments: protocol-smb-diff.txt, protocol-smb.zip, protocol-smb.zip, protocol-smb.zip
>
>
> Title:    protocol-smb - Nutch protocol plugin for crawling Microsoft Windows shares
> Author:   Armel T. Nene 
> Update:   Vadim Bauer
> Email:    armel.nene NOSPAM-AT-NOSPAM idna-solutions.com, V a d i m B a u e r <AT> g m x . d e
> A.  Introduction
>     The protocol-smb plugin allows you to crawl Microsoft Windows shares. It implements
>     the CIFS/SMB protocol, which is commonly used on Microsoft operating systems. The plugin
>     replicates the behaviour of the protocol-file plugin over the CIFS/SMB protocol. It uses
>     the JCifs library and supports all of the JCifs library's properties.
>     You can find more information on the following site: http://jcifs.samba.org/
>     The smb protocol syntax for crawling is as follows: smb://xxxxx (e.g. smb://server/share).
>     
> B.  Installation
>     1) Binaries only:   The protocol-smb files can be found in the ../plugins directory.
>                         Copy the "protocol-smb" folder to the NUTCHHOME/build/plugins directory.
>                         Put the "smb.properties" file in the NUTCHHOME/conf directory.
>                         Configure the properties in the "smb.properties" file.
>                         Enable the plugin by updating the "nutch-site.xml" file in the NUTCHHOME/conf
>                         directory, e.g.:
>                         <property>
>                           <name>plugin.includes</name>
>                           <value>protocol-smb|other plugins...</value>
>                           <description>
>                           </description>
>                         </property>
>     2) Source code:     The protocol-smb sources can be found in the ../src directory.
>                         Always refer to the Nutch wiki for detailed instructions on building Nutch. In short:
>                         Copy the "protocol-smb" folder to NUTCHHOME/src/plugin.
>                         Update the build.xml in NUTCHHOME/src/plugin to include the plugin.
>                         Update the NUTCHHOME/default.properties file to include the plugin.
>                         Run ant to build.
>                         Copy the "smb.properties" file to NUTCHHOME/conf and configure the properties.
>                         Enable the plugin by updating the nutch-site.xml file.
> C.  Known Issues
>     1) java.net.MalformedURLException: unknown protocol: smb
>        The SMB URL protocol handler has not been installed successfully.
>        In short, the jCIFS jar must be loaded by the system class loader.
>        Workaround: a) A short-term solution is to install the JCIFS jar
>                       library found in the protocol-smb folder in
>                       JDKHOME/jre/lib/ext and/or JREHOME/lib/ext
>                    b) If the exception is still thrown after completing step a),
>                       set the system property by passing the following argument
>                       to the JVM:
>                       -Djava.protocol.handler.pkgs=jcifs
>                    c) You can also set the property in code. For example, if you
>                       start crawling with org.apache.nutch.crawl.Crawl, add the
>                       following line at the start of main(). This has the same effect
>                       as b):
>                       public static void main(String args[]) throws Exception {
>                           // must run before the first smb:// URL is created
>                           System.setProperty("java.protocol.handler.pkgs", "jcifs");
>                           // ... rest of main() unchanged
>                       }
>                       If a SecurityManager is installed, its policy must also grant
>                       java.util.PropertyPermission "java.protocol.handler.pkgs", "read,write";
>                       creating a PropertyPermission object in code does not grant it.
>        Also see the FAQ page: http://jcifs.samba.org/src/docs/faq.html
>     2) FATAL smb.SMB - Could not read content of protocol: smb://xxxxxx
>        This problem usually occurs if the following properties are not set correctly in
>        the "smb.properties" file:
>        - username
>        - password
>        - domain
>        For the full list of available properties and how to set them, see:
>        http://jcifs.samba.org/src/docs/api/overview-summary.html#scp
>        Also see the FAQ page: http://jcifs.samba.org/src/docs/faq.html
>        N.B. All properties should be set in the "smb.properties" file; every
>             property supported by JCIFS can be set there.
>      
>     3) Only tested on Windows XP and Windows Server 2003. Please report test
>        results on other operating systems.
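For known issue 2, a small pre-flight check can catch a missing credential before a crawl starts. This sketch assumes, as the SMBTest smoke test does, that "smb.properties" is loaded with java.util.Properties and handed to jcifs.Config.setProperties(), so the relevant keys are the JCIFS client properties jcifs.smb.client.username, jcifs.smb.client.password and jcifs.smb.client.domain. The class name and the default path conf/smb.properties are illustrative only. It uses only the JDK, so it runs without the JCIFS jar.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical pre-flight check for the credential keys in smb.properties. */
public class SmbPropertiesCheck {

    // JCIFS client credential properties (assumed to be the names used in smb.properties).
    static final String[] CREDENTIAL_KEYS = {
        "jcifs.smb.client.username",
        "jcifs.smb.client.password",
        "jcifs.smb.client.domain"
    };

    /** Returns the credential keys that are absent or blank, in CREDENTIAL_KEYS order. */
    public static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : CREDENTIAL_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "conf/smb.properties";
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        List<String> missing = missingKeys(props);
        if (missing.isEmpty()) {
            System.out.println(path + ": username, password and domain are set");
        } else {
            System.out.println(path + ": missing or blank: " + missing);
        }
    }
}
```

A blank value is reported as missing too, which may be intentional in some setups (for example an account with an empty password), so treat the output as a hint rather than an error.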



[jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implementation.

Posted by "Ilguiz Latypov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ilguiz Latypov updated NUTCH-427:
---------------------------------

    Attachment: protocol-smb.zip


Fixed reading of SMB files, updated to jcifs 1.3.0, enhanced the smoke
test app.

Tried but could not activate the Apache logging.
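The SMB reading fix in this patch comes down to reading the file into the same buffer that becomes the response content: the old code allocated a local bytes array but passed this.content to read(). A dependency-free sketch of the patched read loop follows; the class and method names are hypothetical, and a plain InputStream stands in for the stream returned by SmbFile.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper mirroring the read loop in the patched SMBResponse. */
public class SmbReadSketch {

    /**
     * Reads up to {@code len} bytes from {@code in}, looping because a single
     * read() may return fewer bytes than requested. Returns a shorter array
     * if the stream ends early.
     */
    public static byte[] readUpTo(InputStream in, int len) throws IOException {
        byte[] buf = new byte[len];
        int offset = 0;
        int n;
        while (offset < len && (n = in.read(buf, offset, len - offset)) >= 0) {
            offset += n;
        }
        return offset == len ? buf : Arrays.copyOf(buf, offset);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello, share".getBytes(StandardCharsets.UTF_8);
        // try-with-resources closes the stream even if read() throws
        try (InputStream in = new ByteArrayInputStream(data)) {
            byte[] content = readUpTo(in, data.length);
            System.out.println(new String(content, StandardCharsets.UTF_8));
        }
    }
}
```

Unlike the patch, which keeps the zero-padded buffer and only logs a warning on a short read, this sketch returns a truncated copy. The try-with-resources block also closes the stream when read() throws, whereas the patched code calls is.close() only on the success path.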

Files ../protocol-smb/src/plugin/protocol-smb/lib/jcifs-1.2.13.jar and ./lib/jcifs-1.2.13.jar differ
Files ../protocol-smb/src/plugin/protocol-smb/lib/jcifs-1.3.0.jar and ./lib/jcifs-1.3.0.jar differ
diff -urN ../protocol-smb/src/plugin/protocol-smb/plugin.xml ./plugin.xml
--- ../protocol-smb/src/plugin/protocol-smb/plugin.xml	2007-05-25 22:31:38.000000000 -0400
+++ ./plugin.xml	2008-11-07 14:28:38.184388000 -0500
@@ -13,7 +13,7 @@
     <library name="protocol-smb.jar">
       <export name="*" />
     </library>
-    <library name="jcifs-1.2.13.jar" />
+    <library name="jcifs-1.3.0.jar" />
   </runtime>
   <requires>
     <import plugin="nutch-extensionpoints" />
@@ -23,4 +23,4 @@
       <parameter name="protocolName" value="smb" />
     </implementation>
   </extension>
-</plugin>
\ No newline at end of file
+</plugin>
Files ../protocol-smb/src/plugin/protocol-smb/src/h.pdf and ./src/h.pdf differ
diff -urN ../protocol-smb/src/plugin/protocol-smb/src/java/org/apache/nutch/protocol/smb/SMB.java ./src/java/org/apache/nutch/protocol/smb/SMB.java
--- ../protocol-smb/src/plugin/protocol-smb/src/java/org/apache/nutch/protocol/smb/SMB.java	2007-05-25 22:28:20.000000000 -0400
+++ ./src/java/org/apache/nutch/protocol/smb/SMB.java	2008-11-07 14:38:38.583156000 -0500
@@ -51,8 +51,8 @@
      */
     public ProtocolOutput getProtocolOutput(Text url, CrawlDatum datum) {
         String urlString = url.toString();
-        System.out.println("SMB Protocol: Now using the SMB protocol");
-        System.out.println("SMB Protocol will interrogate the following URL: "+urlString);
+        // System.out.println("SMB Protocol: Now using the SMB protocol");
+        // System.out.println("SMB Protocol will interrogate the following URL: "+urlString);
         try{
             URL u = new URL(urlString);
             int redirects = 0;
diff -urN ../protocol-smb/src/plugin/protocol-smb/src/java/org/apache/nutch/protocol/smb/SMBResponse.java ./src/java/org/apache/nutch/protocol/smb/SMBResponse.java
--- ../protocol-smb/src/plugin/protocol-smb/src/java/org/apache/nutch/protocol/smb/SMBResponse.java	2007-01-05 10:34:38.000000000 -0500
+++ ./src/java/org/apache/nutch/protocol/smb/SMBResponse.java	2008-11-07 20:14:01.791149000 -0500
@@ -13,8 +13,12 @@
 import jcifs.smb.SmbException;
 import jcifs.smb.SmbFile;
 import jcifs.smb.SmbFileInputStream;
+
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
+
+import org.apache.commons.lang.StringEscapeUtils;
+
 import org.apache.hadoop.conf.Configuration;
 import org.apache.nutch.crawl.CrawlDatum;
 import org.apache.nutch.metadata.Metadata;
@@ -137,7 +141,13 @@
      */
     private void getDirAsHttpResponse(SmbFile f) throws SmbException {
         String path = f.getName();
-        this.content = list2html(f.listFiles(), path, "/".equals(path) ? false : true);
+        SmbFile[] fileList = {};
+        try {
+            fileList = f.listFiles();
+        } catch (SmbException e) {
+            // leave fileList empty
+        }
+        this.content = list2html(fileList, path, "/".equals(path) ? false : true);
         
         // set headers
         headers.set(Response.CONTENT_TYPE, "text/html");
@@ -160,19 +170,19 @@
         
         // capture content
         int len = (int)size;
-        byte[] bytes = new byte[len];
-        java.io.InputStream is = new SmbFileInputStream(f);
+        this.content = new byte[len];
+        java.io.InputStream is = f.getInputStream();
         int offset = 0;
         int n = 0;
         while(offset < size && (n = is.read(this.content, offset, len-offset)) >= 0){
             offset += n;
+            LOG.debug("Read " + n + " bytes from " + f.getName());
         }
         
         if(offset < len){
             if(LOG.isWarnEnabled())
                 LOG.warn("not enough bytes read from file: "+f.getCanonicalPath());
         }
-        
         is.close();
         
         // set headers
@@ -201,12 +211,27 @@
         SmbFile f;
         for(int i=0; i<smbFile.length; i++){
             f = smbFile[i];
-            String name = f.getName();
+            if (!f.exists()) {
+                LOG.info("SMB file: " + f.getName() + " does not exist");
+                continue;
+            }
+            
+            if (!f.canRead()) {
+                LOG.info("Cannot read SMB file: " + f.getName());
+                continue;
+            }
+
+            String htmlName = StringEscapeUtils.escapeHtml(f.getName());
             String time = String.valueOf(f.lastModified());
-            if(f.isDirectory()){
-                x.append("<a href='"+name+"'>"+name+"/</a>\t");
+            if (f.isDirectory()) {
+                x.append("<a href=\"" + htmlName + "\">" + htmlName + "/</a>\t");
                 x.append(time+"\t-\n");
-            } else{} // ignore any other
+            } else if (f.isFile()) {
+                x.append("<a href=\"" + htmlName + "\">" + htmlName + "</a>\t");
+                x.append(time+"\t-\n");
+            } else {
+                // ignore any other
+            }
         }
         x.append("</pre></body></html>\n");
         return new String(x).getBytes();
Files ../protocol-smb/src/plugin/protocol-smb/src/java/org/apache/nutch/protocol/smb/test/SMBTest.class and ./src/java/org/apache/nutch/protocol/smb/test/SMBTest.class differ
diff -urN ../protocol-smb/src/plugin/protocol-smb/src/java/org/apache/nutch/protocol/smb/test/SMBTest.java ./src/java/org/apache/nutch/protocol/smb/test/SMBTest.java
--- ../protocol-smb/src/plugin/protocol-smb/src/java/org/apache/nutch/protocol/smb/test/SMBTest.java	2007-01-05 15:07:44.000000000 -0500
+++ ./src/java/org/apache/nutch/protocol/smb/test/SMBTest.java	2008-11-07 13:57:25.937700000 -0500
@@ -13,6 +13,10 @@
 import java.net.MalformedURLException;
 import java.net.URL;
 import java.util.Properties;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+
 import jcifs.smb.SmbException;
 import jcifs.smb.SmbFile;
 
@@ -26,16 +30,22 @@
     public SMBTest() {
     }
     
+    public static String USAGE = "Usage: SMBTest smb.properties \"smb://\" SERVER \"/\" SHARE \"/\" (FILE OUTPUT_FILE|DIR)";
+
     /**
      * @param args the command line arguments
      */
     public static void main(String[] args) throws MalformedURLException, SmbException, FileNotFoundException, IOException {
    
+        if (args.length < 2) {
+            System.out.println(USAGE);
+            System.exit(1);
+        }
         Properties prop = new Properties();
-        prop.load(new FileInputStream("smb.properties"));
+        prop.load(new FileInputStream(args[0]));
         jcifs.Config.setProperties(prop);
 //        URL url = new URL("smb://xxxxxxxx");
-        SmbFile file = new SmbFile("smb://xxxxxxx");
+        SmbFile file = new SmbFile(args[1]);
         System.out.println("Does the file exist on the share: "+file.exists());
 //        System.out.println("here is the protocol used: "+url.getProtocol());
 //        System.out.println("here is the host used: "+url.getHost());
@@ -46,7 +56,26 @@
                 System.out.println(file.getName());
             }
         }else if(file.isFile()){
-                System.out.println("Here is the name of the file: "+file.getName());
+                if (args.length != 3) {
+                    System.err.println(USAGE);
+                    System.exit(1);
+                }
+                System.out.println("Name " + file.getName() + ", length " + file.getContentLength());
+                System.out.println("Saving to " + args[2] + "...");
+                java.io.InputStream is = file.getInputStream();
+                java.io.FileOutputStream fos = new java.io.FileOutputStream(args[2]);
+                int numread;
+                byte b[] = new byte[4096];
+                do {
+                    numread = is.read(b);
+                    if (numread < 0) {
+                        break;
+                    }
+                    fos.write(b, 0, numread);
+                } while(true);
+                is.close();
+                fos.close();
+                System.out.println("Done.");
             }
     }
     
diff -urN ../protocol-smb/src/plugin/protocol-smb/src/log4j.properties ./src/log4j.properties
--- ../protocol-smb/src/plugin/protocol-smb/src/log4j.properties	1969-12-31 19:00:00.000000000 -0500
+++ ./src/log4j.properties	2008-11-07 13:53:14.549708000 -0500
@@ -0,0 +1,33 @@
+# RootLogger - DailyRollingFileAppender
+# log4j.rootLogger=DEBUG,DRFA
+log4j.rootLogger=DEBUG,stdout
+
+# Logging Threshold
+log4j.threshhold=ALL
+
+#
+# stdout
+# Add *stdout* to rootlogger above if you want to use this 
+#
+
+log4j.appender.stdout=org.apache.log4j.ConsoleAppender
+log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
+log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
+
+#
+# plain layout used for commandline tools to output to console
+#
+log4j.appender.cmdstdout=org.apache.log4j.ConsoleAppender
+log4j.appender.cmdstdout.layout=org.apache.log4j.PatternLayout
+log4j.appender.cmdstdout.layout.ConversionPattern=%m%n
+
+
+# Rolling file
+log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
+log4j.appender.DRFA.File=log.txt
+log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
+log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
+log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
+
+
+
diff -urN ../protocol-smb/src/plugin/protocol-smb/src/test.sh ./src/test.sh
--- ../protocol-smb/src/plugin/protocol-smb/src/test.sh	1969-12-31 19:00:00.000000000 -0500
+++ ./src/test.sh	2008-11-07 20:32:08.480629000 -0500
@@ -0,0 +1,10 @@
+#! /bin/sh -ex
+export PATH=$PATH:/c/Program\ Files/Java/jdk1.6.0_07/bin
+libdir="../../../../lib/"
+CP="../lib/jcifs-1.3.0.jar;${libdir}/log4j-1.2.15.jar;${libdir}/commons-logging-1.0.4.jar;${libdir}/commons-logging-api-1.0.4.jar;java;."
+DEFS=("-Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.Log4JCategoryLog" "-Dlog4j.configuration=log4j.properties")
+# DEFS=()
+MAIN="org/apache/nutch/protocol/smb/test/SMBTest"
+javac -cp "$CP" "java/${MAIN}.java"
+java -cp "$CP" "${DEFS[@]}" "$MAIN" "${libdir}../conf/smb.properties" "smb://SERVER/SHARE/DIR/FILE" OUTPUT_FILE
+
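The list2html() change escapes each name before writing it into the generated directory page, so an entry such as R&D or a name containing quotes no longer breaks the markup, and plain files are now listed alongside directories. A dependency-free sketch of one listing line in the patch's layout follows. Its escapeHtml is a hand-written stand-in for commons-lang StringEscapeUtils.escapeHtml (it also escapes the apostrophe, so its output may differ slightly), and the class and method names are hypothetical.

```java
/** Hypothetical sketch of one line of the patched list2html() output. */
public class DirListingSketch {

    /** Escapes the characters that are significant in HTML text and attribute values. */
    public static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds: anchor with escaped name (plus "/" for directories), TAB, timestamp, TAB, "-", newline. */
    public static String entryLine(String name, long lastModified, boolean isDirectory) {
        String htmlName = escapeHtml(name);
        return "<a href=\"" + htmlName + "\">" + htmlName + (isDirectory ? "/" : "")
                + "</a>\t" + lastModified + "\t-\n";
    }

    public static void main(String[] args) {
        System.out.print(entryLine("R&D", 0L, true));
        System.out.print(entryLine("report <final>.doc", 1226086118000L, false));
    }
}
```

HTML-escaping keeps the page well-formed, but it does not percent-encode the href, so names containing spaces or '#' may still need URL encoding to resolve as links.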


> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implementation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>            Priority: Minor
>         Attachments: protocol-smb.zip, protocol-smb.zip, protocol-smb.zip
