Posted to dev@nutch.apache.org by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2011/04/26 17:51:03 UTC

[jira] [Created] (NUTCH-987) Support HTTP auth for Solr communication

Support HTTP auth for Solr communication
----------------------------------------

                 Key: NUTCH-987
                 URL: https://issues.apache.org/jira/browse/NUTCH-987
             Project: Nutch
          Issue Type: Improvement
          Components: indexer
            Reporter: Markus Jelsma
            Assignee: Markus Jelsma


At the moment we cannot send data directly to a public Solr instance protected by HTTP auth. I have a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, and it works. This issue should add this ability to indexing, dedup and clean, and make it configurable from a configuration file.

The question is whether the current httpclient-auth.xml is the correct place. It does provide a nice means to configure the AuthScope objects, but it is used for fetching. Still, since AuthScope is used, we could easily add the credentials for Solr there as well and add a new nutch-default option for toggling HTTP auth.

Thoughts?
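
For readers unfamiliar with what "HTTP auth" means on the wire, here is a minimal sketch, assuming the Solr instance uses the HTTP Basic scheme (the issue does not say which scheme is in use). It is not the WIP described above, which configures an HTTPClient with AuthScope credentials rather than building headers by hand. The class and method names are hypothetical and use only the Java standard library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration only; not part of Nutch or SolrJ.
public class BasicAuthHeaderSketch {

    /**
     * Builds the value of an HTTP Basic "Authorization" header:
     * the literal "Basic " followed by base64("username:password").
     */
    public static String basicAuthHeader(String username, String password) {
        String pair = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder credentials; a real client would attach this header
        // to every request sent to the protected Solr URL.
        System.out.println("Authorization: " + basicAuthHeader("USERNAME", "PASSWORD"));
    }
}
```

An HTTP client library normally does this encoding itself once credentials are registered for a host and port, which is why handing a pre-configured client to the Solr server object is enough.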

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-987:
--------------------------------

    Attachment:     (was: NUTCH-987-1.4.1-1.patch)

> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>         Attachments: NUTCH-987-1.3-hack.patch, NUTCH-987-1.4.1-2.patch, SolrUtils.java
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> The question is, is the current httpclient-auth.xml the correct place? It does provide a nice means to configure the AuthScope objects but it is used for fetching. But, since AuthScope is used we could easily add the credentials for Solr there as well and add a new nutch-default option for toggling HTTP auth.
> Thoughts?

[jira] [Closed] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma closed NUTCH-987.
-------------------------------


> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4
>
>         Attachments: NUTCH-987-1.3-hack.patch, NUTCH-987-1.4-3.patch, NUTCH-987-1.4.1-2.patch, NUTCH-987-2.0-1.patch, NUTCH-987-2.0-2.patch, NUTCH-987-2.0-2.patch, SolrUtils.java
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> Enable Solr HTTP auth communication by setting the following parameters in your nutch-site config:
> * solr.auth=true
> * solr.auth.username=USERNAME
> * solr.auth.password=PASSWORD
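
The three parameters above could be read along these lines. This is a sketch under stated assumptions, not the committed patch: java.util.Properties stands in for the Nutch/Hadoop configuration object, and the class and method names are hypothetical. Only the three key names come from the issue text.

```java
import java.util.Optional;
import java.util.Properties;

// Hypothetical illustration only; the actual patch reads these keys
// from the Nutch configuration, not from java.util.Properties.
public class SolrAuthSettingsSketch {

    static final String AUTH = "solr.auth";
    static final String USERNAME = "solr.auth.username";
    static final String PASSWORD = "solr.auth.password";

    /**
     * Returns {username, password} when solr.auth is true,
     * or an empty Optional when HTTP auth is switched off.
     */
    public static Optional<String[]> credentials(Properties conf) {
        boolean enabled = Boolean.parseBoolean(conf.getProperty(AUTH, "false"));
        if (!enabled) {
            return Optional.empty();
        }
        String user = conf.getProperty(USERNAME);
        String pass = conf.getProperty(PASSWORD);
        if (user == null || pass == null) {
            // Fail early instead of sending unauthenticated requests.
            throw new IllegalStateException(
                    AUTH + " is true but " + USERNAME + " or " + PASSWORD + " is missing");
        }
        return Optional.of(new String[] {user, pass});
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(AUTH, "true");
        conf.setProperty(USERNAME, "USERNAME");
        conf.setProperty(PASSWORD, "PASSWORD");
        System.out.println("auth enabled: " + credentials(conf).isPresent());
    }
}
```

Keeping the toggle separate from the credentials means the default configuration can leave auth off without carrying empty username and password values.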

[jira] [Commented] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064553#comment-13064553 ] 

Markus Jelsma commented on NUTCH-987:
-------------------------------------

The previous patch has the config change for nutch-default; I missed it in the last patch. Thanks Lewis and Julien!

> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>         Attachments: NUTCH-987-1.3-hack.patch, NUTCH-987-1.4-3.patch, NUTCH-987-1.4.1-2.patch, SolrUtils.java
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> The question is, is the current httpclient-auth.xml the correct place? It does provide a nice means to configure the AuthScope objects but it is used for fetching. But, since AuthScope is used we could easily add the credentials for Solr there as well and add a new nutch-default option for toggling HTTP auth.
> Thoughts?

[jira] [Issue Comment Edited] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097931#comment-13097931 ] 

Markus Jelsma edited comment on NUTCH-987 at 9/6/11 11:59 AM:
--------------------------------------------------------------

Resolved for 1.4, see NUTCH-1104 for 2.0

      was (Author: markus17):
    Resolved for 1.4
  
> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4
>
>         Attachments: NUTCH-987-1.3-hack.patch, NUTCH-987-1.4-3.patch, NUTCH-987-1.4.1-2.patch, NUTCH-987-2.0-1.patch, NUTCH-987-2.0-2.patch, NUTCH-987-2.0-2.patch, SolrUtils.java
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> Enable Solr HTTP auth communication by setting the following parameters in your nutch-site config:
> * solr.auth=true
> * solr.auth.username=USERNAME
> * solr.auth.password=PASSWORD

[jira] [Updated] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-987:
--------------------------------

    Patch Info: [Patch Available]

> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>         Attachments: NUTCH-987-1.3-hack.patch
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> The question is, is the current httpclient-auth.xml the correct place? It does provide a nice means to configure the AuthScope objects but it is used for fetching. But, since AuthScope is used we could easily add the credentials for Solr there as well and add a new nutch-default option for toggling HTTP auth.
> Thoughts?

[jira] [Commented] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065792#comment-13065792 ] 

Julien Nioche commented on NUTCH-987:
-------------------------------------

Hi Markus, will this be committed to trunk as well?

> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>         Attachments: NUTCH-987-1.3-hack.patch, NUTCH-987-1.4-3.patch, NUTCH-987-1.4.1-2.patch, SolrUtils.java
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> Enable Solr HTTP auth communication by setting the following parameters in your nutch-site config:
> * solr.auth=true
> * solr.auth.username=USERNAME
> * solr.auth.password=PASSWORD

[jira] [Updated] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-987:
--------------------------------

    Attachment: SolrUtils.java
                NUTCH-987-1.4.1-1.patch

Patch for 1.4. Also moved UTF-8 strip method to Solr utils. Please comment.

> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>         Attachments: NUTCH-987-1.3-hack.patch, NUTCH-987-1.4.1-1.patch, SolrUtils.java
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> The question is, is the current httpclient-auth.xml the correct place? It does provide a nice means to configure the AuthScope objects but it is used for fetching. But, since AuthScope is used we could easily add the credentials for Solr there as well and add a new nutch-default option for toggling HTTP auth.
> Thoughts?

[jira] [Commented] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065875#comment-13065875 ] 

Markus Jelsma commented on NUTCH-987:
-------------------------------------

Yes, at least partially. Solrclean isn't finished yet and dedup is broken. 

> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>         Attachments: NUTCH-987-1.3-hack.patch, NUTCH-987-1.4-3.patch, NUTCH-987-1.4.1-2.patch, SolrUtils.java
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> Enable Solr HTTP auth communication by setting the following parameters in your nutch-site config:
> * solr.auth=true
> * solr.auth.username=USERNAME
> * solr.auth.password=PASSWORD

[jira] [Updated] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-987:
---------------------------------------

    Attachment: NUTCH-987-2.0-2.patch

Hi Markus, the patch for 2.0 does not apply cleanly for me. I get the following output:
{code}
lewis@lewis-01:~/ASF/trunk$ patch -p0 -i NUTCH-987-2.0-1.patch
patching file src/java/org/apache/nutch/indexer/solr/SolrUtils.java
patching file src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java
patching file src/java/org/apache/nutch/indexer/solr/SolrIndexerJob.java
patching file src/java/org/apache/nutch/indexer/solr/SolrConstants.java
patching file src/java/org/apache/nutch/indexer/solr/SolrWriter.java
patching file src/java/org/apache/nutch/indexer/IndexerReducer.java
patching file conf/nutch-default.xml
Hunk #1 FAILED at 728.
Hunk #2 succeeded at 1060 (offset 13 lines).
1 out of 2 hunks FAILED -- saving rejects to file conf/nutch-default.xml.rej
{code}

I therefore attach an updated patch which applies cleanly; however, it breaks runtime builds and (already broken) tests with the following output:
{code}
[javac] Compiling 5 source files to /home/lewis/ASF/trunk/build/classes
    [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java:231: cannot find symbol
    [javac] symbol  : variable SolrUtils
    [javac] location: class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputFormat
    [javac]       SolrServer solr = SolrUtils.getCommonsHttpSolrServer(conf);
    [javac]                         ^
    [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java:261: cannot find symbol
    [javac] symbol  : variable SolrUtils
    [javac] location: class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputFormat
    [javac]       SolrServer solr = SolrUtils.getCommonsHttpSolrServer(conf);
    [javac]                         ^
    [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java:306: cannot find symbol
    [javac] symbol  : variable SolrUtils
    [javac] location: class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
    [javac]        solr = SolrUtils.getCommonsHttpSolrServer(conf);
    [javac]               ^
    [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrIndexerJob.java:70: cannot find symbol
    [javac] symbol  : variable SolrUtils
    [javac] location: class org.apache.nutch.indexer.solr.SolrIndexerJob
    [javac]       SolrServer solr = SolrUtils.getCommonsHttpSolrServer(getConf());
    [javac]                         ^
    [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrWriter.java:51: cannot find symbol
    [javac] symbol  : variable SolrUtils
    [javac] location: class org.apache.nutch.indexer.solr.SolrWriter
    [javac]     solr = SolrUtils.getCommonsHttpSolrServer(conf);
    [javac]            ^
    [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrWriter.java:64: cannot find symbol
    [javac] symbol  : variable SolrUtils
    [javac] location: class org.apache.nutch.indexer.solr.SolrWriter
    [javac]           val2 = SolrUtils.stripNonCharCodepoints((String)val);
    [javac]                  ^
    [javac] 6 errors

BUILD FAILED
/home/lewis/ASF/trunk/build.xml:96: Compile failed; see the compiler error output for details.
{code}


> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>         Attachments: NUTCH-987-1.3-hack.patch, NUTCH-987-1.4-3.patch, NUTCH-987-1.4.1-2.patch, NUTCH-987-2.0-1.patch, NUTCH-987-2.0-2.patch, SolrUtils.java
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> Enable Solr HTTP auth communication by setting the following parameters in your nutch-site config:
> * solr.auth=true
> * solr.auth.username=USERNAME
> * solr.auth.password=PASSWORD

[jira] [Commented] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063908#comment-13063908 ] 

Markus Jelsma commented on NUTCH-987:
-------------------------------------

Are there objections? Pointers? Comments?

> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>         Attachments: NUTCH-987-1.3-hack.patch, NUTCH-987-1.4.1-2.patch, SolrUtils.java
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> The question is, is the current httpclient-auth.xml the correct place? It does provide a nice means to configure the AuthScope objects but it is used for fetching. But, since AuthScope is used we could easily add the credentials for Solr there as well and add a new nutch-default option for toggling HTTP auth.
> Thoughts?

[jira] [Resolved] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma resolved NUTCH-987.
---------------------------------

    Resolution: Fixed

Resolved for 1.4

> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4
>
>         Attachments: NUTCH-987-1.3-hack.patch, NUTCH-987-1.4-3.patch, NUTCH-987-1.4.1-2.patch, NUTCH-987-2.0-1.patch, NUTCH-987-2.0-2.patch, NUTCH-987-2.0-2.patch, SolrUtils.java
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> Enable Solr HTTP auth communication by setting the following parameters in your nutch-site config:
> * solr.auth=true
> * solr.auth.username=USERNAME
> * solr.auth.password=PASSWORD

[jira] [Updated] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-987:
---------------------------------------

    Comment: was deleted

(was: Hi Markus, the patch for 2.0 does not apply cleanly for me. I get the following output:
{code}
lewis@lewis-01:~/ASF/trunk$ patch -p0 -i NUTCH-987-2.0-1.patch
patching file src/java/org/apache/nutch/indexer/solr/SolrUtils.java
patching file src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java
patching file src/java/org/apache/nutch/indexer/solr/SolrIndexerJob.java
patching file src/java/org/apache/nutch/indexer/solr/SolrConstants.java
patching file src/java/org/apache/nutch/indexer/solr/SolrWriter.java
patching file src/java/org/apache/nutch/indexer/IndexerReducer.java
patching file conf/nutch-default.xml
Hunk #1 FAILED at 728.
Hunk #2 succeeded at 1060 (offset 13 lines).
1 out of 2 hunks FAILED -- saving rejects to file conf/nutch-default.xml.rej
{code}

I therefore attach an updated patch which applies cleanly; however, it breaks runtime builds and (already broken) tests with the following output:
{code}
[javac] Compiling 5 source files to /home/lewis/ASF/trunk/build/classes
    [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java:231: cannot find symbol
    [javac] symbol  : variable SolrUtils
    [javac] location: class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputFormat
    [javac]       SolrServer solr = SolrUtils.getCommonsHttpSolrServer(conf);
    [javac]                         ^
    [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java:261: cannot find symbol
    [javac] symbol  : variable SolrUtils
    [javac] location: class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputFormat
    [javac]       SolrServer solr = SolrUtils.getCommonsHttpSolrServer(conf);
    [javac]                         ^
    [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java:306: cannot find symbol
    [javac] symbol  : variable SolrUtils
    [javac] location: class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
    [javac]        solr = SolrUtils.getCommonsHttpSolrServer(conf);
    [javac]               ^
    [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrIndexerJob.java:70: cannot find symbol
    [javac] symbol  : variable SolrUtils
    [javac] location: class org.apache.nutch.indexer.solr.SolrIndexerJob
    [javac]       SolrServer solr = SolrUtils.getCommonsHttpSolrServer(getConf());
    [javac]                         ^
    [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrWriter.java:51: cannot find symbol
    [javac] symbol  : variable SolrUtils
    [javac] location: class org.apache.nutch.indexer.solr.SolrWriter
    [javac]     solr = SolrUtils.getCommonsHttpSolrServer(conf);
    [javac]            ^
    [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrWriter.java:64: cannot find symbol
    [javac] symbol  : variable SolrUtils
    [javac] location: class org.apache.nutch.indexer.solr.SolrWriter
    [javac]           val2 = SolrUtils.stripNonCharCodepoints((String)val);
    [javac]                  ^
    [javac] 6 errors

BUILD FAILED
/home/lewis/ASF/trunk/build.xml:96: Compile failed; see the compiler error output for details.
{code}
)

> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>         Attachments: NUTCH-987-1.3-hack.patch, NUTCH-987-1.4-3.patch, NUTCH-987-1.4.1-2.patch, NUTCH-987-2.0-1.patch, NUTCH-987-2.0-2.patch, NUTCH-987-2.0-2.patch, SolrUtils.java
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> Enable Solr HTTP auth communication by setting the following parameters in your nutch-site config:
> * solr.auth=true
> * solr.auth.username=USERNAME
> * solr.auth.password=PASSWORD

[jira] [Updated] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-987:
--------------------------------

    Attachment: NUTCH-987-2.0-1.patch

Partial patch for 2.0. It includes support for HTTP auth for solrindex and solrdedup, and also includes NUTCH-1026 and NUTCH-1036.

Anyone who can test this would make me happy.

> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>         Attachments: NUTCH-987-1.3-hack.patch, NUTCH-987-1.4-3.patch, NUTCH-987-1.4.1-2.patch, NUTCH-987-2.0-1.patch, SolrUtils.java
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> Enable Solr HTTP auth communication by setting the following parameters in your nutch-site config:
> * solr.auth=true
> * solr.auth.username=USERNAME
> * solr.auth.password=PASSWORD

[jira] [Updated] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-987:
---------------------------------------

    Attachment: NUTCH-987-2.0-2.patch

Hi Markus, the patch for 2.0 does not apply cleanly for me. I get the following output:
{code}
lewis@lewis-01:~/ASF/trunk$ patch -p0 -i NUTCH-987-2.0-1.patch
patching file src/java/org/apache/nutch/indexer/solr/SolrUtils.java
patching file src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java
patching file src/java/org/apache/nutch/indexer/solr/SolrIndexerJob.java
patching file src/java/org/apache/nutch/indexer/solr/SolrConstants.java
patching file src/java/org/apache/nutch/indexer/solr/SolrWriter.java
patching file src/java/org/apache/nutch/indexer/IndexerReducer.java
patching file conf/nutch-default.xml
Hunk #1 FAILED at 728.
Hunk #2 succeeded at 1060 (offset 13 lines).
1 out of 2 hunks FAILED -- saving rejects to file conf/nutch-default.xml.rej
{code}

I therefore attach an updated patch which applies cleanly; however, it breaks runtime builds and (already broken) tests with the following output:
{code}
[javac] Compiling 5 source files to /home/lewis/ASF/trunk/build/classes
    [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java:231: cannot find symbol
    [javac] symbol  : variable SolrUtils
    [javac] location: class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputFormat
    [javac]       SolrServer solr = SolrUtils.getCommonsHttpSolrServer(conf);
    [javac]                         ^
    [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java:261: cannot find symbol
    [javac] symbol  : variable SolrUtils
    [javac] location: class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputFormat
    [javac]       SolrServer solr = SolrUtils.getCommonsHttpSolrServer(conf);
    [javac]                         ^
    [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java:306: cannot find symbol
    [javac] symbol  : variable SolrUtils
    [javac] location: class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
    [javac]        solr = SolrUtils.getCommonsHttpSolrServer(conf);
    [javac]               ^
    [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrIndexerJob.java:70: cannot find symbol
    [javac] symbol  : variable SolrUtils
    [javac] location: class org.apache.nutch.indexer.solr.SolrIndexerJob
    [javac]       SolrServer solr = SolrUtils.getCommonsHttpSolrServer(getConf());
    [javac]                         ^
    [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrWriter.java:51: cannot find symbol
    [javac] symbol  : variable SolrUtils
    [javac] location: class org.apache.nutch.indexer.solr.SolrWriter
    [javac]     solr = SolrUtils.getCommonsHttpSolrServer(conf);
    [javac]            ^
    [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrWriter.java:64: cannot find symbol
    [javac] symbol  : variable SolrUtils
    [javac] location: class org.apache.nutch.indexer.solr.SolrWriter
    [javac]           val2 = SolrUtils.stripNonCharCodepoints((String)val);
    [javac]                  ^
    [javac] 6 errors

BUILD FAILED
/home/lewis/ASF/trunk/build.xml:96: Compile failed; see the compiler error output for details.
{code}

> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>         Attachments: NUTCH-987-1.3-hack.patch, NUTCH-987-1.4-3.patch, NUTCH-987-1.4.1-2.patch, NUTCH-987-2.0-1.patch, NUTCH-987-2.0-2.patch, NUTCH-987-2.0-2.patch, SolrUtils.java
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> Enable Solr HTTP auth communication by setting the following parameters in your nutch-site config:
> * solr.auth=true
> * solr.auth.username=USERNAME
> * solr.auth.password=PASSWORD


[jira] [Commented] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064564#comment-13064564 ] 

Markus Jelsma commented on NUTCH-987:
-------------------------------------

Committed for 1.4 in rev. 1146035.

> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>         Attachments: NUTCH-987-1.3-hack.patch, NUTCH-987-1.4-3.patch, NUTCH-987-1.4.1-2.patch, SolrUtils.java
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> The question is, is the current httpclient-auth.xml the correct place? It does provide a nice means to configure the AuthScope objects but it is used for fetching. But, since AuthScope is used we could easily add the credentials for Solr there as well and add a new nutch-default option for toggling HTTP auth.
> Thoughts?


[jira] [Updated] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-987:
--------------------------------

    Description: 
At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.

Enable Solr HTTP auth communication by setting the following parameters in your nutch-site config:
* solr.auth=true
* solr.auth.username=USERNAME
* solr.auth.password=PASSWORD

  was:
At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.

The question is, is the current httpclient-auth.xml the correct place? It does provide a nice means to configure the AuthScope objects but it is used for fetching. But, since AuthScope is used we could easily add the credentials for Solr there as well and add a new nutch-default option for toggling HTTP auth.

Thoughts?
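
As a rough illustration of the solr.auth, solr.auth.username and solr.auth.password parameters from the updated description, the sketch below shows how such settings could be turned into HTTP Basic credentials. It is not the code from the patch, which passes a configured HTTPClient object to CommonsHttpSolrServer. Here java.util.Properties stands in for the Nutch configuration, the class and method names are hypothetical, and the use of the Basic scheme and the off-by-default fallback are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;
import java.util.Properties;

// Illustrative only: java.util.Properties stands in for the Nutch job configuration.
final class SolrAuthSketch {

    /** Builds the value of an HTTP "Authorization" header, or empty if auth is off. */
    static Optional<String> basicAuthHeader(Properties conf) {
        // Assumed default: authentication stays off unless solr.auth is set to true.
        boolean enabled = Boolean.parseBoolean(conf.getProperty("solr.auth", "false"));
        if (!enabled) {
            return Optional.empty();
        }
        String user = conf.getProperty("solr.auth.username");
        String pass = conf.getProperty("solr.auth.password");
        if (user == null || pass == null) {
            throw new IllegalStateException(
                "solr.auth is true but username or password is missing");
        }
        // Basic scheme: base64 of "username:password".
        String token = Base64.getEncoder()
            .encodeToString((user + ":" + pass).getBytes(StandardCharsets.UTF_8));
        return Optional.of("Basic " + token);
    }
}
```

Returning an empty Optional when solr.auth is absent or false keeps unauthenticated setups working without any extra configuration.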


> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>         Attachments: NUTCH-987-1.3-hack.patch, NUTCH-987-1.4-3.patch, NUTCH-987-1.4.1-2.patch, SolrUtils.java
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> Enable Solr HTTP auth communication by setting the following parameters in your nutch-site config:
> * solr.auth=true
> * solr.auth.username=USERNAME
> * solr.auth.password=PASSWORD


[jira] [Updated] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-987:
--------------------------------

    Attachment: NUTCH-987-1.3-hack.patch

Attached nasty hack for the sake of not losing it.

> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>         Attachments: NUTCH-987-1.3-hack.patch
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> The question is, is the current httpclient-auth.xml the correct place? It does provide a nice means to configure the AuthScope objects but it is used for fetching. But, since AuthScope is used we could easily add the credentials for Solr there as well and add a new nutch-default option for toggling HTTP auth.
> Thoughts?


[jira] [Updated] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-987:
--------------------------------

    Attachment: NUTCH-987-1.4-3.patch

If there are no objections, I'll send this one in together with NUTCH-1036. This patch includes the changes made for NUTCH-1036, adding reporter increments here and there.

> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>         Attachments: NUTCH-987-1.3-hack.patch, NUTCH-987-1.4-3.patch, NUTCH-987-1.4.1-2.patch, SolrUtils.java
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> The question is, is the current httpclient-auth.xml the correct place? It does provide a nice means to configure the AuthScope objects but it is used for fetching. But, since AuthScope is used we could easily add the credentials for Solr there as well and add a new nutch-default option for toggling HTTP auth.
> Thoughts?


[jira] [Updated] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-987:
--------------------------------

    Fix Version/s: 1.4

> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>         Attachments: NUTCH-987-1.3-hack.patch
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> The question is, is the current httpclient-auth.xml the correct place? It does provide a nice means to configure the AuthScope objects but it is used for fetching. But, since AuthScope is used we could easily add the credentials for Solr there as well and add a new nutch-default option for toggling HTTP auth.
> Thoughts?


[jira] [Updated] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-987:
--------------------------------

    Fix Version/s:     (was: 2.0)

> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4
>
>         Attachments: NUTCH-987-1.3-hack.patch, NUTCH-987-1.4-3.patch, NUTCH-987-1.4.1-2.patch, NUTCH-987-2.0-1.patch, NUTCH-987-2.0-2.patch, NUTCH-987-2.0-2.patch, SolrUtils.java
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> Enable Solr HTTP auth communication by setting the following parameters in your nutch-site config:
> * solr.auth=true
> * solr.auth.username=USERNAME
> * solr.auth.password=PASSWORD


[jira] [Issue Comment Edited] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063334#comment-13063334 ] 

Markus Jelsma edited comment on NUTCH-987 at 7/11/11 1:35 PM:
--------------------------------------------------------------

Patch for 1.4. Also moved UTF-8 strip method to Solr utils. It's implemented using simple job properties, no fancy AuthScope stuff. Please comment.

      was (Author: markus17):
    Patch for 1.4. Also moved UTF-8 strip method to Solr utils. Please comment.
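
For readers unfamiliar with the strip method mentioned in this comment, here is a minimal sketch of what such a helper might look like. It is a stand-in written against the JDK only, not the method from the patch: the class name is hypothetical, and the exact set of characters removed (Unicode noncharacters plus control characters other than tab, newline and carriage return) is an assumption about what an XML-based Solr update would reject.

```java
// Hypothetical stand-in for the strip helper; not the code from the patch.
final class StripSketch {

    /** Returns input with Unicode noncharacters and most C0 control characters removed. */
    static String stripNonCharCodepoints(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            boolean nonCharacter =
                (cp & 0xFFFE) == 0xFFFE              // U+xFFFE and U+xFFFF in every plane
                || (cp >= 0xFDD0 && cp <= 0xFDEF);   // U+FDD0..U+FDEF
            // Keep tab, newline and carriage return; drop the other C0 controls.
            boolean control = cp < 0x20 && cp != '\t' && cp != '\n' && cp != '\r';
            if (!nonCharacter && !control) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

Iterating by code point rather than by char lets the same check cover noncharacters outside the Basic Multilingual Plane, such as U+1FFFE.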
  
> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>         Attachments: NUTCH-987-1.3-hack.patch, NUTCH-987-1.4.1-1.patch, SolrUtils.java
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> The question is, is the current httpclient-auth.xml the correct place? It does provide a nice means to configure the AuthScope objects but it is used for fetching. But, since AuthScope is used we could easily add the credentials for Solr there as well and add a new nutch-default option for toggling HTTP auth.
> Thoughts?


[jira] [Updated] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-987:
--------------------------------

    Attachment: NUTCH-987-1.4.1-2.patch

Some instances in SolrDedup were missing. Also cleaned up some mess.

> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>         Attachments: NUTCH-987-1.3-hack.patch, NUTCH-987-1.4.1-2.patch, SolrUtils.java
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> The question is, is the current httpclient-auth.xml the correct place? It does provide a nice means to configure the AuthScope objects but it is used for fetching. But, since AuthScope is used we could easily add the credentials for Solr there as well and add a new nutch-default option for toggling HTTP auth.
> Thoughts?


[jira] [Commented] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084033#comment-13084033 ] 

Markus Jelsma commented on NUTCH-987:
-------------------------------------

Ah, the config. You can easily add the config params yourself, but they're not strictly required as the code already uses the same defaults.

I'm off!! Cheers!

> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>         Attachments: NUTCH-987-1.3-hack.patch, NUTCH-987-1.4-3.patch, NUTCH-987-1.4.1-2.patch, NUTCH-987-2.0-1.patch, NUTCH-987-2.0-2.patch, NUTCH-987-2.0-2.patch, SolrUtils.java
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> Enable Solr HTTP auth communication by setting the following parameters in your nutch-site config:
> * solr.auth=true
> * solr.auth.username=USERNAME
> * solr.auth.password=PASSWORD


[jira] [Commented] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064024#comment-13064024 ] 

Lewis John McGibbney commented on NUTCH-987:
--------------------------------------------

Based upon the current patch you provided, I think this is a good suggestion for inclusion. I am not currently using an auth-protected Solr core in production, but will get authentication set up in development and get this tested, Markus. It would make sense to include it now, as it will inevitably become a requested feature in the future.

Further to this, to address your initial question, I agree with the comments regarding where to configure the auth credentials for Solr communication; quite simply, I cannot think of any other solution that would do anything other than add clutter.

> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>         Attachments: NUTCH-987-1.3-hack.patch, NUTCH-987-1.4.1-2.patch, SolrUtils.java
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> The question is, is the current httpclient-auth.xml the correct place? It does provide a nice means to configure the AuthScope objects but it is used for fetching. But, since AuthScope is used we could easily add the credentials for Solr there as well and add a new nutch-default option for toggling HTTP auth.
> Thoughts?


[jira] [Commented] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064541#comment-13064541 ] 

Julien Nioche commented on NUTCH-987:
-------------------------------------

Don't forget to add the parameters you introduced to nutch-default.xml (with authentication off by default).
+1 otherwise.

Thanks!

> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>         Attachments: NUTCH-987-1.3-hack.patch, NUTCH-987-1.4-3.patch, NUTCH-987-1.4.1-2.patch, SolrUtils.java
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> The question is, is the current httpclient-auth.xml the correct place? It does provide a nice means to configure the AuthScope objects but it is used for fetching. But, since AuthScope is used we could easily add the credentials for Solr there as well and add a new nutch-default option for toggling HTTP auth.
> Thoughts?


[jira] [Updated] (NUTCH-987) Support HTTP auth for Solr communication

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-987:
--------------------------------

    Fix Version/s: 2.0

> Support HTTP auth for Solr communication
> ----------------------------------------
>
>                 Key: NUTCH-987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-987
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 2.0
>
>         Attachments: NUTCH-987-1.3-hack.patch
>
>
> At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.
> The question is, is the current httpclient-auth.xml the correct place? It does provide a nice means to configure the AuthScope objects but it is used for fetching. But, since AuthScope is used we could easily add the credentials for Solr there as well and add a new nutch-default option for toggling HTTP auth.
> Thoughts?
