Posted to dev@nutch.apache.org by "Pascal Dimassimo (JIRA)" <ji...@apache.org> on 2010/04/27 18:03:32 UTC

[jira] Created: (NUTCH-815) Invalid blank line before If-Modified-Since HTTP header

Invalid blank line before If-Modified-Since HTTP header
-------------------------------------------------------

                 Key: NUTCH-815
                 URL: https://issues.apache.org/jira/browse/NUTCH-815
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 1.0.0
         Environment: Nutch 1.0.0, Windows XP, Java 1.6.0_17
            Reporter: Pascal Dimassimo


If there is a Modified time stored in the crawldb for a link, the class org.apache.nutch.protocol.http.HttpResponse will use it as the value for the If-Modified-Since header. 

Line 131:
reqStr.append("\r\n");
if (datum.getModifiedTime() > 0) {
        reqStr.append("If-Modified-Since: " + HttpDateFormat.toString(datum.getModifiedTime()));
        reqStr.append("\r\n");
}

The problem is that an extra blank line is inserted before this header. This makes the header invalid:
----------------------------------------------------------------------------------
GET /tinysite/second.html HTTP/1.0
Host: localhost:8080
Accept-Encoding: x-gzip, gzip, deflate
User-Agent: nutch/Nutch-1.0
Accept-Language: en-us,en-gb,en;q=0.7,*;q=0.3

If-Modified-Since: Tue, 27 Apr 2010 13:51:50 GMT
----------------------------------------------------------------------------------
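
To illustrate why the blank line matters: in HTTP, an empty line ends the header section, so a reader that stops at the first empty line never sees anything after it. The following is a minimal sketch only, not Nutch or server code; the class HeaderSplitSketch and the helper headerLines are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;

class HeaderSplitSketch {

    // Hypothetical helper: collects header lines up to the first empty line,
    // which marks the end of the header section in an HTTP message.
    static List<String> headerLines(String request) {
        List<String> headers = new ArrayList<>();
        String[] lines = request.split("\r\n", -1);
        // lines[0] is the request line; header fields start at index 1.
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                break; // end of headers; anything after this is not a header
            }
            headers.add(lines[i]);
        }
        return headers;
    }

    public static void main(String[] args) {
        // Same shape as the request above: a blank line precedes If-Modified-Since.
        String request = "GET /tinysite/second.html HTTP/1.0\r\n"
            + "Host: localhost:8080\r\n"
            + "\r\n"
            + "If-Modified-Since: Tue, 27 Apr 2010 13:51:50 GMT\r\n";
        System.out.println(headerLines(request)); // prints [Host: localhost:8080]
    }
}
```

With the request shaped like the one above, If-Modified-Since falls after the end of the headers and is not returned as a header.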

I'm using the AdaptiveFetchSchedule to set the Modified time in the crawldb. 

I tested moving line 131 to after the if block, and it works. I think that is where the line belongs.
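
The reordering described above can be sketched as follows. This is a standalone illustration under stated assumptions, not the actual HttpResponse code or the committed fix: the class IfModifiedSinceSketch and the method buildRequest are hypothetical, and java.time's RFC_1123_DATE_TIME formatter stands in for Nutch's HttpDateFormat.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class IfModifiedSinceSketch {

    // Hypothetical request builder; only a subset of the real headers is shown.
    static String buildRequest(String path, String host, long modifiedTime) {
        StringBuilder reqStr = new StringBuilder();
        reqStr.append("GET ").append(path).append(" HTTP/1.0\r\n");
        reqStr.append("Host: ").append(host).append("\r\n");

        // The optional header goes in first, terminated by its own CRLF.
        if (modifiedTime > 0) {
            // Assumption: java.time formatting replaces HttpDateFormat.toString here.
            String date = DateTimeFormatter.RFC_1123_DATE_TIME
                .format(Instant.ofEpochMilli(modifiedTime).atOffset(ZoneOffset.UTC));
            reqStr.append("If-Modified-Since: ").append(date).append("\r\n");
        }

        // The blank line that ends the header section is appended last.
        reqStr.append("\r\n");
        return reqStr.toString();
    }

    public static void main(String[] args) {
        // Illustrative timestamp in milliseconds since the epoch.
        System.out.print(buildRequest("/tinysite/second.html", "localhost:8080", 1272376310000L));
    }
}
```

Appending the terminating blank line once, after every optional header, means the only empty line in the request is the one that ends the headers, whether or not a modified time is stored.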

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (NUTCH-815) Invalid blank line before If-Modified-Since HTTP header

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861470#action_12861470 ] 

Andrzej Bialecki  commented on NUTCH-815:
-----------------------------------------

Good catch. I'll fix it shortly.



[jira] Closed: (NUTCH-815) Invalid blank line before If-Modified-Since HTTP header

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  closed NUTCH-815.
-----------------------------------

         Assignee: Andrzej Bialecki 
    Fix Version/s: 1.1
       Resolution: Fixed

Fixed in rev. 938586. Thanks!
