Posted to dev@nutch.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/11/01 16:37:00 UTC

[jira] [Commented] (NUTCH-3020) ParseSegment should check for protocol's flags for truncation

    [ https://issues.apache.org/jira/browse/NUTCH-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781820#comment-17781820 ] 

ASF GitHub Bot commented on NUTCH-3020:
---------------------------------------

tballison opened a new pull request, #794:
URL: https://github.com/apache/nutch/pull/794

   Thanks for your contribution to [Apache Nutch](https://nutch.apache.org/)! Your help is appreciated!
   
   Before opening the pull request, please verify that
   * there is an open issue on the [Nutch issue tracker](https://issues.apache.org/jira/projects/NUTCH) which describes the problem or the improvement. We cannot accept pull requests without an issue because the change wouldn't be listed in the release notes.
   * the issue ID (`NUTCH-XXXX`)
     - is referenced in the title of the pull request
     - and placed in front of your commit messages surrounded by square brackets (`[NUTCH-XXXX] Issue or pull request title`)
   * commits are squashed into a single one (or few commits for larger changes)
   * Java source code follows [Nutch Eclipse Code Formatting rules](https://github.com/apache/nutch/blob/master/eclipse-codeformat.xml)
   * Nutch is successfully built and unit tests pass by running `ant clean runtime test`
   * there should be no conflicts when merging the pull request branch into the *recent* master branch. If there are conflicts, please try to rebase the pull request branch on top of a freshly pulled master branch.
   * if new dependencies are added,
     - are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](https://www.apache.org/legal/resolved.html#category-a)?
     - are `LICENSE-binary` and `NOTICE-binary` updated accordingly?
   
   We will be able to integrate your pull request faster if these conditions are met. If you have any questions about how to fix your problem or about using Nutch in general, please sign up for the [Nutch mailing list](https://nutch.apache.org/mailing_lists.html). Thanks!
   




> ParseSegment should check for protocol's flags for truncation
> -------------------------------------------------------------
>
>                 Key: NUTCH-3020
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3020
>             Project: Nutch
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Minor
>
> As discussed on the user list, several protocols can identify when a fetch has been truncated. ParseSegment only compares the number of bytes fetched against the HTTP Content-Length header (if it exists). We should modify ParseSegment to also check for truncation flags set by the protocols.
> I noticed this specifically with okhttp, but other protocols may flag truncation as well.
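
For illustration only, the check described in the issue could be sketched roughly as below. This is not the code from the pull request and does not use Nutch classes: the metadata is modeled as a plain Map, and the flag key "example.content.truncated" is a hypothetical name, since the issue text does not say which key a protocol plugin sets. The idea is to trust an explicit truncation flag first and fall back to the length comparison only when no flag is present.

```java
import java.util.HashMap;
import java.util.Map;

class TruncationCheckSketch {

  // Hypothetical key: stands in for whatever flag a protocol plugin sets.
  static final String TRUNCATED_FLAG_KEY = "example.content.truncated";
  // Standard HTTP header name, stored here as a metadata key.
  static final String CONTENT_LENGTH_KEY = "Content-Length";

  /** Returns true if the fetched content appears to be truncated. */
  static boolean isTruncated(Map<String, String> metadata, byte[] content) {
    // 1. An explicit flag from the protocol takes precedence.
    //    Boolean.parseBoolean(null) is false, so a missing flag falls through.
    if (Boolean.parseBoolean(metadata.get(TRUNCATED_FLAG_KEY))) {
      return true;
    }
    // 2. Otherwise compare fetched bytes with the declared length, if any.
    String declared = metadata.get(CONTENT_LENGTH_KEY);
    if (declared == null) {
      return false; // no header, nothing to compare against
    }
    try {
      return content.length < Long.parseLong(declared.trim());
    } catch (NumberFormatException e) {
      return false; // unparseable header: treat as unknown, not truncated
    }
  }

  public static void main(String[] args) {
    Map<String, String> metadata = new HashMap<>();
    metadata.put(TRUNCATED_FLAG_KEY, "true");
    System.out.println(isTruncated(metadata, new byte[0]));
  }
}
```

Checking the flag first matters when the server sends no Content-Length header at all, because in that case the length comparison alone cannot detect truncation.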



--
This message was sent by Atlassian Jira
(v8.20.10#820010)