Posted to dev@nutch.apache.org by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2019/03/13 15:42:00 UTC

[jira] [Created] (NUTCH-2699) Protocol-okhttp: needless loops to increment requested bytes counter when more content is already buffered

Sebastian Nagel created NUTCH-2699:
--------------------------------------

             Summary: Protocol-okhttp: needless loops to increment requested bytes counter when more content is already buffered
                 Key: NUTCH-2699
                 URL: https://issues.apache.org/jira/browse/NUTCH-2699
             Project: Nutch
          Issue Type: Bug
          Components: protocol
    Affects Versions: 1.15
            Reporter: Sebastian Nagel
             Fix For: 1.16


The okhttp library used by the plugin protocol-okhttp buffers content internally and has often already buffered more content than was requested. The plugin should immediately set the request counter to the size of the buffered content. This avoids needless loops when the buffered size comes close to the content limit, where the increment steps become too small:
{noformat}
2019-03-11 14:56:36,642 DEBUG okhttp.OkHttpResponse - http://localhost/large.pdf - http/1.1 200 OK
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 8192, buffered = 16088
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 16384, buffered = 24280
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 24576, buffered = 32472
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 32768, buffered = 40664
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 40960, buffered = 48856
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 49152, buffered = 57048
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 57344, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 57638, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 57932, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 58226, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 58520, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 58814, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 59108, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 59402, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 59696, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 59990, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 60284, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 60578, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 60872, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 61166, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 61460, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 61754, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 62048, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 62342, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 62636, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 62930, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 63224, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 63518, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 63812, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 64106, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 64400, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 64694, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 64988, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 65282, buffered = 73432
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - content limit reached
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - copied 65534 bytes out of 73432 buffered, remaining buffer contains 7898 bytes
2019-03-11 14:56:36,645 DEBUG okhttp.OkHttpResponse - HTTP content truncated to 65534 bytes (reason: LENGTH)
2019-03-11 14:56:36,661 INFO parse.ParseSegment - http://localhost/large.pdf skipped. Content of size 366578 was truncated to 65534
2019-03-11 14:56:36,661 WARN parse.ParserChecker - Content is truncated, parse may fail!
{noformat}
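
The effect can be reproduced with a small stand-alone model. This is only a sketch under stated assumptions, not the plugin's code: FakeSource is a hypothetical stand-in for the buffering source (its request() is modelled on okio's BufferedSource.request(long), which reports whether the requested number of bytes could be buffered), and countLoops, the 8192-byte read-ahead chunk and the 294-byte step are illustrative values chosen to resemble the log above. The flag skipToBuffered shows one possible way to apply the proposed behaviour: raise the request counter to the buffered size before incrementing it.

```java
final class RequestCounterSketch {

    /** Hypothetical buffering source: every read-ahead adds a fixed chunk to the buffer. */
    static final class FakeSource {
        private final long total;   // bytes the "server" can deliver
        private final long chunk;   // bytes added to the buffer per read-ahead
        private long buffered = 0;

        FakeSource(long total, long chunk) {
            this.total = total;
            this.chunk = chunk;
        }

        /** Buffers until at least byteCount bytes are held; false if the source ends first. */
        boolean request(long byteCount) {
            while (buffered < byteCount && buffered < total) {
                buffered = Math.min(total, buffered + chunk);
            }
            return buffered >= byteCount;
        }

        long buffered() {
            return buffered;
        }
    }

    /** Counts loop iterations until more than limit bytes are buffered or the source ends. */
    static int countLoops(long total, long chunk, long limit, long step,
            boolean skipToBuffered) {
        FakeSource source = new FakeSource(total, chunk);
        long requested = 0;
        int loops = 0;
        while (source.buffered() <= limit) {
            if (skipToBuffered && source.buffered() > requested) {
                // assumption: one way to realize the proposal, jump to what is already buffered
                requested = source.buffered();
            }
            // request one byte more than the limit at most, to detect truncation
            requested += Math.min(step, limit + 1 - requested);
            loops++;
            if (!source.request(requested)) {
                break; // source exhausted before the limit was reached
            }
        }
        return loops;
    }

    public static void main(String[] args) {
        long total = 366578;  // content size from the log above
        long limit = 65534;   // truncation size from the log above
        System.out.println("loops without skip: "
            + countLoops(total, 8192, limit, 294, false));
        System.out.println("loops with skip:    "
            + countLoops(total, 8192, limit, 294, true));
    }
}
```

In this model the loop without the skip runs once per small step even though the buffer already holds the requested bytes, while with the skip it runs roughly once per read-ahead chunk.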



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)