Posted to dev@nutch.apache.org by GitBox <gi...@apache.org> on 2022/06/16 02:03:08 UTC

[GitHub] [nutch] lewismc commented on pull request #733: NUTCH-2936 / NUTCH-2949 URLStreamHandler may fail jobs in distributed mode

lewismc commented on PR #733:
URL: https://github.com/apache/nutch/pull/733#issuecomment-1157149313

   This is exciting!!! Excellent debugging 👍 ... you got further than me.
   I can't get around to testing it until next week at the earliest.
   Thinking back, I did observe revisits (recursive access) to URLStreamHandlerFactory but didn't pursue that line of inquiry at the time.
   To get a bit more context, I reviewed [HADOOP-14598-005.patch](https://issues.apache.org/jira/secure/attachment/12880380/HADOOP-14598-005.patch) and the current class it affects. Having read the code, it makes more sense, but admittedly I won't have the full context until I debug this.
   I also took a look at [hadoop-hdfs TestUrlStreamHandler.java](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestUrlStreamHandler.java), which I really like the look of. To build more confidence in this aspect of the codebase, we could create some tests for the [Nutch URLStreamHandlerFactory.java](https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/plugin/URLStreamHandlerFactory.java).
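
   As a rough illustration of the kind of check such a test could make, here is a minimal sketch using only the JDK. It does not touch the Nutch class: `DemoFactory`, the `demo` protocol, and the helper names are all made up for the example. The idea is simply to register a factory, confirm that a custom protocol is served by it, and confirm that a protocol the factory declines still resolves through the JVM's built-in handlers.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.nio.charset.StandardCharsets;

public class UrlStreamHandlerFactorySketch {

  private static boolean installed = false;

  /** Hypothetical stand-in factory: serves only the made-up "demo" protocol. */
  static class DemoFactory implements URLStreamHandlerFactory {
    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
      if (!"demo".equals(protocol)) {
        return null; // decline, so the JVM falls back to its built-in handlers
      }
      return new URLStreamHandler() {
        @Override
        protected URLConnection openConnection(URL url) {
          return new URLConnection(url) {
            @Override
            public void connect() {
              // nothing to connect to in this in-memory example
            }

            @Override
            public InputStream getInputStream() {
              return new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
            }
          };
        }
      };
    }
  }

  /** The JVM accepts a factory only once, so guard against a second call. */
  static synchronized void install() {
    if (!installed) {
      URL.setURLStreamHandlerFactory(new DemoFactory());
      installed = true;
    }
  }

  /** Reads the whole stream behind the URL as UTF-8 text (Java 9+). */
  static String readAll(URL url) throws IOException {
    try (InputStream in = url.openStream()) {
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) throws Exception {
    install();

    // The custom protocol should be served by DemoFactory's handler.
    String body = readAll(URI.create("demo://anything").toURL());
    if (!"hello".equals(body)) {
      throw new AssertionError("unexpected body: " + body);
    }

    // A protocol the factory declines should still produce a usable URL.
    URL fileUrl = URI.create("file:///tmp/x").toURL();
    if (!"file".equals(fileUrl.getProtocol())) {
      throw new AssertionError("file protocol did not resolve");
    }

    System.out.println("ok");
  }
}
```

   Because `URL.setURLStreamHandlerFactory` can be called at most once per JVM, a real test suite would probably need to register the factory in a one-time setup step or run in a forked JVM.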


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@nutch.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org