Posted to commits@nutch.apache.org by th...@apache.org on 2017/02/25 14:34:51 UTC
[nutch] branch NUTCH-2292 updated (2175c76 -> 62491d5)
This is an automated email from the ASF dual-hosted git repository.
thammegowda pushed a change to branch NUTCH-2292
in repository https://gitbox.apache.org/repos/asf/nutch.git.
from 2175c76 Merge branch 'master' into NUTCH-2293
adds 0fff24a NUTCH-2287 Indexer-elastic plugin should use Elasticsearch BulkProcessor and BackoffPolicy
adds 9ce097b Merge branch 'NUTCH-2287' of https://github.com/naegelejd/nutch this closes #131
adds fda3e14 Revert botched commit of NUTCH-2267
adds 993e997 fix the cookie policy issue when the form authentication receives session cookie in a non-standard format - NUTCH-2280
adds 753cad0 Format the HttpFormAuthentication.java with eclipse format and add javadoc. Add the httpclient-auth.xml.template for cookie policy config example.
adds 9f32fe8 Merge branch 'NUTCH-2280' of https://github.com/stevegy/nutch this closes #134
adds d27c351 Fix for Nutch-2246: Refactor /seed end point, this closes #137
adds 070a637 Remove obsolete properties protocol.plugin.check.blocking and protocol.plugin.check.robots
adds d37b7ce Merge branch 'NUTCH-2299' of https://github.com/sebastian-nagel/nutch this closes #140 - Remove obsolete properties protocol.plugin.check.*
adds 6c9cca5 Allow Fetcher to optionally store robots.txt content (if property fetcher.store.robotstxt == true). Improved RobotRulesParser command-line tool.
adds 264eea0 Ignore robots.txt when parsing segment, refactored storing of robots.txt in FetcherThread
adds 33cdca7 add hint and log warning that fetcher.store.robotstxt works only in combination with fetcher.store.content
adds f3af9a5 simplified code: use diamond operator
adds 3fca1a5 NUTCH-2300 Fetcher to optionally save robots.txt Merge branch 'SaveRobotsTxt' of https://github.com/sebastian-nagel/nutch, this closes #141
adds 78e9909 Remove NUTCH-2246 from the 1.12 section of CHANGES.txt (fixed in 1.13)
adds 70622c3 NUTCH-2164 NUTCH-2242 Inconsistent 'Modified Time' in crawl db / lastModified not always set - set modified time (time of last successful fetch) by DefaultFetchSchedule and AdaptiveFetchSchedule but only if the document is actually modified - update unit tests to check whether modification time is properly set - set modified time (sent by responding server in HTTP header) in ProtocolOutput: FetchSchedule implementations can access the HTTP modified time from [...]
adds e53b34b Fix for NUTCH-2132: Publisher/Subscriber model for Nutch to emit events, this closes #138
adds 836b2e0 NUTCH-2320 URLFilterChecker to run as TCP Telnet service
adds d4c924e revert 2320
adds 9092e23 NUTH-2329 Update Slf4j logging for Java 8 and upgrade miredot plugin version
adds 24cc2aa Fix for NUTCH-2327: Seeds injected in REST must be ingested into HDFS, this closes #155
adds 6e051f2 NUTCH-2336 SegmentReader to implement Tool (contributed by Vincent Slot), closes #159
adds f351790 NUTCH-2337 urlnormalizer-basic to strip empty port, closes #160 - make sure that URLs which contain anything else than the host in the authority (incl. empty port) are marked as changed - always use root locale for case conversion
adds 2b93a66 NUTCH-2352 Logging with generic class name, closes #172
adds 1a718e0 NUTCH-2349 urlnormalizer-basic: NPE for URLs without authority - check whether URL.getAuthority() returns null - recompose URLs without authority with empty authority/host
adds 76aedcb NUTCH-2349 urlnormalizer-basic: NPE for URLs without authority Merge branch 'NUTCH-2349-basic-url-normalizer-npe' of https://github.com/sebastian-nagel/nutch, this closes #169
adds 9a9c4b3 NUTCH-2359 Parsefilter-regex raises IndexOutOfBoundsException when rules are ill-formed
adds 217fad1 NUTCH-2355 Protocol plugins to set cookie if Cookie metadata field is present
adds c4b8955 NUTCH-2171 Nutch upgrade to Java 1.8
adds 3e2d3d4 Merge pull request #174 from kamaci/NUTCH-2171
new 98cd385 Merge branch 'master' into NUTCH-2292-1
new 9c25a8c Upstream changes, upgrade to JDK 8, add license header
new 62491d5 Merge with latest changes from master
The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "adds" were already present in the repository and have only
been added to this reference.
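Per the NUTCH-2300 commits listed above, storing robots.txt is opt-in and only takes effect when fetched content is also stored. A minimal `nutch-site.xml` override enabling both might look like the sketch below. The two property names come from the commit messages. The file layout assumes the standard Hadoop-style configuration format that Nutch's site files use, and the `true` values are illustrative choices.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Keep raw fetched content in the segment; robots.txt storage depends on it -->
  <property>
    <name>fetcher.store.content</name>
    <value>true</value>
  </property>
  <!-- Also store robots.txt responses (NUTCH-2300); per the commit hint,
       this only works in combination with fetcher.store.content -->
  <property>
    <name>fetcher.store.robotstxt</name>
    <value>true</value>
  </property>
</configuration>
```

If `fetcher.store.content` is left false, the commits above say Nutch logs a warning instead of storing robots.txt.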
Summary of changes:
.gitignore | 6 +-
build.xml | 3 +
conf/httpclient-auth.xml.template | 6 +
conf/nutch-default.xml | 75 +++++++++
default.properties | 4 +-
ivy/ivy.xml | 3 +
ivy/mvn.template | 10 +-
.../apache/nutch/crawl/AbstractFetchSchedule.java | 4 +-
.../apache/nutch/crawl/AdaptiveFetchSchedule.java | 7 +-
.../java/org/apache/nutch/crawl/CrawlDatum.java | 8 +-
.../main/java/org/apache/nutch/crawl/CrawlDb.java | 25 ++-
.../java/org/apache/nutch/crawl/CrawlDbFilter.java | 4 +-
.../java/org/apache/nutch/crawl/CrawlDbMerger.java | 5 +-
.../java/org/apache/nutch/crawl/CrawlDbReader.java | 18 +-
.../org/apache/nutch/crawl/CrawlDbReducer.java | 7 +-
.../org/apache/nutch/crawl/DeduplicationJob.java | 7 +-
.../apache/nutch/crawl/DefaultFetchSchedule.java | 4 +
.../apache/nutch/crawl/FetchScheduleFactory.java | 6 +-
.../java/org/apache/nutch/crawl/Generator.java | 10 +-
.../main/java/org/apache/nutch/crawl/Injector.java | 39 +++--
.../main/java/org/apache/nutch/crawl/Inlinks.java | 8 +-
.../main/java/org/apache/nutch/crawl/LinkDb.java | 25 ++-
.../java/org/apache/nutch/crawl/LinkDbFilter.java | 4 +-
.../java/org/apache/nutch/crawl/LinkDbMerger.java | 6 +-
.../java/org/apache/nutch/crawl/LinkDbReader.java | 6 +-
.../nutch/crawl/MimeAdaptiveFetchSchedule.java | 7 +-
.../org/apache/nutch/crawl/SignatureFactory.java | 4 +-
.../apache/nutch/crawl/TextProfileSignature.java | 6 +-
.../org/apache/nutch/crawl/URLPartitioner.java | 3 +-
.../java/org/apache/nutch/fetcher/FetchItem.java | 4 +-
.../org/apache/nutch/fetcher/FetchItemQueue.java | 6 +-
.../org/apache/nutch/fetcher/FetchItemQueues.java | 8 +-
.../java/org/apache/nutch/fetcher/FetchNodeDb.java | 2 +-
.../java/org/apache/nutch/fetcher/Fetcher.java | 29 ++--
.../org/apache/nutch/fetcher/FetcherThread.java | 76 ++++++++-
.../apache/nutch/fetcher/FetcherThreadEvent.java | 147 +++++++++++++++++
.../nutch/fetcher/FetcherThreadPublisher.java | 61 +++++++
.../java/org/apache/nutch/fetcher/QueueFeeder.java | 5 +-
.../java/org/apache/nutch/hostdb/ReadHostDb.java | 4 +-
.../org/apache/nutch/hostdb/ResolverThread.java | 4 +-
.../java/org/apache/nutch/hostdb/UpdateHostDb.java | 4 +-
.../apache/nutch/hostdb/UpdateHostDbMapper.java | 6 +-
.../apache/nutch/hostdb/UpdateHostDbReducer.java | 18 +-
.../java/org/apache/nutch/indexer/CleaningJob.java | 4 +-
.../org/apache/nutch/indexer/IndexWriters.java | 6 +-
.../org/apache/nutch/indexer/IndexerMapReduce.java | 5 +-
.../org/apache/nutch/indexer/IndexingFilters.java | 6 +-
.../nutch/indexer/IndexingFiltersChecker.java | 7 +-
.../java/org/apache/nutch/indexer/IndexingJob.java | 25 ++-
.../org/apache/nutch/indexer/NutchDocument.java | 2 +-
.../java/org/apache/nutch/indexer/NutchField.java | 4 +-
.../java/org/apache/nutch/metadata/Metadata.java | 2 +-
.../main/java/org/apache/nutch/metadata/Nutch.java | 13 ++
.../nutch/metadata/SpellCheckedMetadata.java | 2 +-
.../org/apache/nutch/net/URLExemptionFilters.java | 5 +-
.../java/org/apache/nutch/net/URLNormalizers.java | 13 +-
.../org/apache/nutch/parse/OutlinkExtractor.java | 5 +-
.../java/org/apache/nutch/parse/ParseData.java | 5 +-
.../org/apache/nutch/parse/ParseOutputFormat.java | 7 +-
.../org/apache/nutch/parse/ParsePluginList.java | 4 +-
.../org/apache/nutch/parse/ParsePluginsReader.java | 9 +-
.../java/org/apache/nutch/parse/ParseResult.java | 6 +-
.../java/org/apache/nutch/parse/ParseSegment.java | 30 ++--
.../java/org/apache/nutch/parse/ParseText.java | 5 +-
.../java/org/apache/nutch/parse/ParseUtil.java | 4 +-
.../java/org/apache/nutch/parse/ParserChecker.java | 6 +-
.../java/org/apache/nutch/parse/ParserFactory.java | 8 +-
.../java/org/apache/nutch/plugin/Extension.java | 2 +-
.../org/apache/nutch/plugin/ExtensionPoint.java | 2 +-
.../org/apache/nutch/plugin/PluginDescriptor.java | 21 +--
.../apache/nutch/plugin/PluginManifestParser.java | 2 +-
.../org/apache/nutch/plugin/PluginRepository.java | 35 ++--
.../java/org/apache/nutch/protocol/Content.java | 5 +-
.../java/org/apache/nutch/protocol/Protocol.java | 38 ++---
.../org/apache/nutch/protocol/ProtocolFactory.java | 5 +-
.../org/apache/nutch/protocol/ProtocolOutput.java | 14 ++
.../org/apache/nutch/protocol/ProtocolStatus.java | 2 +-
.../apache/nutch/protocol/RobotRulesParser.java | 183 ++++++++++++++++-----
.../NutchPublisher.java} | 41 ++---
.../apache/nutch/publisher/NutchPublishers.java | 83 ++++++++++
.../apache/nutch/scoring/webgraph/LinkDumper.java | 10 +-
.../apache/nutch/scoring/webgraph/LinkRank.java | 10 +-
.../apache/nutch/scoring/webgraph/NodeDumper.java | 4 +-
.../apache/nutch/scoring/webgraph/NodeReader.java | 2 +-
.../nutch/scoring/webgraph/ScoreUpdater.java | 4 +-
.../apache/nutch/scoring/webgraph/WebGraph.java | 12 +-
.../nutch/segment/ContentAsTextInputFormat.java | 2 +-
.../org/apache/nutch/segment/SegmentChecker.java | 5 +-
.../apache/nutch/segment/SegmentMergeFilters.java | 3 +-
.../org/apache/nutch/segment/SegmentMerger.java | 13 +-
.../org/apache/nutch/segment/SegmentReader.java | 66 +++++---
.../java/org/apache/nutch/service/NutchReader.java | 6 +-
.../java/org/apache/nutch/service/NutchServer.java | 15 +-
.../SeedManager.java} | 22 +--
.../org/apache/nutch/service/impl/JobWorker.java | 4 +-
.../org/apache/nutch/service/impl/LinkReader.java | 8 +-
.../org/apache/nutch/service/impl/NodeReader.java | 8 +-
.../DbQuery.java => impl/SeedManagerImpl.java} | 60 +++----
.../apache/nutch/service/impl/SequenceReader.java | 12 +-
.../nutch/service/model/request/DbQuery.java | 2 +-
.../nutch/service/model/request/SeedList.java | 10 ++
.../service/model/response/FetchNodeDbInfo.java | 2 +-
.../nutch/service/resources/AdminResource.java | 3 +-
.../apache/nutch/service/resources/DbResource.java | 2 +-
.../nutch/service/resources/SeedResource.java | 105 ++++++------
.../nutch/tools/AbstractCommonCrawlFormat.java | 4 +-
.../java/org/apache/nutch/tools/Benchmark.java | 8 +-
.../apache/nutch/tools/CommonCrawlDataDumper.java | 7 +-
.../nutch/tools/CommonCrawlFormatJettinson.java | 4 +-
.../java/org/apache/nutch/tools/DmozParser.java | 26 +--
.../java/org/apache/nutch/tools/FileDumper.java | 40 ++---
.../java/org/apache/nutch/tools/FreeGenerator.java | 5 +-
.../java/org/apache/nutch/tools/ResolveUrls.java | 4 +-
.../apache/nutch/tools/arc/ArcRecordReader.java | 5 +-
.../apache/nutch/tools/arc/ArcSegmentCreator.java | 5 +-
.../org/apache/nutch/tools/warc/WARCExporter.java | 6 +-
.../apache/nutch/util/CrawlCompletionStats.java | 3 +-
.../java/org/apache/nutch/util/DeflateUtils.java | 4 +-
.../main/java/org/apache/nutch/util/DomUtil.java | 4 +-
.../java/org/apache/nutch/util/DumpFileUtil.java | 5 +-
.../org/apache/nutch/util/EncodingDetector.java | 15 +-
.../main/java/org/apache/nutch/util/GZIPUtils.java | 4 +-
.../java/org/apache/nutch/util/HadoopFSUtil.java | 19 +--
.../main/java/org/apache/nutch/util/JexlUtil.java | 4 +-
.../main/java/org/apache/nutch/util/MimeUtil.java | 10 +-
.../java/org/apache/nutch/util/NodeWalker.java | 2 +-
.../main/java/org/apache/nutch/util/NutchTool.java | 2 +-
.../java/org/apache/nutch/util/ObjectCache.java | 8 +-
.../nutch/util/ProtocolStatusStatistics.java | 3 +-
.../org/apache/nutch/util/TrieStringMatcher.java | 4 +-
.../apache/nutch/util/domain/DomainStatistics.java | 3 +-
.../apache/nutch/util/domain/DomainSuffixes.java | 5 +-
.../nutch/util/domain/DomainSuffixesReader.java | 3 +-
.../nutch/webui/client/impl/CrawlingCycle.java | 6 +-
.../webui/client/impl/RemoteCommandExecutor.java | 6 +-
.../webui/pages/components/ColorEnumLabel.java | 2 +-
.../pages/components/ColorEnumLabelBuilder.java | 2 +-
.../webui/pages/components/CpmIteratorAdapter.java | 2 +-
.../nutch/webui/pages/crawls/CrawlPanel.java | 8 +-
.../nutch/webui/pages/crawls/CrawlsPage.java | 4 +-
.../nutch/webui/pages/instances/InstancePanel.java | 2 +-
.../nutch/webui/pages/instances/InstancesPage.java | 4 +-
.../nutch/webui/pages/seed/SeedListsPage.java | 4 +-
.../apache/nutch/webui/pages/seed/SeedPage.java | 6 +-
.../nutch/webui/pages/settings/SettingsPage.java | 4 +-
.../nutch/webui/service/impl/CrawlServiceImpl.java | 6 +-
.../nutch/webui/service/impl/NutchServiceImpl.java | 7 +-
.../nutch/crawl/ContinuousCrawlTestUtil.java | 3 +-
.../org/apache/nutch/crawl/CrawlDBTestUtil.java | 3 +-
.../nutch/crawl/CrawlDbUpdateTestDriver.java | 3 +-
.../org/apache/nutch/crawl/CrawlDbUpdateUtil.java | 3 +-
.../apache/nutch/crawl/TODOTestCrawlDbStates.java | 4 +-
.../org/apache/nutch/crawl/TestCrawlDbMerger.java | 18 +-
.../org/apache/nutch/crawl/TestCrawlDbStates.java | 18 +-
.../org/apache/nutch/crawl/TestLinkDbMerger.java | 18 +-
.../apache/nutch/indexer/TestIndexerMapReduce.java | 4 +-
.../segment/TestSegmentMergerCrawlDatums.java | 3 +-
.../org/apache/nutch/service/TestNutchServer.java | 4 +-
.../apache/nutch/tools/proxy/LogDebugHandler.java | 3 +-
.../org/apache/nutch/tools/proxy/ProxyTestbed.java | 4 +-
.../apache/nutch/tools/proxy/SegmentHandler.java | 3 +-
nutch-plugins/build.xml | 1 +
.../creativecommons/nutch/CCIndexingFilter.java | 5 +-
.../org/creativecommons/nutch/CCParseFilter.java | 4 +-
.../org/apache/nutch/parse/feed/FeedParser.java | 4 +-
.../apache/nutch/parse/feed/TestFeedParser.java | 5 +-
.../nutch/indexer/anchor/AnchorIndexingFilter.java | 5 +-
.../nutch/indexer/basic/BasicIndexingFilter.java | 5 +-
.../nutch/indexer/geoip/GeoIPIndexingFilter.java | 3 +-
.../nutch/indexer/links/LinksIndexingFilter.java | 7 +-
.../nutch/indexer/more/MoreIndexingFilter.java | 5 +-
.../cloudsearch/CloudSearchIndexWriter.java | 5 +-
.../nutch/indexwriter/dummy/DummyIndexWriter.java | 5 +-
.../indexwriter/elastic/ElasticIndexWriter.java | 4 +-
.../elastic/TestElasticIndexWriter.java | 0
.../src/test/resources}/nutch-site-test.xml | 0
.../nutch/indexwriter/solr/SolrIndexWriter.java | 5 +-
.../nutch/indexwriter/solr/SolrMappingReader.java | 4 +-
.../apache/nutch/indexwriter/solr/SolrUtils.java | 14 +-
.../nutch/analysis/lang/HTMLLanguageParser.java | 5 +-
.../nutch/protocol/htmlunit/HtmlUnitWebDriver.java | 4 +-
.../apache/nutch/protocol/http/api/HttpBase.java | 71 ++++----
.../protocol/http/api/HttpRobotRulesParser.java | 57 ++++++-
.../nutch/urlfilter/api/RegexURLFilterBase.java | 5 +-
.../urlfilter/api/RegexURLFilterBaseTest.java | 5 +-
.../nutch/protocol/selenium/HttpWebClient.java | 4 +-
.../nutch/microformats/reltag/RelTagParser.java | 4 +-
.../indexer/filter/MimeTypeIndexingFilter.java | 3 +-
nutch-plugins/nutch-extensionpoints/plugin.xml | 4 +
.../java/org/apache/nutch/parse/ext/ExtParser.java | 5 +-
.../org/apache/nutch/parse/html/HtmlParser.java | 5 +-
.../apache/nutch/parse/html/TestHtmlParser.java | 5 +-
.../org/apache/nutch/parse/js/JSParseFilter.java | 4 +-
.../java/org/apache/nutch/parse/swf/SWFParser.java | 5 +-
.../org/apache/nutch/parse/tika/TikaParser.java | 4 +-
.../java/org/apache/nutch/tika/TestFeedParser.java | 6 +-
.../java/org/apache/nutch/parse/zip/ZipParser.java | 4 +-
.../apache/nutch/parse/zip/ZipTextExtractor.java | 5 +-
.../naivebayes/NaiveBayesParseFilter.java | 3 +-
nutch-plugins/parsefilter-regex/README.txt | 41 +++++
.../nutch/parsefilter/regex/RegexParseFilter.java | 24 ++-
nutch-plugins/pom.xml | 1 +
.../java/org/apache/nutch/protocol/file/File.java | 17 +-
.../java/org/apache/nutch/protocol/ftp/Ftp.java | 13 +-
.../nutch/protocol/ftp/FtpRobotRulesParser.java | 22 ++-
.../org/apache/nutch/protocol/htmlunit/Http.java | 4 +-
.../java/org/apache/nutch/protocol/http/Http.java | 4 +-
.../apache/nutch/protocol/http/HttpResponse.java | 11 +-
nutch-plugins/protocol-httpclient/pom.xml | 20 ++-
.../httpclient/DummySSLProtocolSocketFactory.java | 3 +-
.../protocol/httpclient/DummyX509TrustManager.java | 3 +-
.../org/apache/nutch/protocol/httpclient/Http.java | 83 ++++++----
.../httpclient/HttpAuthenticationFactory.java | 5 +-
.../httpclient/HttpBasicAuthentication.java | 5 +-
.../httpclient/HttpFormAuthConfigurer.java | 21 ++-
.../httpclient/HttpFormAuthentication.java | 85 +++++++---
.../nutch/protocol/httpclient/HttpResponse.java | 7 +
.../nutch/protocol/interactiveselenium/Http.java | 4 +-
.../handlers/DefalultMultiInteractionHandler.java | 4 +-
.../handlers/DefaultClickAllAjaxLinksHandler.java | 3 +-
.../org/apache/nutch/protocol/selenium/Http.java | 4 +-
.../build-ivy.xml | 2 +-
.../{scoring-opic => publish-rabbitmq}/build.xml | 2 +-
.../{creativecommons => publish-rabbitmq}/ivy.xml | 1 +
.../{headings => publish-rabbitmq}/plugin.xml | 20 +--
.../{parse-html => publish-rabbitmq}/pom.xml | 19 ++-
.../publisher/rabbitmq/RabbitMQPublisherImpl.java | 95 +++++++++++
.../nutch/publisher/rabbitmq}/package-info.java | 4 +-
.../nutch/scoring/opic/OPICScoringFilter.java | 5 +-
.../similarity/cosine/CosineSimilarity.java | 7 +-
.../nutch/scoring/similarity/cosine/Model.java | 6 +-
.../apache/nutch/collection/CollectionManager.java | 4 +-
.../subcollection/SubcollectionIndexingFilter.java | 6 +-
.../nutch/indexer/tld/TLDIndexingFilter.java | 5 +-
.../nutch/urlfilter/domain/DomainURLFilter.java | 3 +-
.../domainblacklist/DomainBlacklistURLFilter.java | 3 +-
.../urlfilter/ignoreexempt/ExemptionUrlFilter.java | 5 +-
.../nutch/urlfilter/prefix/PrefixURLFilter.java | 3 +-
.../nutch/urlfilter/suffix/SuffixURLFilter.java | 3 +-
.../indexer/urlmeta/URLMetaIndexingFilter.java | 4 +-
.../scoring/urlmeta/URLMetaScoringFilter.java | 3 +-
.../net/urlnormalizer/ajax/AjaxURLNormalizer.java | 4 +-
.../urlnormalizer/basic/BasicURLNormalizer.java | 21 ++-
.../basic/TestBasicURLNormalizer.java | 11 +-
.../net/urlnormalizer/host/HostURLNormalizer.java | 3 +-
.../protocol/ProtocolURLNormalizer.java | 4 +-
.../querystring/QuerystringURLNormalizer.java | 3 +-
.../urlnormalizer/regex/RegexURLNormalizer.java | 3 +-
.../regex/TestRegexURLNormalizer.java | 3 +-
.../urlnormalizer/slash/SlashURLNormalizer.java | 4 +-
pom.xml | 20 ++-
251 files changed, 2090 insertions(+), 925 deletions(-)
create mode 100644 nutch-core/src/main/java/org/apache/nutch/fetcher/FetcherThreadEvent.java
create mode 100644 nutch-core/src/main/java/org/apache/nutch/fetcher/FetcherThreadPublisher.java
copy nutch-core/src/main/java/org/apache/nutch/{parse/HtmlParseFilter.java => publisher/NutchPublisher.java} (54%)
create mode 100644 nutch-core/src/main/java/org/apache/nutch/publisher/NutchPublishers.java
copy nutch-core/src/main/java/org/apache/nutch/{webui/service/NutchInstanceService.java => service/SeedManager.java} (69%)
copy nutch-core/src/main/java/org/apache/nutch/service/{model/request/DbQuery.java => impl/SeedManagerImpl.java} (52%)
rename {src/plugin/indexer-elastic/src/test => nutch-plugins/indexer-elastic/src/test/java}/org/apache/nutch/indexwriter/elastic/TestElasticIndexWriter.java (100%)
rename {src/plugin/indexer-elastic/src/test/conf => nutch-plugins/indexer-elastic/src/test/resources}/nutch-site-test.xml (100%)
create mode 100644 nutch-plugins/parsefilter-regex/README.txt
copy nutch-plugins/{index-geoip => publish-rabbitmq}/build-ivy.xml (96%)
copy nutch-plugins/{scoring-opic => publish-rabbitmq}/build.xml (95%)
copy nutch-plugins/{creativecommons => publish-rabbitmq}/ivy.xml (93%)
copy nutch-plugins/{headings => publish-rabbitmq}/plugin.xml (72%)
copy nutch-plugins/{parse-html => publish-rabbitmq}/pom.xml (76%)
create mode 100644 nutch-plugins/publish-rabbitmq/src/main/java/org/apache/nutch/publisher/rabbitmq/RabbitMQPublisherImpl.java
copy nutch-plugins/{parse-swf/src/main/java/org/apache/nutch/parse/swf => publish-rabbitmq/src/main/java/org/apache/nutch/publisher/rabbitmq}/package-info.java (90%)
--
To stop receiving notification emails like this one, please contact
"commits@nutch.apache.org" <co...@nutch.apache.org>.
[nutch] 03/03: Merge with latest changes from master
Posted by th...@apache.org.
thammegowda pushed a commit to branch NUTCH-2292
in repository https://gitbox.apache.org/repos/asf/nutch.git
commit 62491d5b0ac3349d684a493c9bd121442849ee8c
Author: Thamme Gowda <th...@apache.org>
AuthorDate: Sat Feb 25 06:26:39 2017 -0800
Merge with latest changes from master
---
.../elastic/TestElasticIndexWriter.java | 0
.../src/test/resources}/nutch-site-test.xml | 0
.../parsefilter-regex/README.txt | 0
nutch-plugins/pom.xml | 1 +
nutch-plugins/protocol-httpclient/pom.xml | 20 ++++++++++++--
.../publish-rabbitmq/build-ivy.xml | 0
.../publish-rabbitmq/build.xml | 0
.../publish-rabbitmq/ivy.xml | 0
.../publish-rabbitmq/plugin.xml | 0
.../pom.xml | 32 ++++++++--------------
.../publisher/rabbitmq/RabbitMQPublisherImpl.java | 0
.../nutch/publisher/rabbitmq/package-info.java | 0
12 files changed, 29 insertions(+), 24 deletions(-)
diff --git a/src/plugin/indexer-elastic/src/test/org/apache/nutch/indexwriter/elastic/TestElasticIndexWriter.java b/nutch-plugins/indexer-elastic/src/test/java/org/apache/nutch/indexwriter/elastic/TestElasticIndexWriter.java
similarity index 100%
rename from src/plugin/indexer-elastic/src/test/org/apache/nutch/indexwriter/elastic/TestElasticIndexWriter.java
rename to nutch-plugins/indexer-elastic/src/test/java/org/apache/nutch/indexwriter/elastic/TestElasticIndexWriter.java
diff --git a/src/plugin/indexer-elastic/src/test/conf/nutch-site-test.xml b/nutch-plugins/indexer-elastic/src/test/resources/nutch-site-test.xml
similarity index 100%
rename from src/plugin/indexer-elastic/src/test/conf/nutch-site-test.xml
rename to nutch-plugins/indexer-elastic/src/test/resources/nutch-site-test.xml
diff --git a/src/plugin/parsefilter-regex/README.txt b/nutch-plugins/parsefilter-regex/README.txt
similarity index 100%
rename from src/plugin/parsefilter-regex/README.txt
rename to nutch-plugins/parsefilter-regex/README.txt
diff --git a/nutch-plugins/pom.xml b/nutch-plugins/pom.xml
index e07f487..0fc29e1 100644
--- a/nutch-plugins/pom.xml
+++ b/nutch-plugins/pom.xml
@@ -76,6 +76,7 @@
<module>protocol-httpclient</module>
<module>protocol-interactiveselenium</module>
<module>protocol-selenium</module>
+ <module>publish-rabbitmq</module>
<module>scoring-depth</module>
<module>scoring-link</module>
<module>scoring-opic</module>
diff --git a/nutch-plugins/protocol-httpclient/pom.xml b/nutch-plugins/protocol-httpclient/pom.xml
index 2f2fc7c..4fdac6c 100644
--- a/nutch-plugins/protocol-httpclient/pom.xml
+++ b/nutch-plugins/protocol-httpclient/pom.xml
@@ -33,12 +33,26 @@
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+ <commons.lang3.version>3.5</commons.lang3.version>
+ <jsoup.version>1.8.1</jsoup.version>
+ <jetty.version>6.1.26</jetty.version>
+ <jsp.version>6.1.14</jsp.version>
</properties>
<dependencies>
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
- <version>1.8.1</version>
+ <version>${jsoup.version}</version>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.commons</groupId>
+ <artifactId>commons-lang3</artifactId>
+ <version>${commons.lang3.version}</version>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.nutch</groupId>
+ <artifactId>lib-http</artifactId>
+ <version>${project.parent.version}</version>
</dependency>
<dependency>
<groupId>org.apache.nutch</groupId>
@@ -48,13 +62,13 @@
<dependency>
<groupId> org.mortbay.jetty</groupId>
<artifactId>jetty</artifactId>
- <version>6.1.26</version>
+ <version>${jetty.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId> org.mortbay.jetty</groupId>
<artifactId>jsp-2.1</artifactId>
- <version>6.1.14</version>
+ <version>${jsp.version}</version>
<scope>test</scope>
</dependency>
</dependencies>
diff --git a/src/plugin/publish-rabbitmq/build-ivy.xml b/nutch-plugins/publish-rabbitmq/build-ivy.xml
similarity index 100%
rename from src/plugin/publish-rabbitmq/build-ivy.xml
rename to nutch-plugins/publish-rabbitmq/build-ivy.xml
diff --git a/src/plugin/publish-rabbitmq/build.xml b/nutch-plugins/publish-rabbitmq/build.xml
similarity index 100%
rename from src/plugin/publish-rabbitmq/build.xml
rename to nutch-plugins/publish-rabbitmq/build.xml
diff --git a/src/plugin/publish-rabbitmq/ivy.xml b/nutch-plugins/publish-rabbitmq/ivy.xml
similarity index 100%
rename from src/plugin/publish-rabbitmq/ivy.xml
rename to nutch-plugins/publish-rabbitmq/ivy.xml
diff --git a/src/plugin/publish-rabbitmq/plugin.xml b/nutch-plugins/publish-rabbitmq/plugin.xml
similarity index 100%
rename from src/plugin/publish-rabbitmq/plugin.xml
rename to nutch-plugins/publish-rabbitmq/plugin.xml
diff --git a/nutch-plugins/protocol-httpclient/pom.xml b/nutch-plugins/publish-rabbitmq/pom.xml
similarity index 67%
copy from nutch-plugins/protocol-httpclient/pom.xml
copy to nutch-plugins/publish-rabbitmq/pom.xml
index 2f2fc7c..a8a434d 100644
--- a/nutch-plugins/protocol-httpclient/pom.xml
+++ b/nutch-plugins/publish-rabbitmq/pom.xml
@@ -25,38 +25,28 @@
<version>1.13-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>
- <artifactId>protocol-httpclient</artifactId>
+ <artifactId>publish-rabitmq</artifactId>
<packaging>jar</packaging>
- <name>protocol-httpclient</name>
+ <name>publish-rabitmq</name>
<url>http://nutch.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+ <rabitmq.version>3.6.5</rabitmq.version>
+ <jackson.version>2.8.6</jackson.version>
</properties>
+
<dependencies>
<dependency>
- <groupId>org.jsoup</groupId>
- <artifactId>jsoup</artifactId>
- <version>1.8.1</version>
- </dependency>
- <dependency>
- <groupId>org.apache.nutch</groupId>
- <artifactId>lib-http</artifactId>
- <version>${project.parent.version}</version>
+ <groupId>com.fasterxml.jackson.core</groupId>
+ <artifactId>jackson-databind</artifactId>
+ <version>${jackson.version}</version>
</dependency>
<dependency>
- <groupId> org.mortbay.jetty</groupId>
- <artifactId>jetty</artifactId>
- <version>6.1.26</version>
- <scope>test</scope>
- </dependency>
- <dependency>
- <groupId> org.mortbay.jetty</groupId>
- <artifactId>jsp-2.1</artifactId>
- <version>6.1.14</version>
- <scope>test</scope>
+ <groupId>com.rabbitmq</groupId>
+ <artifactId>amqp-client</artifactId>
+ <version>${rabitmq.version}</version>
</dependency>
</dependencies>
-
</project>
diff --git a/src/plugin/publish-rabbitmq/src/java/org/apache/nutch/publisher/rabbitmq/RabbitMQPublisherImpl.java b/nutch-plugins/publish-rabbitmq/src/main/java/org/apache/nutch/publisher/rabbitmq/RabbitMQPublisherImpl.java
similarity index 100%
rename from src/plugin/publish-rabbitmq/src/java/org/apache/nutch/publisher/rabbitmq/RabbitMQPublisherImpl.java
rename to nutch-plugins/publish-rabbitmq/src/main/java/org/apache/nutch/publisher/rabbitmq/RabbitMQPublisherImpl.java
diff --git a/src/plugin/publish-rabbitmq/src/java/org/apache/nutch/publisher/rabbitmq/package-info.java b/nutch-plugins/publish-rabbitmq/src/main/java/org/apache/nutch/publisher/rabbitmq/package-info.java
similarity index 100%
rename from src/plugin/publish-rabbitmq/src/java/org/apache/nutch/publisher/rabbitmq/package-info.java
rename to nutch-plugins/publish-rabbitmq/src/main/java/org/apache/nutch/publisher/rabbitmq/package-info.java
[nutch] 01/03: Merge branch 'master' into NUTCH-2292-1
Posted by th...@apache.org.
thammegowda pushed a commit to branch NUTCH-2292
in repository https://gitbox.apache.org/repos/asf/nutch.git
commit 98cd385b35bbd6b5b3a110b745f0eddd238b9456
Merge: 2175c76 3e2d3d4
Author: Thamme Gowda <th...@apache.org>
AuthorDate: Fri Feb 24 11:36:40 2017 -0800
Merge branch 'master' into NUTCH-2292-1
build.xml | 3 +
conf/httpclient-auth.xml.template | 6 +
conf/nutch-default.xml | 75 +++++++++
default.properties | 4 +-
ivy/ivy.xml | 3 +
ivy/mvn.template | 10 +-
.../apache/nutch/crawl/AbstractFetchSchedule.java | 4 +-
.../apache/nutch/crawl/AdaptiveFetchSchedule.java | 7 +-
.../java/org/apache/nutch/crawl/CrawlDatum.java | 8 +-
.../main/java/org/apache/nutch/crawl/CrawlDb.java | 25 ++-
.../java/org/apache/nutch/crawl/CrawlDbFilter.java | 4 +-
.../java/org/apache/nutch/crawl/CrawlDbMerger.java | 5 +-
.../java/org/apache/nutch/crawl/CrawlDbReader.java | 18 +-
.../org/apache/nutch/crawl/CrawlDbReducer.java | 7 +-
.../org/apache/nutch/crawl/DeduplicationJob.java | 7 +-
.../apache/nutch/crawl/DefaultFetchSchedule.java | 4 +
.../apache/nutch/crawl/FetchScheduleFactory.java | 6 +-
.../java/org/apache/nutch/crawl/Generator.java | 10 +-
.../main/java/org/apache/nutch/crawl/Injector.java | 39 +++--
.../main/java/org/apache/nutch/crawl/Inlinks.java | 8 +-
.../main/java/org/apache/nutch/crawl/LinkDb.java | 25 ++-
.../java/org/apache/nutch/crawl/LinkDbFilter.java | 4 +-
.../java/org/apache/nutch/crawl/LinkDbMerger.java | 6 +-
.../java/org/apache/nutch/crawl/LinkDbReader.java | 6 +-
.../nutch/crawl/MimeAdaptiveFetchSchedule.java | 7 +-
.../org/apache/nutch/crawl/SignatureFactory.java | 4 +-
.../apache/nutch/crawl/TextProfileSignature.java | 6 +-
.../org/apache/nutch/crawl/URLPartitioner.java | 3 +-
.../java/org/apache/nutch/fetcher/FetchItem.java | 4 +-
.../org/apache/nutch/fetcher/FetchItemQueue.java | 6 +-
.../org/apache/nutch/fetcher/FetchItemQueues.java | 8 +-
.../java/org/apache/nutch/fetcher/FetchNodeDb.java | 2 +-
.../java/org/apache/nutch/fetcher/Fetcher.java | 29 ++--
.../org/apache/nutch/fetcher/FetcherThread.java | 76 ++++++++-
.../java/org/apache/nutch/fetcher/QueueFeeder.java | 5 +-
.../java/org/apache/nutch/hostdb/ReadHostDb.java | 4 +-
.../org/apache/nutch/hostdb/ResolverThread.java | 4 +-
.../java/org/apache/nutch/hostdb/UpdateHostDb.java | 4 +-
.../apache/nutch/hostdb/UpdateHostDbMapper.java | 6 +-
.../apache/nutch/hostdb/UpdateHostDbReducer.java | 18 +-
.../java/org/apache/nutch/indexer/CleaningJob.java | 4 +-
.../org/apache/nutch/indexer/IndexWriters.java | 6 +-
.../org/apache/nutch/indexer/IndexerMapReduce.java | 5 +-
.../org/apache/nutch/indexer/IndexingFilters.java | 6 +-
.../nutch/indexer/IndexingFiltersChecker.java | 7 +-
.../java/org/apache/nutch/indexer/IndexingJob.java | 25 ++-
.../org/apache/nutch/indexer/NutchDocument.java | 2 +-
.../java/org/apache/nutch/indexer/NutchField.java | 4 +-
.../java/org/apache/nutch/metadata/Metadata.java | 2 +-
.../main/java/org/apache/nutch/metadata/Nutch.java | 13 ++
.../nutch/metadata/SpellCheckedMetadata.java | 2 +-
.../org/apache/nutch/net/URLExemptionFilters.java | 5 +-
.../java/org/apache/nutch/net/URLNormalizers.java | 13 +-
.../org/apache/nutch/parse/OutlinkExtractor.java | 5 +-
.../java/org/apache/nutch/parse/ParseData.java | 5 +-
.../org/apache/nutch/parse/ParseOutputFormat.java | 7 +-
.../org/apache/nutch/parse/ParsePluginList.java | 4 +-
.../org/apache/nutch/parse/ParsePluginsReader.java | 9 +-
.../java/org/apache/nutch/parse/ParseResult.java | 6 +-
.../java/org/apache/nutch/parse/ParseSegment.java | 30 ++--
.../java/org/apache/nutch/parse/ParseText.java | 5 +-
.../java/org/apache/nutch/parse/ParseUtil.java | 4 +-
.../java/org/apache/nutch/parse/ParserChecker.java | 6 +-
.../java/org/apache/nutch/parse/ParserFactory.java | 8 +-
.../java/org/apache/nutch/plugin/Extension.java | 2 +-
.../org/apache/nutch/plugin/ExtensionPoint.java | 2 +-
.../org/apache/nutch/plugin/PluginDescriptor.java | 21 +--
.../apache/nutch/plugin/PluginManifestParser.java | 2 +-
.../org/apache/nutch/plugin/PluginRepository.java | 35 ++--
.../java/org/apache/nutch/protocol/Content.java | 5 +-
.../java/org/apache/nutch/protocol/Protocol.java | 38 ++---
.../org/apache/nutch/protocol/ProtocolFactory.java | 5 +-
.../org/apache/nutch/protocol/ProtocolOutput.java | 14 ++
.../org/apache/nutch/protocol/ProtocolStatus.java | 2 +-
.../apache/nutch/protocol/RobotRulesParser.java | 183 ++++++++++++++++-----
.../apache/nutch/scoring/webgraph/LinkDumper.java | 10 +-
.../apache/nutch/scoring/webgraph/LinkRank.java | 10 +-
.../apache/nutch/scoring/webgraph/NodeDumper.java | 4 +-
.../apache/nutch/scoring/webgraph/NodeReader.java | 2 +-
.../nutch/scoring/webgraph/ScoreUpdater.java | 4 +-
.../apache/nutch/scoring/webgraph/WebGraph.java | 12 +-
.../nutch/segment/ContentAsTextInputFormat.java | 2 +-
.../org/apache/nutch/segment/SegmentChecker.java | 5 +-
.../apache/nutch/segment/SegmentMergeFilters.java | 3 +-
.../org/apache/nutch/segment/SegmentMerger.java | 13 +-
.../org/apache/nutch/segment/SegmentReader.java | 66 +++++---
.../java/org/apache/nutch/service/NutchReader.java | 6 +-
.../java/org/apache/nutch/service/NutchServer.java | 15 +-
.../org/apache/nutch/service/impl/JobWorker.java | 4 +-
.../org/apache/nutch/service/impl/LinkReader.java | 8 +-
.../org/apache/nutch/service/impl/NodeReader.java | 8 +-
.../apache/nutch/service/impl/SequenceReader.java | 12 +-
.../nutch/service/model/request/DbQuery.java | 2 +-
.../nutch/service/model/request/SeedList.java | 10 ++
.../service/model/response/FetchNodeDbInfo.java | 2 +-
.../nutch/service/resources/AdminResource.java | 3 +-
.../apache/nutch/service/resources/DbResource.java | 2 +-
.../nutch/service/resources/SeedResource.java | 105 ++++++------
.../nutch/tools/AbstractCommonCrawlFormat.java | 4 +-
.../java/org/apache/nutch/tools/Benchmark.java | 8 +-
.../apache/nutch/tools/CommonCrawlDataDumper.java | 7 +-
.../nutch/tools/CommonCrawlFormatJettinson.java | 4 +-
.../java/org/apache/nutch/tools/DmozParser.java | 26 +--
.../java/org/apache/nutch/tools/FileDumper.java | 40 ++---
.../java/org/apache/nutch/tools/FreeGenerator.java | 5 +-
.../java/org/apache/nutch/tools/ResolveUrls.java | 4 +-
.../apache/nutch/tools/arc/ArcRecordReader.java | 5 +-
.../apache/nutch/tools/arc/ArcSegmentCreator.java | 5 +-
.../org/apache/nutch/tools/warc/WARCExporter.java | 6 +-
.../apache/nutch/util/CrawlCompletionStats.java | 3 +-
.../java/org/apache/nutch/util/DeflateUtils.java | 4 +-
.../main/java/org/apache/nutch/util/DomUtil.java | 4 +-
.../java/org/apache/nutch/util/DumpFileUtil.java | 5 +-
.../org/apache/nutch/util/EncodingDetector.java | 15 +-
.../main/java/org/apache/nutch/util/GZIPUtils.java | 4 +-
.../java/org/apache/nutch/util/HadoopFSUtil.java | 19 +--
.../main/java/org/apache/nutch/util/JexlUtil.java | 4 +-
.../main/java/org/apache/nutch/util/MimeUtil.java | 10 +-
.../java/org/apache/nutch/util/NodeWalker.java | 2 +-
.../main/java/org/apache/nutch/util/NutchTool.java | 2 +-
.../java/org/apache/nutch/util/ObjectCache.java | 8 +-
.../nutch/util/ProtocolStatusStatistics.java | 3 +-
.../org/apache/nutch/util/TrieStringMatcher.java | 4 +-
.../apache/nutch/util/domain/DomainStatistics.java | 3 +-
.../apache/nutch/util/domain/DomainSuffixes.java | 5 +-
.../nutch/util/domain/DomainSuffixesReader.java | 3 +-
.../nutch/webui/client/impl/CrawlingCycle.java | 6 +-
.../webui/client/impl/RemoteCommandExecutor.java | 6 +-
.../webui/pages/components/ColorEnumLabel.java | 2 +-
.../pages/components/ColorEnumLabelBuilder.java | 2 +-
.../webui/pages/components/CpmIteratorAdapter.java | 2 +-
.../nutch/webui/pages/crawls/CrawlPanel.java | 8 +-
.../nutch/webui/pages/crawls/CrawlsPage.java | 4 +-
.../nutch/webui/pages/instances/InstancePanel.java | 2 +-
.../nutch/webui/pages/instances/InstancesPage.java | 4 +-
.../nutch/webui/pages/seed/SeedListsPage.java | 4 +-
.../apache/nutch/webui/pages/seed/SeedPage.java | 6 +-
.../nutch/webui/pages/settings/SettingsPage.java | 4 +-
.../nutch/webui/service/impl/CrawlServiceImpl.java | 6 +-
.../nutch/webui/service/impl/NutchServiceImpl.java | 7 +-
.../nutch/crawl/ContinuousCrawlTestUtil.java | 3 +-
.../org/apache/nutch/crawl/CrawlDBTestUtil.java | 3 +-
.../nutch/crawl/CrawlDbUpdateTestDriver.java | 3 +-
.../org/apache/nutch/crawl/CrawlDbUpdateUtil.java | 3 +-
.../apache/nutch/crawl/TODOTestCrawlDbStates.java | 4 +-
.../org/apache/nutch/crawl/TestCrawlDbMerger.java | 18 +-
.../org/apache/nutch/crawl/TestCrawlDbStates.java | 18 +-
.../org/apache/nutch/crawl/TestLinkDbMerger.java | 18 +-
.../apache/nutch/indexer/TestIndexerMapReduce.java | 4 +-
.../segment/TestSegmentMergerCrawlDatums.java | 3 +-
.../org/apache/nutch/service/TestNutchServer.java | 4 +-
.../apache/nutch/tools/proxy/LogDebugHandler.java | 3 +-
.../org/apache/nutch/tools/proxy/ProxyTestbed.java | 4 +-
.../apache/nutch/tools/proxy/SegmentHandler.java | 3 +-
nutch-plugins/build.xml | 1 +
.../creativecommons/nutch/CCIndexingFilter.java | 5 +-
.../org/creativecommons/nutch/CCParseFilter.java | 4 +-
.../org/apache/nutch/parse/feed/FeedParser.java | 4 +-
.../apache/nutch/parse/feed/TestFeedParser.java | 5 +-
.../nutch/indexer/anchor/AnchorIndexingFilter.java | 5 +-
.../nutch/indexer/basic/BasicIndexingFilter.java | 5 +-
.../nutch/indexer/geoip/GeoIPIndexingFilter.java | 3 +-
.../nutch/indexer/links/LinksIndexingFilter.java | 7 +-
.../nutch/indexer/more/MoreIndexingFilter.java | 5 +-
.../cloudsearch/CloudSearchIndexWriter.java | 5 +-
.../nutch/indexwriter/dummy/DummyIndexWriter.java | 5 +-
.../indexwriter/elastic/ElasticIndexWriter.java | 4 +-
.../nutch/indexwriter/solr/SolrIndexWriter.java | 5 +-
.../nutch/indexwriter/solr/SolrMappingReader.java | 4 +-
.../apache/nutch/indexwriter/solr/SolrUtils.java | 14 +-
.../nutch/analysis/lang/HTMLLanguageParser.java | 5 +-
.../nutch/protocol/htmlunit/HtmlUnitWebDriver.java | 4 +-
.../apache/nutch/protocol/http/api/HttpBase.java | 71 ++++----
.../protocol/http/api/HttpRobotRulesParser.java | 57 ++++++-
.../nutch/urlfilter/api/RegexURLFilterBase.java | 5 +-
.../urlfilter/api/RegexURLFilterBaseTest.java | 5 +-
.../nutch/protocol/selenium/HttpWebClient.java | 4 +-
.../nutch/microformats/reltag/RelTagParser.java | 4 +-
.../indexer/filter/MimeTypeIndexingFilter.java | 3 +-
nutch-plugins/nutch-extensionpoints/plugin.xml | 4 +
.../java/org/apache/nutch/parse/ext/ExtParser.java | 5 +-
.../org/apache/nutch/parse/html/HtmlParser.java | 5 +-
.../apache/nutch/parse/html/TestHtmlParser.java | 5 +-
.../org/apache/nutch/parse/js/JSParseFilter.java | 4 +-
.../java/org/apache/nutch/parse/swf/SWFParser.java | 5 +-
.../org/apache/nutch/parse/tika/TikaParser.java | 4 +-
.../java/org/apache/nutch/tika/TestFeedParser.java | 6 +-
.../java/org/apache/nutch/parse/zip/ZipParser.java | 4 +-
.../apache/nutch/parse/zip/ZipTextExtractor.java | 5 +-
.../naivebayes/NaiveBayesParseFilter.java | 3 +-
.../nutch/parsefilter/regex/RegexParseFilter.java | 24 ++-
.../java/org/apache/nutch/protocol/file/File.java | 17 +-
.../java/org/apache/nutch/protocol/ftp/Ftp.java | 13 +-
.../nutch/protocol/ftp/FtpRobotRulesParser.java | 22 ++-
.../org/apache/nutch/protocol/htmlunit/Http.java | 4 +-
.../java/org/apache/nutch/protocol/http/Http.java | 4 +-
.../apache/nutch/protocol/http/HttpResponse.java | 11 +-
.../httpclient/DummySSLProtocolSocketFactory.java | 3 +-
.../protocol/httpclient/DummyX509TrustManager.java | 3 +-
.../org/apache/nutch/protocol/httpclient/Http.java | 83 ++++++----
.../httpclient/HttpAuthenticationFactory.java | 5 +-
.../httpclient/HttpBasicAuthentication.java | 5 +-
.../httpclient/HttpFormAuthConfigurer.java | 21 ++-
.../httpclient/HttpFormAuthentication.java | 85 +++++++---
.../nutch/protocol/httpclient/HttpResponse.java | 7 +
.../nutch/protocol/interactiveselenium/Http.java | 4 +-
.../handlers/DefalultMultiInteractionHandler.java | 4 +-
.../handlers/DefaultClickAllAjaxLinksHandler.java | 3 +-
.../org/apache/nutch/protocol/selenium/Http.java | 4 +-
.../nutch/scoring/opic/OPICScoringFilter.java | 5 +-
.../similarity/cosine/CosineSimilarity.java | 7 +-
.../nutch/scoring/similarity/cosine/Model.java | 6 +-
.../apache/nutch/collection/CollectionManager.java | 4 +-
.../subcollection/SubcollectionIndexingFilter.java | 6 +-
.../nutch/indexer/tld/TLDIndexingFilter.java | 5 +-
.../nutch/urlfilter/domain/DomainURLFilter.java | 3 +-
.../domainblacklist/DomainBlacklistURLFilter.java | 3 +-
.../urlfilter/ignoreexempt/ExemptionUrlFilter.java | 5 +-
.../nutch/urlfilter/prefix/PrefixURLFilter.java | 3 +-
.../nutch/urlfilter/suffix/SuffixURLFilter.java | 3 +-
.../indexer/urlmeta/URLMetaIndexingFilter.java | 4 +-
.../scoring/urlmeta/URLMetaScoringFilter.java | 3 +-
.../net/urlnormalizer/ajax/AjaxURLNormalizer.java | 4 +-
.../urlnormalizer/basic/BasicURLNormalizer.java | 21 ++-
.../basic/TestBasicURLNormalizer.java | 11 +-
.../net/urlnormalizer/host/HostURLNormalizer.java | 3 +-
.../protocol/ProtocolURLNormalizer.java | 4 +-
.../querystring/QuerystringURLNormalizer.java | 3 +-
.../urlnormalizer/regex/RegexURLNormalizer.java | 3 +-
.../regex/TestRegexURLNormalizer.java | 3 +-
.../urlnormalizer/slash/SlashURLNormalizer.java | 4 +-
.../apache/nutch/fetcher/FetcherThreadEvent.java | 147 +++++++++++++++++
.../nutch/fetcher/FetcherThreadPublisher.java | 61 +++++++
.../org/apache/nutch/publisher/NutchPublisher.java | 47 +++---
.../apache/nutch/publisher/NutchPublishers.java | 83 ++++++++++
.../java/org/apache/nutch/service/SeedManager.java | 34 ++--
.../apache/nutch/service/impl/SeedManagerImpl.java | 60 +++----
src/plugin/parsefilter-regex/README.txt | 41 +++++
src/plugin/publish-rabbitmq/build-ivy.xml | 54 ++++++
src/plugin/publish-rabbitmq/build.xml | 27 +++
src/plugin/publish-rabbitmq/ivy.xml | 42 +++++
src/plugin/publish-rabbitmq/plugin.xml | 43 +++++
.../publisher/rabbitmq/RabbitMQPublisherImpl.java | 95 +++++++++++
.../nutch/publisher/rabbitmq/package-info.java | 25 +--
244 files changed, 2202 insertions(+), 927 deletions(-)
diff --cc nutch-core/src/test/java/org/apache/nutch/crawl/TestCrawlDbMerger.java
index 599c353,7c4b2eb..bfb1581
--- a/nutch-core/src/test/java/org/apache/nutch/crawl/TestCrawlDbMerger.java
+++ b/nutch-core/src/test/java/org/apache/nutch/crawl/TestCrawlDbMerger.java
@@@ -36,11 -37,10 +38,11 @@@ import org.junit.After
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;
+import org.junit.experimental.categories.Category;
public class TestCrawlDbMerger {
- private static final Logger LOG = Logger.getLogger(CrawlDbMerger.class
- .getName());
+ private static final Logger LOG = LoggerFactory
+ .getLogger(MethodHandles.lookup().lookupClass());
String url10 = "http://example.com/";
String url11 = "http://example.com/foo";
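The hunk above replaces a logger declaration that hard-coded `CrawlDbMerger` inside `TestCrawlDbMerger` with SLF4J's `LoggerFactory.getLogger(...)`, keyed on `MethodHandles.lookup().lookupClass()`. `lookupClass()` resolves to the class in which `lookup()` is called, so the declaration can be copied between classes without editing. The minimal sketch below shows that behaviour. It uses `java.util.logging` only to stay dependency-free, and the class names (`LoggerIdiomDemo`, `Merger`) are hypothetical, not Nutch code.

```java
import java.lang.invoke.MethodHandles;
import java.util.logging.Logger;

public class LoggerIdiomDemo {

  /** Hypothetical stand-in for the class under test (e.g. a merger). */
  static class Merger { }

  // Removed style: the logger name comes from a hard-coded class literal,
  // which can point at a different class than the one declaring the field.
  static final Logger HARD_CODED = Logger.getLogger(Merger.class.getName());

  // Added style: lookupClass() is the class whose static initializer calls
  // lookup(), i.e. LoggerIdiomDemo itself, so the name always matches.
  static final Logger SELF_NAMED =
      Logger.getLogger(MethodHandles.lookup().lookupClass().getName());

  public static void main(String[] args) {
    System.out.println(HARD_CODED.getName()); // prints LoggerIdiomDemo$Merger
    System.out.println(SELF_NAMED.getName()); // prints LoggerIdiomDemo
  }
}
```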
--
To stop receiving notification emails like this one, please contact
"commits@nutch.apache.org" <co...@nutch.apache.org>.
[nutch] 02/03: Upstream changes, upgrade to JDK 8, add license header
Posted by th...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
thammegowda pushed a commit to branch NUTCH-2292
in repository https://gitbox.apache.org/repos/asf/nutch.git
commit 9c25a8c73c71cadb906624587ce14f4587b4b153
Author: Thamme Gowda <th...@apache.org>
AuthorDate: Fri Feb 24 11:54:34 2017 -0800
Upstream changes, upgrade to JDK 8, add license header
---
.gitignore | 6 +++++-
.../org/apache/nutch/fetcher/FetcherThreadEvent.java | 0
.../apache/nutch/fetcher/FetcherThreadPublisher.java | 0
.../org/apache/nutch/publisher/NutchPublisher.java | 0
.../org/apache/nutch/publisher/NutchPublishers.java | 0
.../java/org/apache/nutch/service/SeedManager.java | 0
.../apache/nutch/service/impl/SeedManagerImpl.java | 0
pom.xml | 20 ++++++++++++++++++--
8 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/.gitignore b/.gitignore
index 7a70f9d..e0cfd33 100644
--- a/.gitignore
+++ b/.gitignore
@@ -11,4 +11,8 @@ logs/
target/
nutch-core/target
nutch-plugins/target
-nutch-plugins/*/target
\ No newline at end of file
+nutch-plugins/*/target
+
+# IntelliJ Idea
+.idea
+**.iml
\ No newline at end of file
diff --git a/src/java/org/apache/nutch/fetcher/FetcherThreadEvent.java b/nutch-core/src/main/java/org/apache/nutch/fetcher/FetcherThreadEvent.java
similarity index 100%
rename from src/java/org/apache/nutch/fetcher/FetcherThreadEvent.java
rename to nutch-core/src/main/java/org/apache/nutch/fetcher/FetcherThreadEvent.java
diff --git a/src/java/org/apache/nutch/fetcher/FetcherThreadPublisher.java b/nutch-core/src/main/java/org/apache/nutch/fetcher/FetcherThreadPublisher.java
similarity index 100%
rename from src/java/org/apache/nutch/fetcher/FetcherThreadPublisher.java
rename to nutch-core/src/main/java/org/apache/nutch/fetcher/FetcherThreadPublisher.java
diff --git a/src/java/org/apache/nutch/publisher/NutchPublisher.java b/nutch-core/src/main/java/org/apache/nutch/publisher/NutchPublisher.java
similarity index 100%
rename from src/java/org/apache/nutch/publisher/NutchPublisher.java
rename to nutch-core/src/main/java/org/apache/nutch/publisher/NutchPublisher.java
diff --git a/src/java/org/apache/nutch/publisher/NutchPublishers.java b/nutch-core/src/main/java/org/apache/nutch/publisher/NutchPublishers.java
similarity index 100%
rename from src/java/org/apache/nutch/publisher/NutchPublishers.java
rename to nutch-core/src/main/java/org/apache/nutch/publisher/NutchPublishers.java
diff --git a/src/java/org/apache/nutch/service/SeedManager.java b/nutch-core/src/main/java/org/apache/nutch/service/SeedManager.java
similarity index 100%
rename from src/java/org/apache/nutch/service/SeedManager.java
rename to nutch-core/src/main/java/org/apache/nutch/service/SeedManager.java
diff --git a/src/java/org/apache/nutch/service/impl/SeedManagerImpl.java b/nutch-core/src/main/java/org/apache/nutch/service/impl/SeedManagerImpl.java
similarity index 100%
rename from src/java/org/apache/nutch/service/impl/SeedManagerImpl.java
rename to nutch-core/src/main/java/org/apache/nutch/service/impl/SeedManagerImpl.java
diff --git a/pom.xml b/pom.xml
index a3b9271..ff2147a 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1,4 +1,20 @@
<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
@@ -26,8 +42,8 @@
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
- <source>1.7</source>
- <target>1.7</target>
+ <source>1.8</source>
+ <target>1.8</target>
</configuration>
</plugin>
<plugin>
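Raising `maven-compiler-plugin` `source`/`target` from 1.7 to 1.8 lets the code base use Java 8 language features such as lambda expressions and method references, which `javac -source 1.7` rejects. The stream API used here is a Java 8 library addition. The minimal sketch below is illustrative only: the class and method names are hypothetical, and the input strings are borrowed from the `.gitignore` entries in this commit.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class Java8FeaturesDemo {

  // Keeps paths ending in "/target", trimmed and sorted.
  static List<String> moduleTargets(List<String> ignoredPaths) {
    return ignoredPaths.stream()
        .map(String::trim)                  // method reference (Java 8 syntax)
        .filter(p -> p.endsWith("/target")) // lambda expression (Java 8 syntax)
        .sorted(Comparator.naturalOrder())  // static interface method (Java 8 API)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> ignored = Arrays.asList(
        "target/", "nutch-core/target", "nutch-plugins/target", ".idea");
    System.out.println(moduleTargets(ignored)); // prints [nutch-core/target, nutch-plugins/target]
  }
}
```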