Posted to commits@nutch.apache.org by th...@apache.org on 2017/02/25 14:34:51 UTC

[nutch] branch NUTCH-2292 updated (2175c76 -> 62491d5)

This is an automated email from the ASF dual-hosted git repository.

thammegowda pushed a change to branch NUTCH-2292
in repository https://gitbox.apache.org/repos/asf/nutch.git.

      from  2175c76   Merge branch 'master' into NUTCH-2293
      adds  0fff24a   NUTCH-2287 Indexer-elastic plugin should use Elasticsearch BulkProcessor and BackoffPolicy
      adds  9ce097b   Merge branch 'NUTCH-2287' of https://github.com/naegelejd/nutch this closes #131
      adds  fda3e14   Revert botched commit of NUTCH-2267
      adds  993e997   fix the cookie policy issue when the form authentication receives session cookie in a non-standard format - NUTCH-2280
      adds  753cad0   Format the HttpFormAuthentication.java with eclipse format and add javadoc. Add the httpclient-auth.xml.template for cookie policy config example.
      adds  9f32fe8   Merge branch 'NUTCH-2280' of https://github.com/stevegy/nutch this closes #134
      adds  d27c351   Fix for Nutch-2246: Refactor /seed end point, this closes #137
      adds  070a637   Remove obsolete properties protocol.plugin.check.blocking and protocol.plugin.check.robots
      adds  d37b7ce   Merge branch 'NUTCH-2299' of https://github.com/sebastian-nagel/nutch this closes #140 - Remove obsolete properties protocol.plugin.check.*
      adds  6c9cca5   Allow Fetcher to optionally store robots.txt content (if property fetcher.store.robotstxt == true). Improved RobotRulesParser command-line tool.
      adds  264eea0   Ignore robots.txt when parsing segment, refactored storing of robots.txt in FetcherThread
      adds  33cdca7   add hint and log warning that fetcher.store.robotstxt works only in combination with fetcher.store.content
      adds  f3af9a5   simplified code: use diamond operator
      adds  3fca1a5   NUTCH-2300 Fetcher to optionally save robots.txt Merge branch 'SaveRobotsTxt' of https://github.com/sebastian-nagel/nutch, this closes #141
      adds  78e9909   Remove NUTCH-2246 from the 1.12 section of CHANGES.txt (fixed in 1.13)
      adds  70622c3   NUTCH-2164 NUTCH-2242 Inconsistent 'Modified Time' in crawl db / lastModified not always set  - set modified time (time of last successful fetch) by DefaultFetchSchedule and AdaptiveFetchSchedule    but only if the document is actually modified  - update unit tests to check whether modification time is properly set  - set modified time (sent by responding server in HTTP header) in ProtocolOutput:    FetchSchedule implementations can access the HTTP modified time from [...]
      adds  e53b34b   Fix for NUTCH-2132: Publisher/Subscriber model for Nutch to emit events, this closes #138
      adds  836b2e0   NUTCH-2320 URLFilterChecker to run as TCP Telnet service
      adds  d4c924e   revert 2320
      adds  9092e23   NUTH-2329 Update Slf4j logging for Java 8 and upgrade miredot plugin version
      adds  24cc2aa   Fix for NUTCH-2327: Seeds injected in REST must be ingested into HDFS, this closes #155
      adds  6e051f2   NUTCH-2336 SegmentReader to implement Tool (contributed by Vincent Slot), closes #159
      adds  f351790   NUTCH-2337 urlnormalizer-basic to strip empty port, closes #160 - make sure that URLs which contain anything else than the host   in the authority (incl. empty port) are marked as changed - always use root locale for case conversion
      adds  2b93a66   NUTCH-2352 Logging with generic class name, closes #172
      adds  1a718e0   NUTCH-2349 urlnormalizer-basic: NPE for URLs without authority - check whether URL.getAuthority() returns null - recompose URLs without authority with empty authority/host
      adds  76aedcb   NUTCH-2349 urlnormalizer-basic: NPE for URLs without authority Merge branch 'NUTCH-2349-basic-url-normalizer-npe' of https://github.com/sebastian-nagel/nutch, this closes #169
      adds  9a9c4b3   NUTCH-2359 Parsefilter-regex raises IndexOutOfBoundsException when rules are ill-formed
      adds  217fad1   NUTCH-2355 Protocol plugins to set cookie if Cookie metadata field is present
      adds  c4b8955   NUTCH-2171 Nutch upgrade to Java 1.8
      adds  3e2d3d4   Merge pull request #174 from kamaci/NUTCH-2171
       new  98cd385   Merge branch 'master' into NUTCH-2292-1
       new  9c25a8c   Upstream changes, upgrade to JDK 8, add license header
       new  62491d5   Merge with latest changes from master

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "adds" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .gitignore                                         |   6 +-
 build.xml                                          |   3 +
 conf/httpclient-auth.xml.template                  |   6 +
 conf/nutch-default.xml                             |  75 +++++++++
 default.properties                                 |   4 +-
 ivy/ivy.xml                                        |   3 +
 ivy/mvn.template                                   |  10 +-
 .../apache/nutch/crawl/AbstractFetchSchedule.java  |   4 +-
 .../apache/nutch/crawl/AdaptiveFetchSchedule.java  |   7 +-
 .../java/org/apache/nutch/crawl/CrawlDatum.java    |   8 +-
 .../main/java/org/apache/nutch/crawl/CrawlDb.java  |  25 ++-
 .../java/org/apache/nutch/crawl/CrawlDbFilter.java |   4 +-
 .../java/org/apache/nutch/crawl/CrawlDbMerger.java |   5 +-
 .../java/org/apache/nutch/crawl/CrawlDbReader.java |  18 +-
 .../org/apache/nutch/crawl/CrawlDbReducer.java     |   7 +-
 .../org/apache/nutch/crawl/DeduplicationJob.java   |   7 +-
 .../apache/nutch/crawl/DefaultFetchSchedule.java   |   4 +
 .../apache/nutch/crawl/FetchScheduleFactory.java   |   6 +-
 .../java/org/apache/nutch/crawl/Generator.java     |  10 +-
 .../main/java/org/apache/nutch/crawl/Injector.java |  39 +++--
 .../main/java/org/apache/nutch/crawl/Inlinks.java  |   8 +-
 .../main/java/org/apache/nutch/crawl/LinkDb.java   |  25 ++-
 .../java/org/apache/nutch/crawl/LinkDbFilter.java  |   4 +-
 .../java/org/apache/nutch/crawl/LinkDbMerger.java  |   6 +-
 .../java/org/apache/nutch/crawl/LinkDbReader.java  |   6 +-
 .../nutch/crawl/MimeAdaptiveFetchSchedule.java     |   7 +-
 .../org/apache/nutch/crawl/SignatureFactory.java   |   4 +-
 .../apache/nutch/crawl/TextProfileSignature.java   |   6 +-
 .../org/apache/nutch/crawl/URLPartitioner.java     |   3 +-
 .../java/org/apache/nutch/fetcher/FetchItem.java   |   4 +-
 .../org/apache/nutch/fetcher/FetchItemQueue.java   |   6 +-
 .../org/apache/nutch/fetcher/FetchItemQueues.java  |   8 +-
 .../java/org/apache/nutch/fetcher/FetchNodeDb.java |   2 +-
 .../java/org/apache/nutch/fetcher/Fetcher.java     |  29 ++--
 .../org/apache/nutch/fetcher/FetcherThread.java    |  76 ++++++++-
 .../apache/nutch/fetcher/FetcherThreadEvent.java   | 147 +++++++++++++++++
 .../nutch/fetcher/FetcherThreadPublisher.java      |  61 +++++++
 .../java/org/apache/nutch/fetcher/QueueFeeder.java |   5 +-
 .../java/org/apache/nutch/hostdb/ReadHostDb.java   |   4 +-
 .../org/apache/nutch/hostdb/ResolverThread.java    |   4 +-
 .../java/org/apache/nutch/hostdb/UpdateHostDb.java |   4 +-
 .../apache/nutch/hostdb/UpdateHostDbMapper.java    |   6 +-
 .../apache/nutch/hostdb/UpdateHostDbReducer.java   |  18 +-
 .../java/org/apache/nutch/indexer/CleaningJob.java |   4 +-
 .../org/apache/nutch/indexer/IndexWriters.java     |   6 +-
 .../org/apache/nutch/indexer/IndexerMapReduce.java |   5 +-
 .../org/apache/nutch/indexer/IndexingFilters.java  |   6 +-
 .../nutch/indexer/IndexingFiltersChecker.java      |   7 +-
 .../java/org/apache/nutch/indexer/IndexingJob.java |  25 ++-
 .../org/apache/nutch/indexer/NutchDocument.java    |   2 +-
 .../java/org/apache/nutch/indexer/NutchField.java  |   4 +-
 .../java/org/apache/nutch/metadata/Metadata.java   |   2 +-
 .../main/java/org/apache/nutch/metadata/Nutch.java |  13 ++
 .../nutch/metadata/SpellCheckedMetadata.java       |   2 +-
 .../org/apache/nutch/net/URLExemptionFilters.java  |   5 +-
 .../java/org/apache/nutch/net/URLNormalizers.java  |  13 +-
 .../org/apache/nutch/parse/OutlinkExtractor.java   |   5 +-
 .../java/org/apache/nutch/parse/ParseData.java     |   5 +-
 .../org/apache/nutch/parse/ParseOutputFormat.java  |   7 +-
 .../org/apache/nutch/parse/ParsePluginList.java    |   4 +-
 .../org/apache/nutch/parse/ParsePluginsReader.java |   9 +-
 .../java/org/apache/nutch/parse/ParseResult.java   |   6 +-
 .../java/org/apache/nutch/parse/ParseSegment.java  |  30 ++--
 .../java/org/apache/nutch/parse/ParseText.java     |   5 +-
 .../java/org/apache/nutch/parse/ParseUtil.java     |   4 +-
 .../java/org/apache/nutch/parse/ParserChecker.java |   6 +-
 .../java/org/apache/nutch/parse/ParserFactory.java |   8 +-
 .../java/org/apache/nutch/plugin/Extension.java    |   2 +-
 .../org/apache/nutch/plugin/ExtensionPoint.java    |   2 +-
 .../org/apache/nutch/plugin/PluginDescriptor.java  |  21 +--
 .../apache/nutch/plugin/PluginManifestParser.java  |   2 +-
 .../org/apache/nutch/plugin/PluginRepository.java  |  35 ++--
 .../java/org/apache/nutch/protocol/Content.java    |   5 +-
 .../java/org/apache/nutch/protocol/Protocol.java   |  38 ++---
 .../org/apache/nutch/protocol/ProtocolFactory.java |   5 +-
 .../org/apache/nutch/protocol/ProtocolOutput.java  |  14 ++
 .../org/apache/nutch/protocol/ProtocolStatus.java  |   2 +-
 .../apache/nutch/protocol/RobotRulesParser.java    | 183 ++++++++++++++++-----
 .../NutchPublisher.java}                           |  41 ++---
 .../apache/nutch/publisher/NutchPublishers.java    |  83 ++++++++++
 .../apache/nutch/scoring/webgraph/LinkDumper.java  |  10 +-
 .../apache/nutch/scoring/webgraph/LinkRank.java    |  10 +-
 .../apache/nutch/scoring/webgraph/NodeDumper.java  |   4 +-
 .../apache/nutch/scoring/webgraph/NodeReader.java  |   2 +-
 .../nutch/scoring/webgraph/ScoreUpdater.java       |   4 +-
 .../apache/nutch/scoring/webgraph/WebGraph.java    |  12 +-
 .../nutch/segment/ContentAsTextInputFormat.java    |   2 +-
 .../org/apache/nutch/segment/SegmentChecker.java   |   5 +-
 .../apache/nutch/segment/SegmentMergeFilters.java  |   3 +-
 .../org/apache/nutch/segment/SegmentMerger.java    |  13 +-
 .../org/apache/nutch/segment/SegmentReader.java    |  66 +++++---
 .../java/org/apache/nutch/service/NutchReader.java |   6 +-
 .../java/org/apache/nutch/service/NutchServer.java |  15 +-
 .../SeedManager.java}                              |  22 +--
 .../org/apache/nutch/service/impl/JobWorker.java   |   4 +-
 .../org/apache/nutch/service/impl/LinkReader.java  |   8 +-
 .../org/apache/nutch/service/impl/NodeReader.java  |   8 +-
 .../DbQuery.java => impl/SeedManagerImpl.java}     |  60 +++----
 .../apache/nutch/service/impl/SequenceReader.java  |  12 +-
 .../nutch/service/model/request/DbQuery.java       |   2 +-
 .../nutch/service/model/request/SeedList.java      |  10 ++
 .../service/model/response/FetchNodeDbInfo.java    |   2 +-
 .../nutch/service/resources/AdminResource.java     |   3 +-
 .../apache/nutch/service/resources/DbResource.java |   2 +-
 .../nutch/service/resources/SeedResource.java      | 105 ++++++------
 .../nutch/tools/AbstractCommonCrawlFormat.java     |   4 +-
 .../java/org/apache/nutch/tools/Benchmark.java     |   8 +-
 .../apache/nutch/tools/CommonCrawlDataDumper.java  |   7 +-
 .../nutch/tools/CommonCrawlFormatJettinson.java    |   4 +-
 .../java/org/apache/nutch/tools/DmozParser.java    |  26 +--
 .../java/org/apache/nutch/tools/FileDumper.java    |  40 ++---
 .../java/org/apache/nutch/tools/FreeGenerator.java |   5 +-
 .../java/org/apache/nutch/tools/ResolveUrls.java   |   4 +-
 .../apache/nutch/tools/arc/ArcRecordReader.java    |   5 +-
 .../apache/nutch/tools/arc/ArcSegmentCreator.java  |   5 +-
 .../org/apache/nutch/tools/warc/WARCExporter.java  |   6 +-
 .../apache/nutch/util/CrawlCompletionStats.java    |   3 +-
 .../java/org/apache/nutch/util/DeflateUtils.java   |   4 +-
 .../main/java/org/apache/nutch/util/DomUtil.java   |   4 +-
 .../java/org/apache/nutch/util/DumpFileUtil.java   |   5 +-
 .../org/apache/nutch/util/EncodingDetector.java    |  15 +-
 .../main/java/org/apache/nutch/util/GZIPUtils.java |   4 +-
 .../java/org/apache/nutch/util/HadoopFSUtil.java   |  19 +--
 .../main/java/org/apache/nutch/util/JexlUtil.java  |   4 +-
 .../main/java/org/apache/nutch/util/MimeUtil.java  |  10 +-
 .../java/org/apache/nutch/util/NodeWalker.java     |   2 +-
 .../main/java/org/apache/nutch/util/NutchTool.java |   2 +-
 .../java/org/apache/nutch/util/ObjectCache.java    |   8 +-
 .../nutch/util/ProtocolStatusStatistics.java       |   3 +-
 .../org/apache/nutch/util/TrieStringMatcher.java   |   4 +-
 .../apache/nutch/util/domain/DomainStatistics.java |   3 +-
 .../apache/nutch/util/domain/DomainSuffixes.java   |   5 +-
 .../nutch/util/domain/DomainSuffixesReader.java    |   3 +-
 .../nutch/webui/client/impl/CrawlingCycle.java     |   6 +-
 .../webui/client/impl/RemoteCommandExecutor.java   |   6 +-
 .../webui/pages/components/ColorEnumLabel.java     |   2 +-
 .../pages/components/ColorEnumLabelBuilder.java    |   2 +-
 .../webui/pages/components/CpmIteratorAdapter.java |   2 +-
 .../nutch/webui/pages/crawls/CrawlPanel.java       |   8 +-
 .../nutch/webui/pages/crawls/CrawlsPage.java       |   4 +-
 .../nutch/webui/pages/instances/InstancePanel.java |   2 +-
 .../nutch/webui/pages/instances/InstancesPage.java |   4 +-
 .../nutch/webui/pages/seed/SeedListsPage.java      |   4 +-
 .../apache/nutch/webui/pages/seed/SeedPage.java    |   6 +-
 .../nutch/webui/pages/settings/SettingsPage.java   |   4 +-
 .../nutch/webui/service/impl/CrawlServiceImpl.java |   6 +-
 .../nutch/webui/service/impl/NutchServiceImpl.java |   7 +-
 .../nutch/crawl/ContinuousCrawlTestUtil.java       |   3 +-
 .../org/apache/nutch/crawl/CrawlDBTestUtil.java    |   3 +-
 .../nutch/crawl/CrawlDbUpdateTestDriver.java       |   3 +-
 .../org/apache/nutch/crawl/CrawlDbUpdateUtil.java  |   3 +-
 .../apache/nutch/crawl/TODOTestCrawlDbStates.java  |   4 +-
 .../org/apache/nutch/crawl/TestCrawlDbMerger.java  |  18 +-
 .../org/apache/nutch/crawl/TestCrawlDbStates.java  |  18 +-
 .../org/apache/nutch/crawl/TestLinkDbMerger.java   |  18 +-
 .../apache/nutch/indexer/TestIndexerMapReduce.java |   4 +-
 .../segment/TestSegmentMergerCrawlDatums.java      |   3 +-
 .../org/apache/nutch/service/TestNutchServer.java  |   4 +-
 .../apache/nutch/tools/proxy/LogDebugHandler.java  |   3 +-
 .../org/apache/nutch/tools/proxy/ProxyTestbed.java |   4 +-
 .../apache/nutch/tools/proxy/SegmentHandler.java   |   3 +-
 nutch-plugins/build.xml                            |   1 +
 .../creativecommons/nutch/CCIndexingFilter.java    |   5 +-
 .../org/creativecommons/nutch/CCParseFilter.java   |   4 +-
 .../org/apache/nutch/parse/feed/FeedParser.java    |   4 +-
 .../apache/nutch/parse/feed/TestFeedParser.java    |   5 +-
 .../nutch/indexer/anchor/AnchorIndexingFilter.java |   5 +-
 .../nutch/indexer/basic/BasicIndexingFilter.java   |   5 +-
 .../nutch/indexer/geoip/GeoIPIndexingFilter.java   |   3 +-
 .../nutch/indexer/links/LinksIndexingFilter.java   |   7 +-
 .../nutch/indexer/more/MoreIndexingFilter.java     |   5 +-
 .../cloudsearch/CloudSearchIndexWriter.java        |   5 +-
 .../nutch/indexwriter/dummy/DummyIndexWriter.java  |   5 +-
 .../indexwriter/elastic/ElasticIndexWriter.java    |   4 +-
 .../elastic/TestElasticIndexWriter.java            |   0
 .../src/test/resources}/nutch-site-test.xml        |   0
 .../nutch/indexwriter/solr/SolrIndexWriter.java    |   5 +-
 .../nutch/indexwriter/solr/SolrMappingReader.java  |   4 +-
 .../apache/nutch/indexwriter/solr/SolrUtils.java   |  14 +-
 .../nutch/analysis/lang/HTMLLanguageParser.java    |   5 +-
 .../nutch/protocol/htmlunit/HtmlUnitWebDriver.java |   4 +-
 .../apache/nutch/protocol/http/api/HttpBase.java   |  71 ++++----
 .../protocol/http/api/HttpRobotRulesParser.java    |  57 ++++++-
 .../nutch/urlfilter/api/RegexURLFilterBase.java    |   5 +-
 .../urlfilter/api/RegexURLFilterBaseTest.java      |   5 +-
 .../nutch/protocol/selenium/HttpWebClient.java     |   4 +-
 .../nutch/microformats/reltag/RelTagParser.java    |   4 +-
 .../indexer/filter/MimeTypeIndexingFilter.java     |   3 +-
 nutch-plugins/nutch-extensionpoints/plugin.xml     |   4 +
 .../java/org/apache/nutch/parse/ext/ExtParser.java |   5 +-
 .../org/apache/nutch/parse/html/HtmlParser.java    |   5 +-
 .../apache/nutch/parse/html/TestHtmlParser.java    |   5 +-
 .../org/apache/nutch/parse/js/JSParseFilter.java   |   4 +-
 .../java/org/apache/nutch/parse/swf/SWFParser.java |   5 +-
 .../org/apache/nutch/parse/tika/TikaParser.java    |   4 +-
 .../java/org/apache/nutch/tika/TestFeedParser.java |   6 +-
 .../java/org/apache/nutch/parse/zip/ZipParser.java |   4 +-
 .../apache/nutch/parse/zip/ZipTextExtractor.java   |   5 +-
 .../naivebayes/NaiveBayesParseFilter.java          |   3 +-
 nutch-plugins/parsefilter-regex/README.txt         |  41 +++++
 .../nutch/parsefilter/regex/RegexParseFilter.java  |  24 ++-
 nutch-plugins/pom.xml                              |   1 +
 .../java/org/apache/nutch/protocol/file/File.java  |  17 +-
 .../java/org/apache/nutch/protocol/ftp/Ftp.java    |  13 +-
 .../nutch/protocol/ftp/FtpRobotRulesParser.java    |  22 ++-
 .../org/apache/nutch/protocol/htmlunit/Http.java   |   4 +-
 .../java/org/apache/nutch/protocol/http/Http.java  |   4 +-
 .../apache/nutch/protocol/http/HttpResponse.java   |  11 +-
 nutch-plugins/protocol-httpclient/pom.xml          |  20 ++-
 .../httpclient/DummySSLProtocolSocketFactory.java  |   3 +-
 .../protocol/httpclient/DummyX509TrustManager.java |   3 +-
 .../org/apache/nutch/protocol/httpclient/Http.java |  83 ++++++----
 .../httpclient/HttpAuthenticationFactory.java      |   5 +-
 .../httpclient/HttpBasicAuthentication.java        |   5 +-
 .../httpclient/HttpFormAuthConfigurer.java         |  21 ++-
 .../httpclient/HttpFormAuthentication.java         |  85 +++++++---
 .../nutch/protocol/httpclient/HttpResponse.java    |   7 +
 .../nutch/protocol/interactiveselenium/Http.java   |   4 +-
 .../handlers/DefalultMultiInteractionHandler.java  |   4 +-
 .../handlers/DefaultClickAllAjaxLinksHandler.java  |   3 +-
 .../org/apache/nutch/protocol/selenium/Http.java   |   4 +-
 .../build-ivy.xml                                  |   2 +-
 .../{scoring-opic => publish-rabbitmq}/build.xml   |   2 +-
 .../{creativecommons => publish-rabbitmq}/ivy.xml  |   1 +
 .../{headings => publish-rabbitmq}/plugin.xml      |  20 +--
 .../{parse-html => publish-rabbitmq}/pom.xml       |  19 ++-
 .../publisher/rabbitmq/RabbitMQPublisherImpl.java  |  95 +++++++++++
 .../nutch/publisher/rabbitmq}/package-info.java    |   4 +-
 .../nutch/scoring/opic/OPICScoringFilter.java      |   5 +-
 .../similarity/cosine/CosineSimilarity.java        |   7 +-
 .../nutch/scoring/similarity/cosine/Model.java     |   6 +-
 .../apache/nutch/collection/CollectionManager.java |   4 +-
 .../subcollection/SubcollectionIndexingFilter.java |   6 +-
 .../nutch/indexer/tld/TLDIndexingFilter.java       |   5 +-
 .../nutch/urlfilter/domain/DomainURLFilter.java    |   3 +-
 .../domainblacklist/DomainBlacklistURLFilter.java  |   3 +-
 .../urlfilter/ignoreexempt/ExemptionUrlFilter.java |   5 +-
 .../nutch/urlfilter/prefix/PrefixURLFilter.java    |   3 +-
 .../nutch/urlfilter/suffix/SuffixURLFilter.java    |   3 +-
 .../indexer/urlmeta/URLMetaIndexingFilter.java     |   4 +-
 .../scoring/urlmeta/URLMetaScoringFilter.java      |   3 +-
 .../net/urlnormalizer/ajax/AjaxURLNormalizer.java  |   4 +-
 .../urlnormalizer/basic/BasicURLNormalizer.java    |  21 ++-
 .../basic/TestBasicURLNormalizer.java              |  11 +-
 .../net/urlnormalizer/host/HostURLNormalizer.java  |   3 +-
 .../protocol/ProtocolURLNormalizer.java            |   4 +-
 .../querystring/QuerystringURLNormalizer.java      |   3 +-
 .../urlnormalizer/regex/RegexURLNormalizer.java    |   3 +-
 .../regex/TestRegexURLNormalizer.java              |   3 +-
 .../urlnormalizer/slash/SlashURLNormalizer.java    |   4 +-
 pom.xml                                            |  20 ++-
 251 files changed, 2090 insertions(+), 925 deletions(-)
 create mode 100644 nutch-core/src/main/java/org/apache/nutch/fetcher/FetcherThreadEvent.java
 create mode 100644 nutch-core/src/main/java/org/apache/nutch/fetcher/FetcherThreadPublisher.java
 copy nutch-core/src/main/java/org/apache/nutch/{parse/HtmlParseFilter.java => publisher/NutchPublisher.java} (54%)
 create mode 100644 nutch-core/src/main/java/org/apache/nutch/publisher/NutchPublishers.java
 copy nutch-core/src/main/java/org/apache/nutch/{webui/service/NutchInstanceService.java => service/SeedManager.java} (69%)
 copy nutch-core/src/main/java/org/apache/nutch/service/{model/request/DbQuery.java => impl/SeedManagerImpl.java} (52%)
 rename {src/plugin/indexer-elastic/src/test => nutch-plugins/indexer-elastic/src/test/java}/org/apache/nutch/indexwriter/elastic/TestElasticIndexWriter.java (100%)
 rename {src/plugin/indexer-elastic/src/test/conf => nutch-plugins/indexer-elastic/src/test/resources}/nutch-site-test.xml (100%)
 create mode 100644 nutch-plugins/parsefilter-regex/README.txt
 copy nutch-plugins/{index-geoip => publish-rabbitmq}/build-ivy.xml (96%)
 copy nutch-plugins/{scoring-opic => publish-rabbitmq}/build.xml (95%)
 copy nutch-plugins/{creativecommons => publish-rabbitmq}/ivy.xml (93%)
 copy nutch-plugins/{headings => publish-rabbitmq}/plugin.xml (72%)
 copy nutch-plugins/{parse-html => publish-rabbitmq}/pom.xml (76%)
 create mode 100644 nutch-plugins/publish-rabbitmq/src/main/java/org/apache/nutch/publisher/rabbitmq/RabbitMQPublisherImpl.java
 copy nutch-plugins/{parse-swf/src/main/java/org/apache/nutch/parse/swf => publish-rabbitmq/src/main/java/org/apache/nutch/publisher/rabbitmq}/package-info.java (90%)

-- 
To stop receiving notification emails like this one, please contact
"commits@nutch.apache.org" <co...@nutch.apache.org>.

[nutch] 03/03: Merge with latest changes from master

Posted by th...@apache.org.

thammegowda pushed a commit to branch NUTCH-2292
in repository https://gitbox.apache.org/repos/asf/nutch.git

commit 62491d5b0ac3349d684a493c9bd121442849ee8c
Author: Thamme Gowda <th...@apache.org>
AuthorDate: Sat Feb 25 06:26:39 2017 -0800

    Merge with latest changes from master
---
 .../elastic/TestElasticIndexWriter.java            |  0
 .../src/test/resources}/nutch-site-test.xml        |  0
 .../parsefilter-regex/README.txt                   |  0
 nutch-plugins/pom.xml                              |  1 +
 nutch-plugins/protocol-httpclient/pom.xml          | 20 ++++++++++++--
 .../publish-rabbitmq/build-ivy.xml                 |  0
 .../publish-rabbitmq/build.xml                     |  0
 .../publish-rabbitmq/ivy.xml                       |  0
 .../publish-rabbitmq/plugin.xml                    |  0
 .../pom.xml                                        | 32 ++++++++--------------
 .../publisher/rabbitmq/RabbitMQPublisherImpl.java  |  0
 .../nutch/publisher/rabbitmq/package-info.java     |  0
 12 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/src/plugin/indexer-elastic/src/test/org/apache/nutch/indexwriter/elastic/TestElasticIndexWriter.java b/nutch-plugins/indexer-elastic/src/test/java/org/apache/nutch/indexwriter/elastic/TestElasticIndexWriter.java
similarity index 100%
rename from src/plugin/indexer-elastic/src/test/org/apache/nutch/indexwriter/elastic/TestElasticIndexWriter.java
rename to nutch-plugins/indexer-elastic/src/test/java/org/apache/nutch/indexwriter/elastic/TestElasticIndexWriter.java
diff --git a/src/plugin/indexer-elastic/src/test/conf/nutch-site-test.xml b/nutch-plugins/indexer-elastic/src/test/resources/nutch-site-test.xml
similarity index 100%
rename from src/plugin/indexer-elastic/src/test/conf/nutch-site-test.xml
rename to nutch-plugins/indexer-elastic/src/test/resources/nutch-site-test.xml
diff --git a/src/plugin/parsefilter-regex/README.txt b/nutch-plugins/parsefilter-regex/README.txt
similarity index 100%
rename from src/plugin/parsefilter-regex/README.txt
rename to nutch-plugins/parsefilter-regex/README.txt
diff --git a/nutch-plugins/pom.xml b/nutch-plugins/pom.xml
index e07f487..0fc29e1 100644
--- a/nutch-plugins/pom.xml
+++ b/nutch-plugins/pom.xml
@@ -76,6 +76,7 @@
         <module>protocol-httpclient</module>
         <module>protocol-interactiveselenium</module>
         <module>protocol-selenium</module>
+        <module>publish-rabbitmq</module>
         <module>scoring-depth</module>
         <module>scoring-link</module>
         <module>scoring-opic</module>
diff --git a/nutch-plugins/protocol-httpclient/pom.xml b/nutch-plugins/protocol-httpclient/pom.xml
index 2f2fc7c..4fdac6c 100644
--- a/nutch-plugins/protocol-httpclient/pom.xml
+++ b/nutch-plugins/protocol-httpclient/pom.xml
@@ -33,12 +33,26 @@
 
     <properties>
         <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+        <commons.lang3.version>3.5</commons.lang3.version>
+        <jsoup.version>1.8.1</jsoup.version>
+        <jetty.version>6.1.26</jetty.version>
+        <jsp.version>6.1.14</jsp.version>
     </properties>
     <dependencies>
         <dependency>
             <groupId>org.jsoup</groupId>
             <artifactId>jsoup</artifactId>
-            <version>1.8.1</version>
+            <version>${jsoup.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.commons</groupId>
+            <artifactId>commons-lang3</artifactId>
+            <version>${commons.lang3.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.nutch</groupId>
+            <artifactId>lib-http</artifactId>
+            <version>${project.parent.version}</version>
         </dependency>
         <dependency>
             <groupId>org.apache.nutch</groupId>
@@ -48,13 +62,13 @@
         <dependency>
             <groupId> org.mortbay.jetty</groupId>
             <artifactId>jetty</artifactId>
-            <version>6.1.26</version>
+            <version>${jetty.version}</version>
             <scope>test</scope>
         </dependency>
         <dependency>
             <groupId> org.mortbay.jetty</groupId>
             <artifactId>jsp-2.1</artifactId>
-            <version>6.1.14</version>
+            <version>${jsp.version}</version>
             <scope>test</scope>
         </dependency>
     </dependencies>
diff --git a/src/plugin/publish-rabbitmq/build-ivy.xml b/nutch-plugins/publish-rabbitmq/build-ivy.xml
similarity index 100%
rename from src/plugin/publish-rabbitmq/build-ivy.xml
rename to nutch-plugins/publish-rabbitmq/build-ivy.xml
diff --git a/src/plugin/publish-rabbitmq/build.xml b/nutch-plugins/publish-rabbitmq/build.xml
similarity index 100%
rename from src/plugin/publish-rabbitmq/build.xml
rename to nutch-plugins/publish-rabbitmq/build.xml
diff --git a/src/plugin/publish-rabbitmq/ivy.xml b/nutch-plugins/publish-rabbitmq/ivy.xml
similarity index 100%
rename from src/plugin/publish-rabbitmq/ivy.xml
rename to nutch-plugins/publish-rabbitmq/ivy.xml
diff --git a/src/plugin/publish-rabbitmq/plugin.xml b/nutch-plugins/publish-rabbitmq/plugin.xml
similarity index 100%
rename from src/plugin/publish-rabbitmq/plugin.xml
rename to nutch-plugins/publish-rabbitmq/plugin.xml
diff --git a/nutch-plugins/protocol-httpclient/pom.xml b/nutch-plugins/publish-rabbitmq/pom.xml
similarity index 67%
copy from nutch-plugins/protocol-httpclient/pom.xml
copy to nutch-plugins/publish-rabbitmq/pom.xml
index 2f2fc7c..a8a434d 100644
--- a/nutch-plugins/protocol-httpclient/pom.xml
+++ b/nutch-plugins/publish-rabbitmq/pom.xml
@@ -25,38 +25,28 @@
         <version>1.13-SNAPSHOT</version>
         <relativePath>../pom.xml</relativePath>
     </parent>
-    <artifactId>protocol-httpclient</artifactId>
+    <artifactId>publish-rabitmq</artifactId>
     <packaging>jar</packaging>
 
-    <name>protocol-httpclient</name>
+    <name>publish-rabitmq</name>
     <url>http://nutch.apache.org</url>
 
     <properties>
         <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+        <rabitmq.version>3.6.5</rabitmq.version>
+        <jackson.version>2.8.6</jackson.version>
     </properties>
+
     <dependencies>
         <dependency>
-            <groupId>org.jsoup</groupId>
-            <artifactId>jsoup</artifactId>
-            <version>1.8.1</version>
-        </dependency>
-        <dependency>
-            <groupId>org.apache.nutch</groupId>
-            <artifactId>lib-http</artifactId>
-            <version>${project.parent.version}</version>
+            <groupId>com.fasterxml.jackson.core</groupId>
+            <artifactId>jackson-databind</artifactId>
+            <version>${jackson.version}</version>
         </dependency>
         <dependency>
-            <groupId> org.mortbay.jetty</groupId>
-            <artifactId>jetty</artifactId>
-            <version>6.1.26</version>
-            <scope>test</scope>
-        </dependency>
-        <dependency>
-            <groupId> org.mortbay.jetty</groupId>
-            <artifactId>jsp-2.1</artifactId>
-            <version>6.1.14</version>
-            <scope>test</scope>
+            <groupId>com.rabbitmq</groupId>
+            <artifactId>amqp-client</artifactId>
+            <version>${rabitmq.version}</version>
         </dependency>
     </dependencies>
-
 </project>
diff --git a/src/plugin/publish-rabbitmq/src/java/org/apache/nutch/publisher/rabbitmq/RabbitMQPublisherImpl.java b/nutch-plugins/publish-rabbitmq/src/main/java/org/apache/nutch/publisher/rabbitmq/RabbitMQPublisherImpl.java
similarity index 100%
rename from src/plugin/publish-rabbitmq/src/java/org/apache/nutch/publisher/rabbitmq/RabbitMQPublisherImpl.java
rename to nutch-plugins/publish-rabbitmq/src/main/java/org/apache/nutch/publisher/rabbitmq/RabbitMQPublisherImpl.java
diff --git a/src/plugin/publish-rabbitmq/src/java/org/apache/nutch/publisher/rabbitmq/package-info.java b/nutch-plugins/publish-rabbitmq/src/main/java/org/apache/nutch/publisher/rabbitmq/package-info.java
similarity index 100%
rename from src/plugin/publish-rabbitmq/src/java/org/apache/nutch/publisher/rabbitmq/package-info.java
rename to nutch-plugins/publish-rabbitmq/src/main/java/org/apache/nutch/publisher/rabbitmq/package-info.java


[nutch] 01/03: Merge branch 'master' into NUTCH-2292-1

Posted by th...@apache.org.

thammegowda pushed a commit to branch NUTCH-2292
in repository https://gitbox.apache.org/repos/asf/nutch.git

commit 98cd385b35bbd6b5b3a110b745f0eddd238b9456
Merge: 2175c76 3e2d3d4
Author: Thamme Gowda <th...@apache.org>
AuthorDate: Fri Feb 24 11:36:40 2017 -0800

    Merge branch 'master' into NUTCH-2292-1

 build.xml                                          |   3 +
 conf/httpclient-auth.xml.template                  |   6 +
 conf/nutch-default.xml                             |  75 +++++++++
 default.properties                                 |   4 +-
 ivy/ivy.xml                                        |   3 +
 ivy/mvn.template                                   |  10 +-
 .../apache/nutch/crawl/AbstractFetchSchedule.java  |   4 +-
 .../apache/nutch/crawl/AdaptiveFetchSchedule.java  |   7 +-
 .../java/org/apache/nutch/crawl/CrawlDatum.java    |   8 +-
 .../main/java/org/apache/nutch/crawl/CrawlDb.java  |  25 ++-
 .../java/org/apache/nutch/crawl/CrawlDbFilter.java |   4 +-
 .../java/org/apache/nutch/crawl/CrawlDbMerger.java |   5 +-
 .../java/org/apache/nutch/crawl/CrawlDbReader.java |  18 +-
 .../org/apache/nutch/crawl/CrawlDbReducer.java     |   7 +-
 .../org/apache/nutch/crawl/DeduplicationJob.java   |   7 +-
 .../apache/nutch/crawl/DefaultFetchSchedule.java   |   4 +
 .../apache/nutch/crawl/FetchScheduleFactory.java   |   6 +-
 .../java/org/apache/nutch/crawl/Generator.java     |  10 +-
 .../main/java/org/apache/nutch/crawl/Injector.java |  39 +++--
 .../main/java/org/apache/nutch/crawl/Inlinks.java  |   8 +-
 .../main/java/org/apache/nutch/crawl/LinkDb.java   |  25 ++-
 .../java/org/apache/nutch/crawl/LinkDbFilter.java  |   4 +-
 .../java/org/apache/nutch/crawl/LinkDbMerger.java  |   6 +-
 .../java/org/apache/nutch/crawl/LinkDbReader.java  |   6 +-
 .../nutch/crawl/MimeAdaptiveFetchSchedule.java     |   7 +-
 .../org/apache/nutch/crawl/SignatureFactory.java   |   4 +-
 .../apache/nutch/crawl/TextProfileSignature.java   |   6 +-
 .../org/apache/nutch/crawl/URLPartitioner.java     |   3 +-
 .../java/org/apache/nutch/fetcher/FetchItem.java   |   4 +-
 .../org/apache/nutch/fetcher/FetchItemQueue.java   |   6 +-
 .../org/apache/nutch/fetcher/FetchItemQueues.java  |   8 +-
 .../java/org/apache/nutch/fetcher/FetchNodeDb.java |   2 +-
 .../java/org/apache/nutch/fetcher/Fetcher.java     |  29 ++--
 .../org/apache/nutch/fetcher/FetcherThread.java    |  76 ++++++++-
 .../java/org/apache/nutch/fetcher/QueueFeeder.java |   5 +-
 .../java/org/apache/nutch/hostdb/ReadHostDb.java   |   4 +-
 .../org/apache/nutch/hostdb/ResolverThread.java    |   4 +-
 .../java/org/apache/nutch/hostdb/UpdateHostDb.java |   4 +-
 .../apache/nutch/hostdb/UpdateHostDbMapper.java    |   6 +-
 .../apache/nutch/hostdb/UpdateHostDbReducer.java   |  18 +-
 .../java/org/apache/nutch/indexer/CleaningJob.java |   4 +-
 .../org/apache/nutch/indexer/IndexWriters.java     |   6 +-
 .../org/apache/nutch/indexer/IndexerMapReduce.java |   5 +-
 .../org/apache/nutch/indexer/IndexingFilters.java  |   6 +-
 .../nutch/indexer/IndexingFiltersChecker.java      |   7 +-
 .../java/org/apache/nutch/indexer/IndexingJob.java |  25 ++-
 .../org/apache/nutch/indexer/NutchDocument.java    |   2 +-
 .../java/org/apache/nutch/indexer/NutchField.java  |   4 +-
 .../java/org/apache/nutch/metadata/Metadata.java   |   2 +-
 .../main/java/org/apache/nutch/metadata/Nutch.java |  13 ++
 .../nutch/metadata/SpellCheckedMetadata.java       |   2 +-
 .../org/apache/nutch/net/URLExemptionFilters.java  |   5 +-
 .../java/org/apache/nutch/net/URLNormalizers.java  |  13 +-
 .../org/apache/nutch/parse/OutlinkExtractor.java   |   5 +-
 .../java/org/apache/nutch/parse/ParseData.java     |   5 +-
 .../org/apache/nutch/parse/ParseOutputFormat.java  |   7 +-
 .../org/apache/nutch/parse/ParsePluginList.java    |   4 +-
 .../org/apache/nutch/parse/ParsePluginsReader.java |   9 +-
 .../java/org/apache/nutch/parse/ParseResult.java   |   6 +-
 .../java/org/apache/nutch/parse/ParseSegment.java  |  30 ++--
 .../java/org/apache/nutch/parse/ParseText.java     |   5 +-
 .../java/org/apache/nutch/parse/ParseUtil.java     |   4 +-
 .../java/org/apache/nutch/parse/ParserChecker.java |   6 +-
 .../java/org/apache/nutch/parse/ParserFactory.java |   8 +-
 .../java/org/apache/nutch/plugin/Extension.java    |   2 +-
 .../org/apache/nutch/plugin/ExtensionPoint.java    |   2 +-
 .../org/apache/nutch/plugin/PluginDescriptor.java  |  21 +--
 .../apache/nutch/plugin/PluginManifestParser.java  |   2 +-
 .../org/apache/nutch/plugin/PluginRepository.java  |  35 ++--
 .../java/org/apache/nutch/protocol/Content.java    |   5 +-
 .../java/org/apache/nutch/protocol/Protocol.java   |  38 ++---
 .../org/apache/nutch/protocol/ProtocolFactory.java |   5 +-
 .../org/apache/nutch/protocol/ProtocolOutput.java  |  14 ++
 .../org/apache/nutch/protocol/ProtocolStatus.java  |   2 +-
 .../apache/nutch/protocol/RobotRulesParser.java    | 183 ++++++++++++++++-----
 .../apache/nutch/scoring/webgraph/LinkDumper.java  |  10 +-
 .../apache/nutch/scoring/webgraph/LinkRank.java    |  10 +-
 .../apache/nutch/scoring/webgraph/NodeDumper.java  |   4 +-
 .../apache/nutch/scoring/webgraph/NodeReader.java  |   2 +-
 .../nutch/scoring/webgraph/ScoreUpdater.java       |   4 +-
 .../apache/nutch/scoring/webgraph/WebGraph.java    |  12 +-
 .../nutch/segment/ContentAsTextInputFormat.java    |   2 +-
 .../org/apache/nutch/segment/SegmentChecker.java   |   5 +-
 .../apache/nutch/segment/SegmentMergeFilters.java  |   3 +-
 .../org/apache/nutch/segment/SegmentMerger.java    |  13 +-
 .../org/apache/nutch/segment/SegmentReader.java    |  66 +++++---
 .../java/org/apache/nutch/service/NutchReader.java |   6 +-
 .../java/org/apache/nutch/service/NutchServer.java |  15 +-
 .../org/apache/nutch/service/impl/JobWorker.java   |   4 +-
 .../org/apache/nutch/service/impl/LinkReader.java  |   8 +-
 .../org/apache/nutch/service/impl/NodeReader.java  |   8 +-
 .../apache/nutch/service/impl/SequenceReader.java  |  12 +-
 .../nutch/service/model/request/DbQuery.java       |   2 +-
 .../nutch/service/model/request/SeedList.java      |  10 ++
 .../service/model/response/FetchNodeDbInfo.java    |   2 +-
 .../nutch/service/resources/AdminResource.java     |   3 +-
 .../apache/nutch/service/resources/DbResource.java |   2 +-
 .../nutch/service/resources/SeedResource.java      | 105 ++++++------
 .../nutch/tools/AbstractCommonCrawlFormat.java     |   4 +-
 .../java/org/apache/nutch/tools/Benchmark.java     |   8 +-
 .../apache/nutch/tools/CommonCrawlDataDumper.java  |   7 +-
 .../nutch/tools/CommonCrawlFormatJettinson.java    |   4 +-
 .../java/org/apache/nutch/tools/DmozParser.java    |  26 +--
 .../java/org/apache/nutch/tools/FileDumper.java    |  40 ++---
 .../java/org/apache/nutch/tools/FreeGenerator.java |   5 +-
 .../java/org/apache/nutch/tools/ResolveUrls.java   |   4 +-
 .../apache/nutch/tools/arc/ArcRecordReader.java    |   5 +-
 .../apache/nutch/tools/arc/ArcSegmentCreator.java  |   5 +-
 .../org/apache/nutch/tools/warc/WARCExporter.java  |   6 +-
 .../apache/nutch/util/CrawlCompletionStats.java    |   3 +-
 .../java/org/apache/nutch/util/DeflateUtils.java   |   4 +-
 .../main/java/org/apache/nutch/util/DomUtil.java   |   4 +-
 .../java/org/apache/nutch/util/DumpFileUtil.java   |   5 +-
 .../org/apache/nutch/util/EncodingDetector.java    |  15 +-
 .../main/java/org/apache/nutch/util/GZIPUtils.java |   4 +-
 .../java/org/apache/nutch/util/HadoopFSUtil.java   |  19 +--
 .../main/java/org/apache/nutch/util/JexlUtil.java  |   4 +-
 .../main/java/org/apache/nutch/util/MimeUtil.java  |  10 +-
 .../java/org/apache/nutch/util/NodeWalker.java     |   2 +-
 .../main/java/org/apache/nutch/util/NutchTool.java |   2 +-
 .../java/org/apache/nutch/util/ObjectCache.java    |   8 +-
 .../nutch/util/ProtocolStatusStatistics.java       |   3 +-
 .../org/apache/nutch/util/TrieStringMatcher.java   |   4 +-
 .../apache/nutch/util/domain/DomainStatistics.java |   3 +-
 .../apache/nutch/util/domain/DomainSuffixes.java   |   5 +-
 .../nutch/util/domain/DomainSuffixesReader.java    |   3 +-
 .../nutch/webui/client/impl/CrawlingCycle.java     |   6 +-
 .../webui/client/impl/RemoteCommandExecutor.java   |   6 +-
 .../webui/pages/components/ColorEnumLabel.java     |   2 +-
 .../pages/components/ColorEnumLabelBuilder.java    |   2 +-
 .../webui/pages/components/CpmIteratorAdapter.java |   2 +-
 .../nutch/webui/pages/crawls/CrawlPanel.java       |   8 +-
 .../nutch/webui/pages/crawls/CrawlsPage.java       |   4 +-
 .../nutch/webui/pages/instances/InstancePanel.java |   2 +-
 .../nutch/webui/pages/instances/InstancesPage.java |   4 +-
 .../nutch/webui/pages/seed/SeedListsPage.java      |   4 +-
 .../apache/nutch/webui/pages/seed/SeedPage.java    |   6 +-
 .../nutch/webui/pages/settings/SettingsPage.java   |   4 +-
 .../nutch/webui/service/impl/CrawlServiceImpl.java |   6 +-
 .../nutch/webui/service/impl/NutchServiceImpl.java |   7 +-
 .../nutch/crawl/ContinuousCrawlTestUtil.java       |   3 +-
 .../org/apache/nutch/crawl/CrawlDBTestUtil.java    |   3 +-
 .../nutch/crawl/CrawlDbUpdateTestDriver.java       |   3 +-
 .../org/apache/nutch/crawl/CrawlDbUpdateUtil.java  |   3 +-
 .../apache/nutch/crawl/TODOTestCrawlDbStates.java  |   4 +-
 .../org/apache/nutch/crawl/TestCrawlDbMerger.java  |  18 +-
 .../org/apache/nutch/crawl/TestCrawlDbStates.java  |  18 +-
 .../org/apache/nutch/crawl/TestLinkDbMerger.java   |  18 +-
 .../apache/nutch/indexer/TestIndexerMapReduce.java |   4 +-
 .../segment/TestSegmentMergerCrawlDatums.java      |   3 +-
 .../org/apache/nutch/service/TestNutchServer.java  |   4 +-
 .../apache/nutch/tools/proxy/LogDebugHandler.java  |   3 +-
 .../org/apache/nutch/tools/proxy/ProxyTestbed.java |   4 +-
 .../apache/nutch/tools/proxy/SegmentHandler.java   |   3 +-
 nutch-plugins/build.xml                            |   1 +
 .../creativecommons/nutch/CCIndexingFilter.java    |   5 +-
 .../org/creativecommons/nutch/CCParseFilter.java   |   4 +-
 .../org/apache/nutch/parse/feed/FeedParser.java    |   4 +-
 .../apache/nutch/parse/feed/TestFeedParser.java    |   5 +-
 .../nutch/indexer/anchor/AnchorIndexingFilter.java |   5 +-
 .../nutch/indexer/basic/BasicIndexingFilter.java   |   5 +-
 .../nutch/indexer/geoip/GeoIPIndexingFilter.java   |   3 +-
 .../nutch/indexer/links/LinksIndexingFilter.java   |   7 +-
 .../nutch/indexer/more/MoreIndexingFilter.java     |   5 +-
 .../cloudsearch/CloudSearchIndexWriter.java        |   5 +-
 .../nutch/indexwriter/dummy/DummyIndexWriter.java  |   5 +-
 .../indexwriter/elastic/ElasticIndexWriter.java    |   4 +-
 .../nutch/indexwriter/solr/SolrIndexWriter.java    |   5 +-
 .../nutch/indexwriter/solr/SolrMappingReader.java  |   4 +-
 .../apache/nutch/indexwriter/solr/SolrUtils.java   |  14 +-
 .../nutch/analysis/lang/HTMLLanguageParser.java    |   5 +-
 .../nutch/protocol/htmlunit/HtmlUnitWebDriver.java |   4 +-
 .../apache/nutch/protocol/http/api/HttpBase.java   |  71 ++++----
 .../protocol/http/api/HttpRobotRulesParser.java    |  57 ++++++-
 .../nutch/urlfilter/api/RegexURLFilterBase.java    |   5 +-
 .../urlfilter/api/RegexURLFilterBaseTest.java      |   5 +-
 .../nutch/protocol/selenium/HttpWebClient.java     |   4 +-
 .../nutch/microformats/reltag/RelTagParser.java    |   4 +-
 .../indexer/filter/MimeTypeIndexingFilter.java     |   3 +-
 nutch-plugins/nutch-extensionpoints/plugin.xml     |   4 +
 .../java/org/apache/nutch/parse/ext/ExtParser.java |   5 +-
 .../org/apache/nutch/parse/html/HtmlParser.java    |   5 +-
 .../apache/nutch/parse/html/TestHtmlParser.java    |   5 +-
 .../org/apache/nutch/parse/js/JSParseFilter.java   |   4 +-
 .../java/org/apache/nutch/parse/swf/SWFParser.java |   5 +-
 .../org/apache/nutch/parse/tika/TikaParser.java    |   4 +-
 .../java/org/apache/nutch/tika/TestFeedParser.java |   6 +-
 .../java/org/apache/nutch/parse/zip/ZipParser.java |   4 +-
 .../apache/nutch/parse/zip/ZipTextExtractor.java   |   5 +-
 .../naivebayes/NaiveBayesParseFilter.java          |   3 +-
 .../nutch/parsefilter/regex/RegexParseFilter.java  |  24 ++-
 .../java/org/apache/nutch/protocol/file/File.java  |  17 +-
 .../java/org/apache/nutch/protocol/ftp/Ftp.java    |  13 +-
 .../nutch/protocol/ftp/FtpRobotRulesParser.java    |  22 ++-
 .../org/apache/nutch/protocol/htmlunit/Http.java   |   4 +-
 .../java/org/apache/nutch/protocol/http/Http.java  |   4 +-
 .../apache/nutch/protocol/http/HttpResponse.java   |  11 +-
 .../httpclient/DummySSLProtocolSocketFactory.java  |   3 +-
 .../protocol/httpclient/DummyX509TrustManager.java |   3 +-
 .../org/apache/nutch/protocol/httpclient/Http.java |  83 ++++++----
 .../httpclient/HttpAuthenticationFactory.java      |   5 +-
 .../httpclient/HttpBasicAuthentication.java        |   5 +-
 .../httpclient/HttpFormAuthConfigurer.java         |  21 ++-
 .../httpclient/HttpFormAuthentication.java         |  85 +++++++---
 .../nutch/protocol/httpclient/HttpResponse.java    |   7 +
 .../nutch/protocol/interactiveselenium/Http.java   |   4 +-
 .../handlers/DefalultMultiInteractionHandler.java  |   4 +-
 .../handlers/DefaultClickAllAjaxLinksHandler.java  |   3 +-
 .../org/apache/nutch/protocol/selenium/Http.java   |   4 +-
 .../nutch/scoring/opic/OPICScoringFilter.java      |   5 +-
 .../similarity/cosine/CosineSimilarity.java        |   7 +-
 .../nutch/scoring/similarity/cosine/Model.java     |   6 +-
 .../apache/nutch/collection/CollectionManager.java |   4 +-
 .../subcollection/SubcollectionIndexingFilter.java |   6 +-
 .../nutch/indexer/tld/TLDIndexingFilter.java       |   5 +-
 .../nutch/urlfilter/domain/DomainURLFilter.java    |   3 +-
 .../domainblacklist/DomainBlacklistURLFilter.java  |   3 +-
 .../urlfilter/ignoreexempt/ExemptionUrlFilter.java |   5 +-
 .../nutch/urlfilter/prefix/PrefixURLFilter.java    |   3 +-
 .../nutch/urlfilter/suffix/SuffixURLFilter.java    |   3 +-
 .../indexer/urlmeta/URLMetaIndexingFilter.java     |   4 +-
 .../scoring/urlmeta/URLMetaScoringFilter.java      |   3 +-
 .../net/urlnormalizer/ajax/AjaxURLNormalizer.java  |   4 +-
 .../urlnormalizer/basic/BasicURLNormalizer.java    |  21 ++-
 .../basic/TestBasicURLNormalizer.java              |  11 +-
 .../net/urlnormalizer/host/HostURLNormalizer.java  |   3 +-
 .../protocol/ProtocolURLNormalizer.java            |   4 +-
 .../querystring/QuerystringURLNormalizer.java      |   3 +-
 .../urlnormalizer/regex/RegexURLNormalizer.java    |   3 +-
 .../regex/TestRegexURLNormalizer.java              |   3 +-
 .../urlnormalizer/slash/SlashURLNormalizer.java    |   4 +-
 .../apache/nutch/fetcher/FetcherThreadEvent.java   | 147 +++++++++++++++++
 .../nutch/fetcher/FetcherThreadPublisher.java      |  61 +++++++
 .../org/apache/nutch/publisher/NutchPublisher.java |  47 +++---
 .../apache/nutch/publisher/NutchPublishers.java    |  83 ++++++++++
 .../java/org/apache/nutch/service/SeedManager.java |  34 ++--
 .../apache/nutch/service/impl/SeedManagerImpl.java |  60 +++----
 src/plugin/parsefilter-regex/README.txt            |  41 +++++
 src/plugin/publish-rabbitmq/build-ivy.xml          |  54 ++++++
 src/plugin/publish-rabbitmq/build.xml              |  27 +++
 src/plugin/publish-rabbitmq/ivy.xml                |  42 +++++
 src/plugin/publish-rabbitmq/plugin.xml             |  43 +++++
 .../publisher/rabbitmq/RabbitMQPublisherImpl.java  |  95 +++++++++++
 .../nutch/publisher/rabbitmq/package-info.java     |  25 +--
 244 files changed, 2202 insertions(+), 927 deletions(-)

diff --cc nutch-core/src/test/java/org/apache/nutch/crawl/TestCrawlDbMerger.java
index 599c353,7c4b2eb..bfb1581
--- a/nutch-core/src/test/java/org/apache/nutch/crawl/TestCrawlDbMerger.java
+++ b/nutch-core/src/test/java/org/apache/nutch/crawl/TestCrawlDbMerger.java
@@@ -36,11 -37,10 +38,11 @@@ import org.junit.After
  import org.junit.Assert;
  import org.junit.Before;
  import org.junit.Test;
 +import org.junit.experimental.categories.Category;
  
  public class TestCrawlDbMerger {
-   private static final Logger LOG = Logger.getLogger(CrawlDbMerger.class
-       .getName());
+   private static final Logger LOG = LoggerFactory
+       .getLogger(MethodHandles.lookup().lookupClass());
  
    String url10 = "http://example.com/";
    String url11 = "http://example.com/foo";
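
The hunk above replaces a logger named after an explicit class literal (CrawlDbMerger.class, although the enclosing class is TestCrawlDbMerger) with one named via MethodHandles.lookup().lookupClass(), which resolves to whichever class declares the field. A minimal sketch of that idiom follows. It is an illustration, not code from this commit: the class name LoggerLookupSketch and the helper loggerName() are hypothetical, and java.util.logging stands in for SLF4J's LoggerFactory so the example runs without extra dependencies.

```java
import java.lang.invoke.MethodHandles;
import java.util.logging.Logger;

// Hypothetical demo class; not part of the Nutch code base.
public class LoggerLookupSketch {

  // MethodHandles.lookup() is caller-sensitive: invoked from this static
  // initializer, lookupClass() returns LoggerLookupSketch.class, so the
  // declaration can be copied between classes without editing a class literal.
  private static final Logger LOG =
      Logger.getLogger(MethodHandles.lookup().lookupClass().getName());

  // Exposes the logger name so the behaviour can be checked.
  static String loggerName() {
    return LOG.getName();
  }

  public static void main(String[] args) {
    LOG.info("logger is named after: " + loggerName());
  }
}
```

With SLF4J on the classpath, the same lookup expression would be passed to LoggerFactory.getLogger(Class), as in the diff.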


[nutch] 02/03: Upstream changes, upgrade to JDK 8, add license header

Posted by th...@apache.org.

thammegowda pushed a commit to branch NUTCH-2292
in repository https://gitbox.apache.org/repos/asf/nutch.git

commit 9c25a8c73c71cadb906624587ce14f4587b4b153
Author: Thamme Gowda <th...@apache.org>
AuthorDate: Fri Feb 24 11:54:34 2017 -0800

    Upstream changes, upgrade to JDK 8, add license header
---
 .gitignore                                           |  6 +++++-
 .../org/apache/nutch/fetcher/FetcherThreadEvent.java |  0
 .../apache/nutch/fetcher/FetcherThreadPublisher.java |  0
 .../org/apache/nutch/publisher/NutchPublisher.java   |  0
 .../org/apache/nutch/publisher/NutchPublishers.java  |  0
 .../java/org/apache/nutch/service/SeedManager.java   |  0
 .../apache/nutch/service/impl/SeedManagerImpl.java   |  0
 pom.xml                                              | 20 ++++++++++++++++++--
 8 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/.gitignore b/.gitignore
index 7a70f9d..e0cfd33 100644
--- a/.gitignore
+++ b/.gitignore
@@ -11,4 +11,8 @@ logs/
 target/
 nutch-core/target
 nutch-plugins/target
-nutch-plugins/*/target
\ No newline at end of file
+nutch-plugins/*/target
+
+# IntelliJ Idea
+.idea
+**.iml
\ No newline at end of file
diff --git a/src/java/org/apache/nutch/fetcher/FetcherThreadEvent.java b/nutch-core/src/main/java/org/apache/nutch/fetcher/FetcherThreadEvent.java
similarity index 100%
rename from src/java/org/apache/nutch/fetcher/FetcherThreadEvent.java
rename to nutch-core/src/main/java/org/apache/nutch/fetcher/FetcherThreadEvent.java
diff --git a/src/java/org/apache/nutch/fetcher/FetcherThreadPublisher.java b/nutch-core/src/main/java/org/apache/nutch/fetcher/FetcherThreadPublisher.java
similarity index 100%
rename from src/java/org/apache/nutch/fetcher/FetcherThreadPublisher.java
rename to nutch-core/src/main/java/org/apache/nutch/fetcher/FetcherThreadPublisher.java
diff --git a/src/java/org/apache/nutch/publisher/NutchPublisher.java b/nutch-core/src/main/java/org/apache/nutch/publisher/NutchPublisher.java
similarity index 100%
rename from src/java/org/apache/nutch/publisher/NutchPublisher.java
rename to nutch-core/src/main/java/org/apache/nutch/publisher/NutchPublisher.java
diff --git a/src/java/org/apache/nutch/publisher/NutchPublishers.java b/nutch-core/src/main/java/org/apache/nutch/publisher/NutchPublishers.java
similarity index 100%
rename from src/java/org/apache/nutch/publisher/NutchPublishers.java
rename to nutch-core/src/main/java/org/apache/nutch/publisher/NutchPublishers.java
diff --git a/src/java/org/apache/nutch/service/SeedManager.java b/nutch-core/src/main/java/org/apache/nutch/service/SeedManager.java
similarity index 100%
rename from src/java/org/apache/nutch/service/SeedManager.java
rename to nutch-core/src/main/java/org/apache/nutch/service/SeedManager.java
diff --git a/src/java/org/apache/nutch/service/impl/SeedManagerImpl.java b/nutch-core/src/main/java/org/apache/nutch/service/impl/SeedManagerImpl.java
similarity index 100%
rename from src/java/org/apache/nutch/service/impl/SeedManagerImpl.java
rename to nutch-core/src/main/java/org/apache/nutch/service/impl/SeedManagerImpl.java
diff --git a/pom.xml b/pom.xml
index a3b9271..ff2147a 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1,4 +1,20 @@
 <?xml version="1.0" encoding="UTF-8"?>
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
 <project xmlns="http://maven.apache.org/POM/4.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
@@ -26,8 +42,8 @@
                 <groupId>org.apache.maven.plugins</groupId>
                 <artifactId>maven-compiler-plugin</artifactId>
                 <configuration>
-                    <source>1.7</source>
-                    <target>1.7</target>
+                    <source>1.8</source>
+                    <target>1.8</target>
                 </configuration>
             </plugin>
             <plugin>
