- [jira] [Commented] (NUTCH-1486) Upgrade to Solr 4.3.0 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/10/01 04:47:24 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-656) DeleteDuplicates based on crawlDB only - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2013/10/01 09:28:24 UTC, 0 replies.
- [jira] [Created] (NUTCH-1646) IndexerMapReduce to consider DB status - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/10/01 12:45:23 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1646) IndexerMapReduce to consider DB status - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/10/01 12:45:24 UTC, 3 replies.
- [jira] [Created] (NUTCH-1647) protocol-http throws unzipBestEffort returned null for some pages - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/10/01 14:05:24 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1646) IndexerMapReduce to consider DB status - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/10/01 14:09:23 UTC, 4 replies.
- [jira] [Commented] (NUTCH-1647) protocol-http throws unzipBestEffort returned null for some pages - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/10/01 14:38:24 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-1646) IndexerMapReduce to consider DB status - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/10/01 14:51:26 UTC, 0 replies.
- [jira] [Commented] (NUTCH-656) DeleteDuplicates based on crawlDB only - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/02 11:21:25 UTC, 0 replies.
- [jira] [Created] (NUTCH-1648) Sentence Detection plugin - posted by "İlhami KALKAN (JIRA)" <ji...@apache.org> on 2013/10/02 12:46:24 UTC, 0 replies.
- [jira] [Created] (NUTCH-1649) Sentence Detection plugin - posted by "İlhami KALKAN (JIRA)" <ji...@apache.org> on 2013/10/02 12:57:26 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1649) Sentence Detection plugin - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/02 13:03:30 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1648) Sentence Detection plugin - posted by "İlhami KALKAN (JIRA)" <ji...@apache.org> on 2013/10/02 14:42:24 UTC, 0 replies.
- Re: Missing "nightly" API Docs - posted by Lewis John Mcgibbney <le...@gmail.com> on 2013/10/03 02:54:05 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1645) Junit Test Case for Adaptive Fetch Schedule class - posted by "Yasin Kılınç (JIRA)" <ji...@apache.org> on 2013/10/03 11:07:43 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1482) Rename HTMLParseFilter - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/03 12:47:44 UTC, 1 reply.
- [jira] [Commented] (NUTCH-1621) Deprecated class o.a.n.crawl.Crawler is still in code base - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/03 12:47:45 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1639) bin/crawl fails on mac os - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/03 13:11:46 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1639) bin/crawl fails on mac os - posted by "Hudson (JIRA)" <ji...@apache.org> on 2013/10/03 13:50:42 UTC, 0 replies.
- [jira] [Created] (NUTCH-1650) Adaptive Fetch Scheduler interval Wrong Set - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/10/04 09:30:41 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1650) Adaptive Fetch Scheduler interval Wrong Set - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/10/04 09:34:41 UTC, 0 replies.
- [jira] [Created] (NUTCH-1651) modifiedTime and prevmodifiedTime never set - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/10/04 09:59:42 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1651) modifiedTime and prevmodifiedTime never set - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/10/04 10:01:52 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1650) Adaptive Fetch Scheduler interval Wrong Set - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/10/04 10:30:43 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1562) Order of execution for scoring filters - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/04 11:39:42 UTC, 4 replies.
- [jira] [Resolved] (NUTCH-1642) mvn compile fails on Centos6.3 - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/04 11:59:48 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1640) OOM in ParseSegment Phase - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/04 14:04:44 UTC, 5 replies.
- [jira] [Updated] (NUTCH-1371) Replace Ivy with Maven Ant tasks - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/04 17:01:50 UTC, 1 reply.
- Build failed in Jenkins: Nutch-nutchgora #778 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/10/05 06:06:24 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2376 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/10/05 06:09:49 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1588) Port NUTCH-1245 URL gone with 404 after db.fetch.interval.max stays db_unfetched in CrawlDb and is generated over and over again to 2.x - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/10/05 12:33:42 UTC, 1 reply.
- [jira] [Commented] (NUTCH-1568) port pluggable indexing architecture to 2.x - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/10/05 14:37:42 UTC, 6 replies.
- [jira] [Commented] (NUTCH-1645) Junit Test Case for Adaptive Fetch Schedule class - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/10/05 15:03:43 UTC, 3 replies.
- [jira] [Updated] (NUTCH-1562) Order of execution for scoring filters - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2013/10/05 23:36:43 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-nutchgora #779 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/10/06 06:07:01 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #2377 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/10/06 06:10:42 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1588) Port NUTCH-1245 URL gone with 404 after db.fetch.interval.max stays db_unfetched in CrawlDb and is generated over and over again to 2.x - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2013/10/06 22:33:42 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1640) OOM in ParseSegment Phase - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/07 11:25:42 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1562) Order of execution for scoring filters - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/07 12:09:43 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1606) Check that Factory classes use the cache in a thread safe way - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/07 14:02:41 UTC, 0 replies.
- [jira] [Created] (NUTCH-1652) Avoid instanciation of MimeUtil for each Content object created - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/07 16:55:42 UTC, 0 replies.
- [jira] [Commented] (NUTCH-961) Expose Tika's boilerpipe support - posted by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/10/08 07:02:47 UTC, 3 replies.
- splitting the content in the crawled web pages in nutch - posted by arul jack <ar...@gmail.com> on 2013/10/08 12:58:04 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2382 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/10/09 06:34:43 UTC, 0 replies.
- Nutch Crawling - posted by Andrew Schultz <an...@gmail.com> on 2013/10/10 05:11:53 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #2383 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/10/10 06:25:08 UTC, 0 replies.
- [jira] [Created] (NUTCH-1653) AbstractScoringFilter - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/10 10:24:41 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1653) AbstractScoringFilter - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/10 10:24:42 UTC, 1 reply.
- [jira] [Commented] (NUTCH-1653) AbstractScoringFilter - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/10/10 10:48:42 UTC, 2 replies.
- [jira] [Comment Edited] (NUTCH-1568) port pluggable indexing architecture to 2.x - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/10/10 13:22:42 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1653) AbstractScoringFilter - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/10 15:50:44 UTC, 0 replies.
- [jira] [Reopened] (NUTCH-1646) IndexerMapReduce to consider DB status - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/10/11 13:59:43 UTC, 0 replies.
- [jira] [Created] (NUTCH-1654) FetchSchedule.setFetchSchedule called twice - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/10/11 16:04:43 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1606) Check that Factory classes use the cache in a thread safe way - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/11 16:06:41 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-1654) FetchSchedule.setFetchSchedule called twice - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/10/11 16:06:42 UTC, 0 replies.
- [ANNOUNCEMENT] 0.3 release of crawler-commons - posted by Julien Nioche <li...@gmail.com> on 2013/10/11 20:20:22 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1568) port pluggable indexing architecture to 2.x - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/10/12 08:25:42 UTC, 1 reply.
- [jira] [Updated] (NUTCH-1655) Indexer Plugin for Elastic Search - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/10/12 08:31:43 UTC, 2 replies.
- [jira] [Created] (NUTCH-1655) Indexer Plugin for Elastic Search - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/10/12 08:31:43 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1606) Check that Factory classes use the cache in a thread safe way - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/14 11:47:42 UTC, 0 replies.
- Subscription for Nutch developer mailing list - posted by Tej Kumar Ilindra <te...@gmail.com> on 2013/10/15 18:32:05 UTC, 0 replies.
- [jira] [Created] (NUTCH-1656) ParseMeta not passed to CrawlDatum for not_modified - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/10/16 11:17:41 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1656) ParseMeta not passed to CrawlDatum for not_modified - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/10/16 11:17:42 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1371) Replace Ivy with Maven Ant tasks - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/16 12:04:43 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1377) Add option to index via CloudSolrServer instead - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/16 12:26:43 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1656) ParseMeta not passed to CrawlDatum for not_modified - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/16 12:43:42 UTC, 1 reply.
- [jira] [Resolved] (NUTCH-1656) ParseMeta not passed to CrawlDatum for not_modified - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/10/16 16:57:42 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #790 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/10/17 06:03:30 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1541) Indexer plugin to write CSV - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/17 12:37:42 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-nutchgora #791 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/10/18 12:59:55 UTC, 0 replies.
- [Nutch Wiki] Update of "HowToContribute" by JulienNioche - posted by Apache Wiki <wi...@apache.org> on 2013/10/18 14:44:11 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series - posted by "Nick (JIRA)" <ji...@apache.org> on 2013/10/19 03:59:42 UTC, 2 replies.
- [jira] [Issue Comment Deleted] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series - posted by "Nick (JIRA)" <ji...@apache.org> on 2013/10/19 04:03:41 UTC, 0 replies.
- [jira] [Updated] (NUTCH-656) DeleteDuplicates based on crawlDB only - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/19 15:36:42 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1648) Sentence Detection plugin - posted by "İlhami KALKAN (JIRA)" <ji...@apache.org> on 2013/10/21 15:11:44 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1648) Sentence Detection plugin - posted by "İlhami KALKAN (JIRA)" <ji...@apache.org> on 2013/10/21 15:57:42 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1477) NPE when injecting with DataFileAvroStore - posted by "Alfonso Nishikawa (JIRA)" <ji...@apache.org> on 2013/10/22 10:27:42 UTC, 0 replies.
- About ParseMetadata - posted by Talat UYARER <ta...@agmlab.com> on 2013/10/22 13:34:40 UTC, 4 replies.
- RE: Alternative to Forrest for Nutch website - posted by Markus Jelsma <ma...@openindex.io> on 2013/10/22 15:07:14 UTC, 8 replies.
- Build failed in Jenkins: Nutch-nutchgora #796 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/10/23 06:13:49 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-nutchgora #797 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/10/24 06:30:48 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #799 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/10/26 06:26:52 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1125) JUnit test for tld - posted by "msertacturkel (JIRA)" <ji...@apache.org> on 2013/10/26 20:47:30 UTC, 2 replies.
- Jenkins build is back to normal : Nutch-nutchgora #800 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/10/27 05:07:32 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1125) JUnit test for tld - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/10/27 10:48:30 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1125) JUnit test for tld - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/10/27 10:50:31 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1124) JUnit test for scoring-opic - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/10/27 12:57:31 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1413) Fetcher to record response time - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/10/27 13:07:30 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1124) JUnit test for scoring-opic - posted by "Hudson (JIRA)" <ji...@apache.org> on 2013/10/27 13:44:32 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1643) Unnecessary fetching with http.content.limit when using protocol-http - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/10/27 19:58:34 UTC, 1 reply.
- [jira] [Commented] (NUTCH-1651) modifiedTime and prevmodifiedTime never set - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/10/27 20:16:31 UTC, 5 replies.
- [jira] [Updated] (NUTCH-1413) Fetcher to record response time - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/10/27 20:22:31 UTC, 1 reply.
- [jira] [Commented] (NUTCH-1564) AdaptiveFetchSchedule: sync_delta forces immediate refetch for documents not modified - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/10/28 16:42:31 UTC, 2 replies.
- [jira] [Created] (NUTCH-1657) ORIGINAL_CHAR_ENCODING and CHAR_ENCODING_FOR_CONVERSION never set in HTMLParser - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/10/28 17:00:31 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1657) ORIGINAL_CHAR_ENCODING and CHAR_ENCODING_FOR_CONVERSION never set in HTMLParser - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2013/10/28 17:22:30 UTC, 1 reply.
- [jira] [Updated] (NUTCH-1643) Unnecessary fetching with http.content.limit when using protocol-http - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/10/28 22:38:31 UTC, 0 replies.
- Lucene SOLR Revolution Dublin - posted by Julien Nioche <li...@gmail.com> on 2013/10/29 17:18:32 UTC, 2 replies.
- [jira] [Updated] (NUTCH-1564) AdaptiveFetchSchedule: sync_delta forces immediate refetch for documents not modified - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2013/10/29 20:49:25 UTC, 1 reply.
- [jira] [Created] (NUTCH-1658) Nutch mangles seed URLs and then reports on the mangled ones - posted by "Steve Newcomb (JIRA)" <ji...@apache.org> on 2013/10/30 17:57:26 UTC, 0 replies.
- Build failed in Jenkins: Nutch-trunk #2408 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/10/31 07:18:20 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1658) Nutch mangles seed URLs and then reports on the mangled ones - posted by "Steve Newcomb (JIRA)" <ji...@apache.org> on 2013/10/31 13:49:17 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1658) Nutch mangles seed URLs and then reports on the mangled ones - posted by "Steve Newcomb (JIRA)" <ji...@apache.org> on 2013/10/31 13:55:17 UTC, 0 replies.
- [jira] [Created] (NUTCH-1659) Custom partitioner for Adaptive Queue Size - posted by "İlhami KALKAN (JIRA)" <ji...@apache.org> on 2013/10/31 14:33:17 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1659) Custom partitioner for Adaptive Queue Size - posted by "İlhami KALKAN (JIRA)" <ji...@apache.org> on 2013/10/31 14:33:19 UTC, 1 reply.