- [jira] [Commented] (NUTCH-1714) Nutch 2.x upgrade to use GORA_94 branch - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/01 07:55:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1570) Add filtering capability to Datastore Queries - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/01 08:01:22 UTC, 3 replies.
- [jira] [Updated] (NUTCH-1410) impact of a map-reduce problem - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/01 08:03:15 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1410) impact of a map-reduce problem - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/01 08:03:15 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1490) Data Truncation exceptions when using mysql - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/01 08:17:15 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1490) Data Truncation exceptions when using mysql - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/01 08:17:15 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1497) Better default gora-sql-mapping.xml with larger field sizes for MySQL - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/01 08:17:15 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1497) Better default gora-sql-mapping.xml with larger field sizes for MySQL - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/01 08:17:16 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1674) Use batchId filter to enable scan (GORA-119) for Fetch,Parse,Update,Index - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/01 08:19:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1714) Nutch 2.x upgrade to Gora 0.4 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/01 08:19:16 UTC, 3 replies.
- Re: [DISCUSS] Roadmap for 2.3 Release - posted by Lewis John Mcgibbney <le...@gmail.com> on 2014/05/01 08:27:07 UTC, 6 replies.
- [jira] [Updated] (NUTCH-1301) Index job resume switch to resume a failed job - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/01 08:27:14 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1714) Nutch 2.x upgrade to Gora 0.4 - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/01 08:40:15 UTC, 30 replies.
- [jira] [Commented] (NUTCH-1753) Eclipse dependecy problem for 2.x - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/01 17:59:14 UTC, 3 replies.
- [jira] [Resolved] (NUTCH-1740) BatchId parameter is not set in DbUpdaterJob - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/01 18:03:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1679) UpdateDb using batchId, link may override crawled page. - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/01 18:03:17 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1728) indexer-solr plugin is not delete docs from solr - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/01 18:07:19 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1728) indexer-solr plugin is not delete docs from solr - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/01 18:09:15 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1725) CleaningJob's reducer does not commit deleted docs. - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/01 18:11:18 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1725) CleaningJob's reducer does not commit deleted docs. - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/01 18:11:18 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1662) Indexer Plugin for Solr Cloud - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/01 18:11:19 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1662) Indexer Plugin for Solr Cloud - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/01 18:15:17 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1657) ORIGINAL_CHAR_ENCODING and CHAR_ENCODING_FOR_CONVERSION never set in HTMLParser - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/01 18:21:15 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1657) ORIGINAL_CHAR_ENCODING and CHAR_ENCODING_FOR_CONVERSION never set in HTMLParser - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/01 18:21:16 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1618) Fetches some websites multiple times for long lasting queues - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/01 18:25:16 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1618) Fetches some websites multiple times for long lasting queues - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/01 18:25:16 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1768) port NUTCH-1745 to Nutch 2.x (Upgrade to ElasticSearch 1.1.0) - posted by "Rogério Pereira Araújo (JIRA)" <ji...@apache.org> on 2014/05/01 22:20:15 UTC, 5 replies.
- [jira] [Created] (NUTCH-1769) API refactoring - posted by "Ivan Vershinin (JIRA)" <ji...@apache.org> on 2014/05/02 10:04:16 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1753) Eclipse dependecy problem for 2.x - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/05/02 10:12:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1622) Create Outlinks with metadata - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/02 10:35:15 UTC, 3 replies.
- [jira] [Assigned] (NUTCH-1622) Create Outlinks with metadata - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/02 10:41:15 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1622) Create Outlinks with metadata - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/02 11:11:16 UTC, 5 replies.
- [jira] [Resolved] (NUTCH-1728) indexer-solr plugin is not delete docs from solr - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/05/02 11:59:15 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1674) Use batchId filter to enable scan (GORA-119) for Fetch,Parse,Update,Index - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/02 13:29:16 UTC, 3 replies.
- [jira] [Updated] (NUTCH-1741) Support of Sitemaps in Nutch 2.x - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/02 13:31:16 UTC, 0 replies.
- [jira] [Comment Edited] (NUTCH-1768) port NUTCH-1745 to Nutch 2.x (Upgrade to ElasticSearch 1.1.0) - posted by "Rogério Pereira Araújo (JIRA)" <ji...@apache.org> on 2014/05/02 17:01:27 UTC, 1 replies.
- Giraph Integration - posted by Talat Uyarer <ta...@uyarer.com> on 2014/05/02 23:57:26 UTC, 2 replies.
- About RankingJob for Giraph - posted by Talat Uyarer <ta...@uyarer.com> on 2014/05/03 00:10:22 UTC, 2 replies.
- Better Parser Plugin - posted by Talat Uyarer <ta...@uyarer.com> on 2014/05/03 01:25:53 UTC, 5 replies.
- [jira] [Comment Edited] (NUTCH-1714) Nutch 2.x upgrade to Gora 0.4 - posted by "Alparslan Avcı (JIRA)" <ji...@apache.org> on 2014/05/03 09:53:15 UTC, 6 replies.
- [jira] [Updated] (NUTCH-1182) fetcher to log hung threads - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/05/03 13:35:15 UTC, 0 replies.
- [jira] [Commented] (NUTCH-207) Bandwidth target for fetcher rather than a thread count - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/05/03 13:46:16 UTC, 6 replies.
- [jira] [Resolved] (NUTCH-1725) CleaningJob's reducer does not commit deleted docs. - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/05/03 15:23:16 UTC, 0 replies.
- [jira] [Created] (NUTCH-1770) Nutch is Failing to parse all PDFs - posted by "Rogério Pereira Araújo (JIRA)" <ji...@apache.org> on 2014/05/03 15:25:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1770) Nutch is failing to parse all PDFs - posted by "Rogério Pereira Araújo (JIRA)" <ji...@apache.org> on 2014/05/03 15:27:15 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1770) Nutch is failing to parse all PDFs - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/05/03 15:39:15 UTC, 5 replies.
- [jira] [Commented] (NUTCH-1769) API refactoring - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/05/03 15:45:15 UTC, 6 replies.
- [jira] [Issue Comment Deleted] (NUTCH-1657) ORIGINAL_CHAR_ENCODING and CHAR_ENCODING_FOR_CONVERSION never set in HTMLParser - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/05/03 15:49:15 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1657) ORIGINAL_CHAR_ENCODING and CHAR_ENCODING_FOR_CONVERSION never set in HTMLParser - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/05/03 15:49:16 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1677) ORIGINAL_CHAR_ENCODING and CHAR_ENCODING_FOR_CONVERSION are not set in Parse HTML - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/05/03 15:53:17 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1643) Unnecessary fetching with http.content.limit when using protocol-http - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/05/03 16:44:14 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1618) Turn speculative execution off for Fetching - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/05/03 16:48:14 UTC, 1 replies.
- [jira] [Assigned] (NUTCH-1618) Turn speculative execution off for Fetching - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/05/03 16:48:15 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1618) Turn speculative execution off for Fetching - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/05/03 17:29:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1769) API refactoring - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/03 18:09:16 UTC, 2 replies.
- [jira] [Resolved] (NUTCH-1182) fetcher to log hung threads - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/05/04 22:21:16 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1618) Turn speculative execution off for Fetching - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/05/05 00:11:17 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1182) fetcher to log hung threads - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/05/05 00:11:19 UTC, 1 replies.
- some questions about nutch? - posted by Li Li <fa...@gmail.com> on 2014/05/05 09:53:25 UTC, 0 replies.
- Post process Nutch data - posted by Srikanth Shankara Rao <sr...@aditi.com> on 2014/05/05 15:02:22 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1679) UpdateDb using batchId, link may override crawled page. - posted by "Ralf (JIRA)" <ji...@apache.org> on 2014/05/10 23:55:15 UTC, 3 replies.
- [jira] [Updated] (NUTCH-926) Redirections from META tag don't get filtered - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/05/10 23:58:46 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1570) Add filtering capability to Datastore Queries - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/10 23:59:25 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1709) Generated classes o.a.n.storage.Host and o.a.n.storage.ProtocolStatus contain methods not defined in source .avsc - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/11 00:00:48 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1709) Generated classes o.a.n.storage.Host and o.a.n.storage.ProtocolStatus contain methods not defined in source .avsc - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/11 00:00:49 UTC, 2 replies.
- [jira] [Updated] (NUTCH-207) Bandwidth target for fetcher rather than a thread count - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/11 00:06:23 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1770) Nutch is failing to parse all PDFs - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/11 00:15:59 UTC, 0 replies.
- Re: [jira] [Commented] (NUTCH-1766) Generator to unlock crawldb and remove tempdir if generate job fails - posted by Diaa Abdallah <di...@gmail.com> on 2014/05/11 17:31:54 UTC, 1 replies.
- [jira] [Closed] (NUTCH-1764) readdb to show command-line help if no action (-stats, -dump, etc.) given - posted by "Diaa (JIRA)" <ji...@apache.org> on 2014/05/11 17:36:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1766) Generator to unlock crawldb and remove tempdir if generate job fails - posted by "Diaa (JIRA)" <ji...@apache.org> on 2014/05/11 17:38:15 UTC, 1 replies.
- [jira] [Assigned] (NUTCH-1766) Generator to unlock crawldb and remove tempdir if generate job fails - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/12 09:59:15 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1766) Generator to unlock crawldb and remove tempdir if generate job fails - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/12 10:01:33 UTC, 0 replies.
- [jira] [Created] (NUTCH-1771) Solrindex fails if a segment is corrupted or incomplete - posted by "Diaa (JIRA)" <ji...@apache.org> on 2014/05/12 11:55:15 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1766) Generator to unlock crawldb and remove tempdir if generate job fails - posted by "Diaa (JIRA)" <ji...@apache.org> on 2014/05/12 13:38:15 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1676) Add rudimentary SSL support to protocol-http - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/12 14:43:16 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1613) Timeouts in protocol-httpclient when crawling same host with >2 threads and added cookie strings for both http protocols - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/12 14:55:14 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1613) Timeouts in protocol-httpclient when crawling same host with >2 threads and added cookie strings for both http protocols - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/12 15:00:17 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1752) cache robots.txt rules per protocol:host:port - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/12 15:06:15 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1669) FTP crawl does not use FTP's server root folder - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/12 15:12:14 UTC, 0 replies.
- [jira] [Created] (NUTCH-1772) Injector does not need merging if no pre-existing crawldb - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/12 17:31:16 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1772) Injector does not need merging if no pre-existing crawldb - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/12 17:33:21 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-1752) cache robots.txt rules per protocol:host:port - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/05/12 21:41:19 UTC, 0 replies.
- Clean up in case of error is not handled - posted by Diaa Abdallah <di...@gmail.com> on 2014/05/13 00:08:42 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1772) Injector does not need merging if no pre-existing crawldb - posted by "Diaa (JIRA)" <ji...@apache.org> on 2014/05/13 00:30:17 UTC, 5 replies.
- [jira] [Created] (NUTCH-1773) Solr Indexer fails - posted by "Ralf (JIRA)" <ji...@apache.org> on 2014/05/13 03:04:04 UTC, 0 replies.
- [jira] [Created] (NUTCH-1774) Crawling from REST API giving NullPointerException - posted by "sreemanth pulagam (JIRA)" <ji...@apache.org> on 2014/05/13 08:22:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1774) Crawling from REST API giving NullPointerException - posted by "sreemanth pulagam (JIRA)" <ji...@apache.org> on 2014/05/13 08:44:14 UTC, 2 replies.
- [jira] [Commented] (NUTCH-1771) Solrindex fails if a segment is corrupted or incomplete - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/05/13 11:28:15 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1697) SegmentMerger to implement Tool - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/05/13 11:54:17 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-1586) Non-db_success records should have interval.max - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/05/13 11:54:17 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1774) Crawling from REST API giving NullPointerException - posted by "Fjodor Vershinin (JIRA)" <ji...@apache.org> on 2014/05/13 14:26:15 UTC, 3 replies.
- [jira] [Created] (NUTCH-1775) IndexingFilter: document origin of passed CrawlDatum - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/05/13 23:43:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1775) IndexingFilter: document origin of passed CrawlDatum - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/05/13 23:43:16 UTC, 0 replies.
- [jira] [Created] (NUTCH-1776) Log incorrect plugin.folder file path - posted by "Diaa (JIRA)" <ji...@apache.org> on 2014/05/15 01:34:17 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1776) Log incorrect plugin.folder file path - posted by "Diaa (JIRA)" <ji...@apache.org> on 2014/05/15 01:36:19 UTC, 4 replies.
- [jira] [Commented] (NUTCH-1773) Solr Indexer fails - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/15 02:50:20 UTC, 0 replies.
- Inject auto generated urls - posted by Diaa Abdallah <di...@gmail.com> on 2014/05/16 09:45:14 UTC, 1 replies.
- [jira] [Created] (NUTCH-1780) ttl and gc_grace_seconds attributes are missing from gora-cassandra-mapping.xml file - posted by "kaveh minooie (JIRA)" <ji...@apache.org> on 2014/05/16 12:57:17 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1780) ttl and gc_grace_seconds attributes are missing from gora-cassandra-mapping.xml file - posted by "kaveh minooie (JIRA)" <ji...@apache.org> on 2014/05/16 12:57:18 UTC, 4 replies.
- [jira] [Created] (NUTCH-1781) Update gora-*-mapping.xml and gora.proeprties to reflect Gora 0.4 - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/16 13:08:12 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1772) Injector does not need merging if no pre-existing crawldb - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/16 13:19:42 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1776) Log incorrect plugin.folder file path - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/05/16 13:22:28 UTC, 3 replies.
- [jira] [Resolved] (NUTCH-1714) Nutch 2.x upgrade to Gora 0.4 - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/16 13:22:59 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1674) Use batchId filter to enable scan (GORA-119) for Fetch,Parse,Update,Index - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/16 13:23:01 UTC, 0 replies.
- [jira] [Created] (NUTCH-1778) Generator not logging number of URLs in batch correctly - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/16 13:23:02 UTC, 0 replies.
- [jira] [Created] (NUTCH-1777) Fetcher not getting all the entries in input - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/16 13:23:02 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1605) mime type detector recognizes xlsx as zip file - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/05/16 13:23:06 UTC, 0 replies.
- [jira] [Created] (NUTCH-1779) Apply formatting to the code - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/16 13:23:08 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1613) Timeouts in protocol-httpclient when crawling same host with >2 threads and added cookie strings for both http protocols - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/05/16 13:23:50 UTC, 2 replies.
- [jira] [Updated] (NUTCH-1718) update description of property http.robots.agent - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/05/16 13:25:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1718) redefine http.robots.agent as "additional agent names" - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/05/16 13:25:16 UTC, 1 replies.
- [jira] [Resolved] (NUTCH-1676) Add rudimentary SSL support to protocol-http - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/16 15:44:22 UTC, 0 replies.
- [jira] [Created] (NUTCH-1782) NodeWalker to return current node - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/05/16 17:23:23 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1782) NodeWalker to return current node - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/05/16 17:25:14 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1676) Add rudimentary SSL support to protocol-http - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/05/16 17:34:31 UTC, 2 replies.
- [jira] [Created] (NUTCH-1783) Cleanup temp folders in case of failures - posted by "Diaa (JIRA)" <ji...@apache.org> on 2014/05/16 22:02:17 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1783) Cleanup temp folders in case of failures - posted by "Diaa (JIRA)" <ji...@apache.org> on 2014/05/16 22:04:23 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1780) ttl and gc_grace_seconds attributes are missing from gora-cassandra-mapping.xml file - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/17 02:40:15 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1780) ttl and gc_grace_seconds attributes are missing from gora-cassandra-mapping.xml file - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/17 02:40:15 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1779) Apply formatting to the code - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/17 02:54:14 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1779) Apply formatting to the code - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/17 02:54:15 UTC, 0 replies.
- [jira] [Created] (NUTCH-1784) CLONE - modifiedTime and prevmodifiedTime never set - posted by "hanchi (JIRA)" <ji...@apache.org> on 2014/05/17 13:53:14 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1784) CLONE - modifiedTime and prevmodifiedTime never set - posted by "hanchi (JIRA)" <ji...@apache.org> on 2014/05/17 13:55:14 UTC, 0 replies.
- [jira] [Reopened] (NUTCH-1784) CLONE - modifiedTime and prevmodifiedTime never set - posted by "hanchi (JIRA)" <ji...@apache.org> on 2014/05/17 13:57:14 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1784) modifiedTime and prevmodifiedTime never set - posted by "hanchi (JIRA)" <ji...@apache.org> on 2014/05/17 13:57:14 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1784) modifiedTime and prevmodifiedTime never set - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/05/17 19:14:14 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1774) Crawling from REST API giving NullPointerException - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/17 20:36:15 UTC, 0 replies.
- [jira] [Closed] (NUTCH-1774) Crawling from REST API giving NullPointerException - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/17 20:36:15 UTC, 0 replies.
- [jira] [Assigned] (NUTCH-1709) Generated classes o.a.n.storage.Host and o.a.n.storage.ProtocolStatus contain methods not defined in source .avsc - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/17 20:38:14 UTC, 0 replies.
- Creating Windows bash files for nutch - posted by Diaa Abdallah <di...@gmail.com> on 2014/05/17 23:33:58 UTC, 3 replies.
- Build failed in Jenkins: Nutch-nutchgora #1015 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/05/18 05:37:06 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1780) ttl and gc_grace_seconds attributes are missing from gora-cassandra-mapping.xml file - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/05/18 05:37:15 UTC, 0 replies.
- Jenkins build is back to normal : Nutch-trunk #2630 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/05/18 05:44:15 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1784) modifiedTime and prevmodifiedTime never set - posted by "hanchi (JIRA)" <ji...@apache.org> on 2014/05/18 07:59:14 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1773) Solr Indexer fails - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/19 16:07:38 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1704) Port DomainBlacklist urlfilter to 2.x - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/19 22:17:40 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1634) readdb -stats show the result twice - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/19 22:21:39 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1765) SolrClean to remove redirected URLs from Solr - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/19 22:23:39 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1765) SolrClean to remove redirected URLs from Solr - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/19 22:25:43 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1757) ParserChecker to take custom metadata as input - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/19 22:27:38 UTC, 4 replies.
- [jira] [Updated] (NUTCH-1758) IndexChecker to send document to IndexWriters - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/19 22:27:38 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1757) ParserChecker to take custom metadata as input - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/19 22:27:38 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1758) IndexChecker to send document to IndexWriters - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/19 22:27:39 UTC, 4 replies.
- [jira] [Updated] (NUTCH-1678) Remove dependency on org.apache.oro - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/19 22:29:38 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1722) FetcherJob#fetch throws NullPointerException for null batchId - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/19 22:33:38 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1068) Automaton performance improvements based on Lucene code base - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/19 22:39:38 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1526) Create SegmentContentDumperTool for easily extracting out file contents from SegmentDirs - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/19 22:39:39 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1526) Create SegmentContentDumperTool for easily extracting out file contents from SegmentDirs - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/19 22:45:39 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1746) OutOfMemoryError in Mappers - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/19 23:07:38 UTC, 1 replies.
- [jira] [Updated] (NUTCH-1746) OutOfMemoryError in Mappers - posted by "Greg Padiasek (JIRA)" <ji...@apache.org> on 2014/05/21 04:56:41 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1486) Upgrade to the latest Solr 4.x - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/05/21 14:41:38 UTC, 0 replies.
- [jira] [Created] (NUTCH-1785) Ability to index raw content - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/05/21 16:54:39 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1785) Ability to index raw content - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/05/21 16:56:37 UTC, 3 replies.
- Nutch survey - posted by Julien Nioche <li...@gmail.com> on 2014/05/21 17:07:47 UTC, 5 replies.
- [jira] [Resolved] (NUTCH-1757) ParserChecker to take custom metadata as input - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/21 21:52:39 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1785) Ability to index raw content - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/21 21:56:39 UTC, 5 replies.
- [jira] [Created] (NUTCH-1786) CrawlDb should follow db.url.normalizers and db.url.filters - posted by "Diaa (JIRA)" <ji...@apache.org> on 2014/05/21 23:38:39 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1786) CrawlDb should follow db.url.normalizers and db.url.filters - posted by "Diaa (JIRA)" <ji...@apache.org> on 2014/05/21 23:40:38 UTC, 1 replies.
- [jira] [Commented] (NUTCH-1786) CrawlDb should follow db.url.normalizers and db.url.filters - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/05/22 13:58:03 UTC, 3 replies.
- Nutch readings for developers - posted by Frédéric Passaniti <f....@gmail.com> on 2014/05/22 16:34:11 UTC, 0 replies.
- Why is fetcher one big class? - posted by Diaa Abdallah <di...@gmail.com> on 2014/05/22 23:43:19 UTC, 1 replies.
- Are these settings/behaviors really required to maintain when porting httpclient to 4.3.3? - posted by d_k <ma...@gmail.com> on 2014/05/23 12:51:05 UTC, 0 replies.
- [jira] [Created] (NUTCH-1787) update and complete API doc overview page - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/05/25 17:39:01 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1787) update and complete API doc overview page - posted by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/05/25 17:41:01 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #1016 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/05/26 06:06:01 UTC, 0 replies.
- [jira] [Created] (NUTCH-1788) Tika may return multiple values for Title on PDF's - posted by "Talat UYARER (JIRA)" <ji...@apache.org> on 2014/05/26 06:08:01 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1561) improve usability of parse-metatags and index-metadata - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/05/26 09:08:01 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1786) CrawlDb should follow db.url.normalizers and db.url.filters - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/05/26 12:48:02 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #1017 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/05/27 06:08:10 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #1018 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/05/28 07:56:35 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #1019 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/05/29 06:36:39 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1758) IndexChecker to send document to IndexWriters - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/29 12:22:03 UTC, 0 replies.
- [jira] [Created] (NUTCH-1789) Migrate Nutch site to Apache CMS - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/05/30 01:55:02 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #1020 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/05/30 07:31:05 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1634) readdb -stats show the result twice - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/30 16:51:04 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1768) Upgrade to ElasticSearch 1.1.0 - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/30 16:55:04 UTC, 0 replies.
- [jira] [Resolved] (NUTCH-1768) Upgrade to ElasticSearch 1.1.0 - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/30 16:59:02 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #1021 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/05/30 17:41:13 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1634) readdb -stats show the result twice - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/05/30 17:43:01 UTC, 0 replies.
- [jira] [Commented] (NUTCH-1768) Upgrade to ElasticSearch 1.1.0 - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/05/30 17:43:02 UTC, 0 replies.
- Build failed in Jenkins: Nutch-nutchgora #1022 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/05/31 06:07:16 UTC, 0 replies.
- [jira] [Created] (NUTCH-1790) solrdedup in local mode causes OutOfMemoryError in Solr - posted by "Greg Padiasek (JIRA)" <ji...@apache.org> on 2014/05/31 18:14:01 UTC, 0 replies.
- [jira] [Updated] (NUTCH-1790) solrdedup in local mode causes OutOfMemoryError in Solr - posted by "Greg Padiasek (JIRA)" <ji...@apache.org> on 2014/05/31 18:16:02 UTC, 4 replies.
- [jira] [Updated] (NUTCH-1790) solrdedup causes OutOfMemoryError in Solr - posted by "Greg Padiasek (JIRA)" <ji...@apache.org> on 2014/05/31 18:20:01 UTC, 5 replies.
- [jira] [Resolved] (NUTCH-1790) solrdedup causes OutOfMemoryError in Solr - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/31 20:27:01 UTC, 0 replies.