Posted to dev@lucene.apache.org by "Joel Bernstein (JIRA)" <ji...@apache.org> on 2017/01/21 20:11:26 UTC
[jira] [Created] (SOLR-10017) Add the crawl Streaming Expression
Joel Bernstein created SOLR-10017:
-------------------------------------
Summary: Add the crawl Streaming Expression
Key: SOLR-10017
URL: https://issues.apache.org/jira/browse/SOLR-10017
Project: Solr
Issue Type: New Feature
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Joel Bernstein
The crawl Streaming Expression will wrap a stream that emits root URLs to crawl. It will then crawl those URLs using a library such as Crawl4j, and emit tuples that can be indexed into a SolrCloud collection using the update function. Solr's classifier can be used to curate content as it is being crawled, or to classify sites based on the content they contain. The links between pages and sites can be indexed as graphs and then explored and visualized with graph expressions.
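
To make the intended data flow concrete, here is a minimal offline sketch in Python. It is not the proposed Solr implementation. It only illustrates the idea of wrapping a stream of root URLs, crawling breadth-first, and emitting one tuple per page with its outgoing links kept as graph edges. The function name `crawl`, the `fetch` callback, the `max_pages` limit, the field names `id`, `body_t`, and `links_ss`, and the in-memory `PAGES` dictionary standing in for the web are all assumptions made for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects href values from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(root_stream, fetch, max_pages=100):
    """Yield one tuple (a dict) per crawled page, breadth-first from the roots.

    root_stream: iterable of dicts with a "url" field (the wrapped stream).
    fetch: callable mapping a URL to HTML text, or None if unavailable.
    """
    queue = deque()
    seen = set()
    for root in root_stream:
        if root["url"] not in seen:
            seen.add(root["url"])
            queue.append(root["url"])

    emitted = 0
    while queue and emitted < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # skip pages that could not be fetched
        parser = LinkParser()
        parser.feed(html)
        # Resolve relative hrefs against the page URL.
        links = [urljoin(url, href) for href in parser.links]
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
        # Hypothetical field names; the links field carries the graph edges.
        yield {"id": url, "body_t": html, "links_ss": links}
        emitted += 1


# Stand-in for the web so the sketch runs offline (hypothetical pages).
PAGES = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a>',
    "http://example.com/b": "no links here",
}

tuples = list(crawl([{"url": "http://example.com/"}], PAGES.get))
for t in tuples:
    print(t["id"], "->", t["links_ss"])
```

Each emitted dictionary plays the role of a tuple that a downstream step could index. Storing the outgoing links on every tuple is what would let the page-to-page relationships be walked later as a graph.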