Posted to dev@lucene.apache.org by "Joel Bernstein (JIRA)" <ji...@apache.org> on 2017/01/21 20:11:26 UTC

[jira] [Created] (SOLR-10017) Add the crawl Streaming Expression

Joel Bernstein created SOLR-10017:
-------------------------------------

             Summary: Add the crawl Streaming Expression
                 Key: SOLR-10017
                 URL: https://issues.apache.org/jira/browse/SOLR-10017
             Project: Solr
          Issue Type: New Feature
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Joel Bernstein


The crawl Streaming Expression will wrap a stream that emits root URLs to crawl. It will then crawl those URLs using a library such as crawler4j and emit tuples that can be indexed into a SolrCloud collection using the update function. Solr's classifier can be used to curate content as it is being crawled, or to classify sites based on the content they contain. The links between pages and sites can be indexed as graphs and then explored and visualized with graph expressions.
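To make the proposed data flow concrete, here is a minimal sketch of the crawl step under stated assumptions: it is a plain breadth-first crawl, not Solr's TupleStream API and not crawler4j's API. The function name crawl, the fetch_links callback, the max_depth parameter and the tuple fields id, depth and links are all hypothetical. fetch_links stands in for a real fetcher and parser, and the in-memory WEB dictionary replaces network access. Each emitted tuple carries the page's outgoing links, which is the information the graph indexing described above would need.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Iterator, List


def crawl(root_urls: Iterable[str],
          fetch_links: Callable[[str], List[str]],
          max_depth: int = 1) -> Iterator[Dict[str, object]]:
    """Hypothetical breadth-first crawl that yields one tuple (dict) per page.

    fetch_links is an assumed stand-in for a real fetcher/parser; it returns
    the outgoing links found on a page.
    """
    seen = set()
    queue = deque()
    # Seed the queue from the wrapped stream of root URLs (depth 0).
    for url in root_urls:
        if url not in seen:
            seen.add(url)
            queue.append((url, 0))
    while queue:
        url, depth = queue.popleft()
        links = fetch_links(url)
        # Emit the page together with its outgoing links so the
        # page-to-page graph could be indexed later.
        yield {"id": url, "depth": depth, "links": links}
        # Pages at max_depth are still fetched and emitted, but their
        # links are not followed any further.
        if depth < max_depth:
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))


# Usage with an in-memory link graph standing in for real web pages.
WEB = {
    "http://a": ["http://b", "http://c"],
    "http://b": ["http://c"],
    "http://c": ["http://a", "http://d"],
    "http://d": [],
}
tuples = list(crawl(["http://a"], lambda u: WEB.get(u, []), max_depth=1))
```

In the proposal the root URLs would come from a wrapped stream and the emitted tuples would be passed to the update function for indexing; the sketch only shows the traversal and the shape of the tuples.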



