Posted to dev@manifoldcf.apache.org by "Karl Wright (JIRA)" <ji...@apache.org> on 2011/06/23 16:17:47 UTC

[jira] [Commented] (CONNECTORS-214) Add post-extraction inclusions and exclusions into the web connector

    [ https://issues.apache.org/jira/browse/CONNECTORS-214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053874#comment-13053874 ] 

Karl Wright commented on CONNECTORS-214:
----------------------------------------

Functionally, it should be easy to add another set of inclusions/exclusions to support this case.  But where should it appear in the UI?  I'm leaning towards adding these as additional fields on the Inclusions and Exclusions tabs.  The Web connector already has too many tabs to justify adding two more.  Any other ideas?


> Add post-extraction inclusions and exclusions into the web connector
> --------------------------------------------------------------------
>
>                 Key: CONNECTORS-214
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-214
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Web connector
>    Affects Versions: ManifoldCF 0.1, ManifoldCF 0.2
>            Reporter: Erlend Garåsen
>            Assignee: Erlend Garåsen
>             Fix For: ManifoldCF next
>
>
> If HTML files are excluded for a job, the links in those files will not be followed. If we add inclusion and exclusion filters that are applied after link extraction, it will be possible to end up with only certain types of documents, such as PDFs, while still following the links found in HTML pages.
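
The request above can be illustrated with a minimal sketch. It is not ManifoldCF code: the function names, the regex-list parameters, and the fetch_links callback are all hypothetical, and it only assumes that one filter pair decides which URLs are fetched and parsed for links, while a second, post-extraction pair decides which of the fetched documents are kept.

```python
import re
from collections import deque


def matches(url, includes, excludes):
    """True if url matches at least one include regex and no exclude regex."""
    included = any(re.search(p, url) for p in includes)
    excluded = any(re.search(p, url) for p in excludes)
    return included and not excluded


def crawl(seeds, fetch_links, crawl_inc, crawl_exc, keep_inc, keep_exc):
    """Return the URLs that pass the post-extraction filters.

    fetch_links(url) is a hypothetical callback that returns the links
    found in the document at url (an empty list for non-HTML documents).
    """
    seen = set(seeds)
    queue = deque(seeds)
    kept = []
    while queue:
        url = queue.popleft()
        # Stage 1: the existing filters decide whether the URL is fetched at all.
        if not matches(url, crawl_inc, crawl_exc):
            continue
        # Links are extracted and queued before the second filter is applied,
        # so a page that is later dropped still contributes its links.
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
        # Stage 2: the post-extraction filters decide whether the document is kept.
        if matches(url, keep_inc, keep_exc):
            kept.append(url)
    return kept
```

With the first pair accepting everything and the second pair including only URLs ending in .pdf, HTML pages are still traversed but only the PDFs they link to are returned.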

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira