Posted to droids-dev@incubator.apache.org by "Richard Frovarp (JIRA)" <ji...@apache.org> on 2012/06/15 02:46:43 UTC

[jira] [Updated] (DROIDS-142) Add additional filtering until the file is saved on disk

     [ https://issues.apache.org/jira/browse/DROIDS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Frovarp updated DROIDS-142:
-----------------------------------

    Fix Version/s:     (was: 0.2.0)
                   0.3.0
    
> Add additional filtering until the file is saved on disk
> --------------------------------------------------------
>
>                 Key: DROIDS-142
>                 URL: https://issues.apache.org/jira/browse/DROIDS-142
>             Project: Droids
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 0.2.0
>            Reporter: Eugen Paraschiv
>             Fix For: 0.3.0
>
>
> The existing filtering process accepts or rejects URLs based on the URL itself, which is very useful. In some cases, though, you need to decide whether a file is relevant and should be saved based on its content.
> There should be a step in SaveHandler, before the file is actually saved, where the handler can decide whether to persist or ignore the file based on both the URL and the file contents. Specific checks to further filter out files should be introduced at this step.
> - note: as an example, consider the very common site that doesn't have hierarchical, well-defined URLs, but instead uses simple /domain/object1, /domain/object2 type URLs; these links say nothing about the content, so filtering them with a regex would do no good; the page itself, however, is likely to contain all the information required for more granular filtering
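
The requested step could look roughly like the sketch below. It is only an illustration under stated assumptions: ContentFilter, FilteringSaver and ContentFilterSketch are hypothetical names, not part of the Droids API, and the real SaveHandler may expose the URL and content differently. The idea is simply to run every content-aware check before anything is written to disk.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: none of these types exist in Droids itself.
final class ContentFilterSketch {

    /** Decides whether a fetched page should be persisted, using the URL and the content. */
    interface ContentFilter {
        boolean accept(URI uri, String content);
    }

    /** Writes a page to disk only if every registered filter accepts it. */
    static final class FilteringSaver {
        private final Path outputDir;
        private final List<ContentFilter> filters;

        FilteringSaver(Path outputDir, List<ContentFilter> filters) {
            this.outputDir = outputDir;
            // Defensive copy so later changes to the caller's list have no effect here.
            this.filters = Collections.unmodifiableList(new ArrayList<>(filters));
        }

        /** Returns true if the content was written, false if a filter rejected it. */
        boolean save(URI uri, String fileName, String content) throws IOException {
            for (ContentFilter filter : filters) {
                if (!filter.accept(uri, content)) {
                    return false; // rejected before anything touches the disk
                }
            }
            Files.createDirectories(outputDir);
            Files.write(outputDir.resolve(fileName),
                        content.getBytes(StandardCharsets.UTF_8));
            return true;
        }
    }
}
```

With this shape, a filter for the /domain/object1 style of URL can ignore the URL and inspect the page instead, for example a lambda such as (uri, content) -> content.contains("some keyword"). Whether the content should be passed as a String or as a stream is an open design choice; a String keeps the sketch short but would not suit large binary files.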

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira