Posted to dev@tika.apache.org by "Emil Zegers (Jira)" <ji...@apache.org> on 2024/04/26 13:19:00 UTC

[jira] [Created] (TIKA-4246) tika-pipes FileSystemFetcher configuration option for file name/path pattern selection

Emil Zegers created TIKA-4246:
---------------------------------

             Summary: tika-pipes FileSystemFetcher configuration option for file name/path pattern selection
                 Key: TIKA-4246
                 URL: https://issues.apache.org/jira/browse/TIKA-4246
             Project: Tika
          Issue Type: New Feature
          Components: tika-pipes
            Reporter: Emil Zegers


It would be useful to be able to configure the tika-pipes FileSystemFetcher to process only certain files, e.g. selected by extension or by a match on the file name/path or a similar pattern.
 
This would make it possible to point the fetcher at a specific root folder and process only the matching files, e.g. files with certain extensions or names (GIS data such as shapefiles, for example, consists of several files that share one name but have different extensions).
 
Something like:
 
<properties>
  <fetchers>
    <fetcher class="org.apache.tika.pipes.fetcher.fs.FileSystemFetcher">
      <params>
        <name>fsf</name>
        <basePath>/my/base/path1</basePath>
        <pattern>myshapefilename.*</pattern>
      </params>
    </fetcher>
  </fetchers>
</properties>
 
Or, to match several patterns at once:
 
        <pattern>*.doc*,*.pdf</pattern>
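
To make the intended matching concrete, here is a rough Java sketch of how such a pattern could be evaluated. It assumes the pattern is a comma-separated list of globs applied to the file name only; the issue does not settle whether globs or regular expressions are meant, or whether the full path should be matched. The class PatternFilterSketch and its accepts method are hypothetical names used for illustration and are not part of Tika; only the standard java.nio.file glob PathMatcher is used.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an existing Tika class: decides whether a path
// matches any glob in a comma-separated <pattern> value.
class PatternFilterSketch {

    private final List<PathMatcher> matchers = new ArrayList<>();

    PatternFilterSketch(String commaSeparatedGlobs) {
        // Assumption: commas only separate patterns, so glob groups such as
        // "{doc,pdf}" (which contain commas themselves) are not supported here.
        for (String glob : commaSeparatedGlobs.split(",")) {
            String trimmed = glob.trim();
            if (!trimmed.isEmpty()) {
                matchers.add(FileSystems.getDefault().getPathMatcher("glob:" + trimmed));
            }
        }
    }

    // Returns true if the file name (last path element) matches any glob.
    boolean accepts(Path path) {
        Path fileName = path.getFileName();
        if (fileName == null) {
            return false; // e.g. a root path has no file name
        }
        for (PathMatcher matcher : matchers) {
            if (matcher.matches(fileName)) {
                return true;
            }
        }
        return false;
    }
}
```

With this sketch, new PatternFilterSketch("myshapefilename.*") would accept myshapefilename.shp and myshapefilename.dbf under the base path but not other.shp. Case sensitivity follows the default file system's glob rules, so it may differ between platforms.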



--
This message was sent by Atlassian Jira
(v8.20.10#820010)