Posted to dev@tika.apache.org by "Hudson (Jira)" <ji...@apache.org> on 2022/10/13 21:35:00 UTC

[jira] [Commented] (TIKA-3878) Improve PipesReporter and PipesIterator to report the total number of files to be processed

    [ https://issues.apache.org/jira/browse/TIKA-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617309#comment-17617309 ] 

Hudson commented on TIKA-3878:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #843 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/843/])
TIKA-3878 -- allow pipes iterators to count the total number of files. (tallison: [https://github.com/apache/tika/commit/339289e45eae6560155f0fb7631687cfc86ba610])
* (edit) tika-pipes/tika-pipes-reporters/tika-pipes-reporter-fs-status/src/test/java/org/apache/tika/pipes/reporters/fs/TestFileSystemStatusReporter.java
* (edit) tika-parent/pom.xml
* (edit) tika-pipes/tika-pipes-reporters/tika-pipes-reporter-fs-status/pom.xml
* (edit) tika-core/src/main/java/org/apache/tika/pipes/PipesReporter.java
* (add) tika-core/src/main/java/org/apache/tika/pipes/pipesiterator/TotalCounter.java
* (edit) tika-pipes/tika-pipes-reporters/tika-pipes-reporter-fs-status/src/main/java/org/apache/tika/pipes/reporters/fs/FileSystemStatusReporter.java
* (add) tika-core/src/main/java/org/apache/tika/pipes/async/AsyncStatus.java
* (edit) tika-core/src/main/java/org/apache/tika/pipes/async/AsyncProcessor.java
* (edit) tika-core/src/main/java/org/apache/tika/pipes/pipesiterator/fs/FileSystemPipesIterator.java
* (add) tika-core/src/main/java/org/apache/tika/pipes/pipesiterator/TotalCountResult.java


> Improve PipesReporter and PipesIterator to report the total number of files to be processed
> -------------------------------------------------------------------------------------------
>
>                 Key: TIKA-3878
>                 URL: https://issues.apache.org/jira/browse/TIKA-3878
>             Project: Tika
>          Issue Type: New Feature
>            Reporter: Tim Allison
>            Priority: Major
>
> For user-facing applications, it would be useful to give users a sense of progress by reporting a denominator (the total number of files to process).
> Some pipes iterators will have a natural shortcut (select count(1)... for JDBC, or equivalent queries in OpenSearch and/or Solr).  Some will have to do twice the work -- file system and S3(?).  And some simply won't be able to report a total at all.
> My initial target is the FileSystemPipesIterator and the FileSystemStatusReporter.
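
For readers following along, the "twice the work" case for a file system might look roughly like the sketch below: a second, independent walk of the directory tree on a background thread, with a running count that a reporter can poll for the denominator. This is only an illustration under stated assumptions. The class name FileCountSketch, its Status enum, and its methods are hypothetical and are not the TotalCounter or TotalCountResult API added in the commit above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Stream;

/**
 * Hypothetical sketch: counts regular files under a root directory on a
 * background thread. Not the Tika TotalCounter API.
 */
public class FileCountSketch {

    /** Whether the count is still running, is final, or failed. */
    public enum Status { NOT_COMPLETED, FINAL_COUNT, EXCEPTION }

    private final Path root;
    private final AtomicLong count = new AtomicLong();
    private volatile Status status = Status.NOT_COMPLETED;

    public FileCountSketch(Path root) {
        this.root = root;
    }

    /** Starts the extra pass over the tree on a daemon thread and returns it. */
    public Thread start() {
        Thread t = new Thread(this::countFiles, "file-count-sketch");
        t.setDaemon(true);
        t.start();
        return t;
    }

    private void countFiles() {
        // Files.walk is lazy, so the stream must be closed when done.
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .forEach(p -> count.incrementAndGet());
            status = Status.FINAL_COUNT;
        } catch (IOException | RuntimeException e) {
            // Errors hit mid-walk surface as UncheckedIOException.
            status = Status.EXCEPTION;
        }
    }

    /** Running count; only a true total once the status is FINAL_COUNT. */
    public long getCount() {
        return count.get();
    }

    public Status getStatus() {
        return status;
    }
}
```

A reporter polling this object would treat getCount() as a lower bound while the status is NOT_COMPLETED, and as the denominator only once it reaches FINAL_COUNT. An iterator that cannot count at all would need some further "unsupported" state, which this sketch leaves out.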



--
This message was sent by Atlassian Jira
(v8.20.10#820010)