Posted to dev@tika.apache.org by "Dmitry Kulakov (Jira)" <ji...@apache.org> on 2019/08/20 13:35:00 UTC

[jira] [Created] (TIKA-2927) XSSFExcelExtractorDecorator emits non-existent empty rows.

Dmitry Kulakov created TIKA-2927:
------------------------------------

             Summary: XSSFExcelExtractorDecorator emits non-existent empty rows.
                 Key: TIKA-2927
                 URL: https://issues.apache.org/jira/browse/TIKA-2927
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.22, 1.21, 1.20
            Reporter: Dmitry Kulakov


Parsing xlsx files with _includeMissingRows_ set to true in the _OfficeParserConfig_ causes the _XSSFExcelExtractorDecorator_ to emit extra empty rows: before each row it emits a number of empty rows equal to the current row number - 1. The cause is that _lastSeenRow_ is never updated, so every new row is treated as the first non-empty row. The fix is simple: update _lastSeenRow_ after the start of every new row. I will submit the fix, along with a relevant unit test, in a pull request.
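
To illustrate the behaviour described above, here is a minimal, self-contained sketch. It is not the actual Tika code: the class _MissingRowsSketch_, the method _emit_ and the flag _updateLastSeenRow_ are hypothetical names, and the sketch assumes 0-based row indices with _lastSeenRow_ starting at -1. Only the name _lastSeenRow_ is taken from the report.

```java
import java.util.ArrayList;
import java.util.List;

public class MissingRowsSketch {

    /**
     * Hypothetical stand-in for the row handling described in the report.
     * rowNumbers holds the 0-based indices of the rows present in the sheet.
     * With updateLastSeenRow == false the tracker is never advanced, which
     * mimics the reported bug; with true it mimics the proposed fix.
     */
    static List<String> emit(int[] rowNumbers, boolean updateLastSeenRow) {
        List<String> out = new ArrayList<>();
        int lastSeenRow = -1; // assumption: no row has been seen yet
        for (int rowNum : rowNumbers) {
            // Emit one empty row for every index skipped since the last seen row.
            for (int missing = lastSeenRow + 1; missing < rowNum; missing++) {
                out.add("<empty>");
            }
            out.add("row " + rowNum);
            if (updateLastSeenRow) {
                lastSeenRow = rowNum; // the update the report says is missing
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] rows = {0, 1, 2}; // three consecutive rows, nothing missing
        // Without the update: [row 0, <empty>, row 1, <empty>, <empty>, row 2]
        System.out.println(emit(rows, false));
        // With the update: [row 0, row 1, row 2]
        System.out.println(emit(rows, true));
    }
}
```

With the update in place, empty rows are emitted only for genuine gaps: for example, _emit(new int[]{0, 3}, true)_ yields row 0, two empty rows, then row 3.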



--
This message was sent by Atlassian Jira
(v8.3.2#803003)