Posted to dev@tika.apache.org by "Dmitry Kulakov (Jira)" <ji...@apache.org> on 2019/08/20 13:35:00 UTC
[jira] [Created] (TIKA-2927) XSSFExcelExtractorDecorator emits non-existent empty rows.
Dmitry Kulakov created TIKA-2927:
------------------------------------
Summary: XSSFExcelExtractorDecorator emits non-existent empty rows.
Key: TIKA-2927
URL: https://issues.apache.org/jira/browse/TIKA-2927
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.22, 1.21, 1.20
Reporter: Dmitry Kulakov
Parsing xlsx files with _includeMissingRows_ set to true in the _OfficeParserConfig_ causes the _XSSFExcelExtractorDecorator_ to emit extra empty rows before each row, as many as the current row number minus 1. The cause is that _lastSeenRow_ is never updated, so every new row is treated as the first non-empty row. The fix is simple: update _lastSeenRow_ after the start of every new row. I will add the fix along with the relevant unit test in a pull request.
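
The behaviour described above can be illustrated with a minimal, self-contained sketch. This is not the Tika source: the class MissingRowsSketch, the method emit, and the updateLastSeenRow flag are hypothetical names used only for illustration, and the sketch assumes 0-based row indices with lastSeenRow starting at -1. With the flag off, each row with index r is preceded by r empty rows (its 1-based row number minus 1); with the flag on, only the rows that are really missing are padded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the row-padding logic; not the actual Tika code.
public class MissingRowsSketch {

    /**
     * Simulates emitting rows with missing-row padding enabled.
     * Returns "empty" for each padding row and "row N" for each real row.
     * Assumes 0-based row indices and lastSeenRow initialised to -1.
     */
    public static List<String> emit(int[] rowNumbers, boolean updateLastSeenRow) {
        List<String> out = new ArrayList<>();
        int lastSeenRow = -1;
        for (int rowNum : rowNumbers) {
            // Pad every row index between the last seen row and this one.
            for (int rn = lastSeenRow + 1; rn < rowNum; rn++) {
                out.add("empty");
            }
            out.add("row " + rowNum);
            // The proposed fix: remember the row that was just started.
            if (updateLastSeenRow) {
                lastSeenRow = rowNum;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] rows = {0, 1, 3}; // row index 2 is genuinely missing
        System.out.println("without update: " + emit(rows, false));
        System.out.println("with update:    " + emit(rows, true));
    }
}
```

For the input rows 0, 1 and 3, the run without the update pads rows 1 and 3 with one and three empty rows respectively, while the run with the update emits a single empty row for the missing index 2.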
--
This message was sent by Atlassian Jira
(v8.3.2#803003)