Posted to dev@parquet.apache.org by "Thomas Friedrich (JIRA)" <ji...@apache.org> on 2015/07/01 22:26:05 UTC

[jira] [Created] (PARQUET-324) row count incorrect if data file has more than 2^31 rows

Thomas Friedrich created PARQUET-324:
----------------------------------------

             Summary: row count incorrect if data file has more than 2^31 rows
                 Key: PARQUET-324
                 URL: https://issues.apache.org/jira/browse/PARQUET-324
             Project: Parquet
          Issue Type: Bug
          Components: parquet-mr
    Affects Versions: 1.7.0, 1.8.0
            Reporter: Thomas Friedrich
            Priority: Minor


If a Parquet file has more than 2^31 rows, the row count written into the file metadata is incorrect: the total exceeds Integer.MAX_VALUE (2^31 - 1) and wraps around.
The cause is that numRows in ParquetMetadataConverter.toParquetMetadata is declared as int instead of long, so summing the per-block row counts overflows:
    int numRows = 0;
    for (BlockMetaData block : blocks) {
      numRows += block.getRowCount();
      addRowGroup(parquetMetadata, rowGroups, block);
    }
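
The overflow can be reproduced outside of parquet-mr. The following is only a minimal sketch, not the actual converter code: the class name, method names, and row counts are made up for illustration, and it assumes (as the loop above suggests) that the per-block row count is a long. In Java, a compound assignment such as numRows += x inserts an implicit narrowing cast to the type of numRows, which is why the original loop compiles without any warning.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone demo; none of these names exist in parquet-mr.
public class RowCountOverflowDemo {

    // Mirrors the reported loop: an int accumulator summing long row counts.
    static long sumWithIntAccumulator(List<Long> rowCounts) {
        int numRows = 0;
        for (long rowCount : rowCounts) {
            // Equivalent to numRows = (int) (numRows + rowCount);
            // the high bits are silently dropped.
            numRows += rowCount;
        }
        return numRows;
    }

    // Suggested direction for a fix: accumulate in a long.
    static long sumWithLongAccumulator(List<Long> rowCounts) {
        long numRows = 0;
        for (long rowCount : rowCounts) {
            numRows += rowCount;
        }
        return numRows;
    }

    public static void main(String[] args) {
        // Two made-up blocks of 1.5 billion rows each, 3 billion in total.
        List<Long> rowCounts = Arrays.asList(1_500_000_000L, 1_500_000_000L);
        System.out.println(sumWithIntAccumulator(rowCounts));  // prints -1294967296
        System.out.println(sumWithLongAccumulator(rowCounts)); // prints 3000000000
    }
}
```

With the int accumulator the total wraps to a negative value, while the long accumulator keeps the correct count. Declaring numRows as long in toParquetMetadata would presumably avoid the problem in the same way.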


