Posted to issues@spark.apache.org by "Ala Luszczak (Jira)" <ji...@apache.org> on 2022/06/29 16:11:00 UTC

[jira] [Created] (SPARK-39634) Allow file splitting in combination with row index generation

[Ala Luszczak](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=ala.luszczak) **created** an issue

[Spark](https://issues.apache.org/jira/browse/SPARK) / [SPARK-39634](https://issues.apache.org/jira/browse/SPARK-39634): [Allow file splitting in combination with row index generation](https://issues.apache.org/jira/browse/SPARK-39634)

| Field | Value |
|---|---|
| Issue Type | Improvement |
| Affects Versions | 3.3.0 |
| Assignee | Unassigned |
| Components | SQL |
| Created | 29/Jun/22 16:10 |
| Priority | Major |
| Reporter | [Ala Luszczak](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=ala.luszczak) |

This issue is a follow-up to
[SPARK-37980](https://issues.apache.org/jira/browse/SPARK-37980 "Extend
METADATA column to support row indices for file based data sources").

Because of a bug in parquet-mr
([PARQUET-2161](https://issues.apache.org/jira/browse/PARQUET-2161)), it is
currently impossible to generate row indexes for Parquet files that are split
into multiple pieces. Instead, each file must be read in a single task.
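
To illustrate why splitting complicates row index generation, here is a minimal toy sketch in Python. It is not parquet-mr or Spark code: the function names and the row counts are hypothetical, and it only models the general idea that a task reading a later split must offset its indexes by the rows in the earlier row groups of the same file.

```python
from itertools import accumulate


def row_group_starts(row_counts):
    """Return the file-level index of the first row of each row group."""
    # Running totals give the end of each group; shift by one to get the starts.
    return [0] + list(accumulate(row_counts))[:-1]


def row_indexes_for_split(row_counts, groups_in_split):
    """Return file-level row indexes for the row groups assigned to one split."""
    starts = row_group_starts(row_counts)
    indexes = []
    for g in groups_in_split:
        indexes.extend(range(starts[g], starts[g] + row_counts[g]))
    return indexes


# Hypothetical file with three row groups, read as two splits.
row_counts = [3, 2, 4]
first = row_indexes_for_split(row_counts, [0])
second = row_indexes_for_split(row_counts, [1, 2])
```

In this model the second split starts at index 3 rather than 0. A reader that loses track of the skipped row groups would produce indexes that are wrong for every split except the first, which is the kind of failure the single-task workaround avoids.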

Once a version of parquet-mr with the fix is included in Spark, we should
remove the workarounds (marked in the code with this ticket number), so that
Parquet files are splittable even when row indexes need to be generated.
  