Posted to issues@spark.apache.org by "Viktor Khristenko (JIRA)" <ji...@apache.org> on 2017/05/04 05:49:04 UTC

[jira] [Updated] (SPARK-20593) Writing Parquet: Cannot build an empty group

     [ https://issues.apache.org/jira/browse/SPARK-20593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viktor Khristenko updated SPARK-20593:
--------------------------------------
    Description: 
Hi,

This is my first ticket, so I apologize if I'm doing anything improperly.

I have a dataset with the following schema:

root
|-- muons: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- reco::Candidate: struct (nullable = true)
|    |    |-- qx3_: integer (nullable = true)
|    |    |-- pt_: float (nullable = true)
|    |    |-- eta_: float (nullable = true)
|    |    |-- phi_: float (nullable = true)
|    |    |-- mass_: float (nullable = true)
|    |    |-- vertex_: struct (nullable = true)
|    |    |    |-- fCoordinates: struct (nullable = true)
|    |    |    |    |-- fX: float (nullable = true)
|    |    |    |    |-- fY: float (nullable = true)
|    |    |    |    |-- fZ: float (nullable = true)
|    |    |-- pdgId_: integer (nullable = true)
|    |    |-- status_: integer (nullable = true)
|    |    |-- cachePolarFixed_: struct (nullable = true)
|    |    |-- cacheCartesianFixed_: struct (nullable = true)

As you can see, there are 3 empty structs in this schema (reco::Candidate, cachePolarFixed_, and cacheCartesianFixed_). I am certain that I can read and manipulate this dataset without problems. However, when I try to write it to disk as Parquet, I get the following exception:

ds.write.format("parquet").save(outputPathName):

java.lang.IllegalStateException: Cannot build an empty group
at org.apache.parquet.Preconditions.checkState(Preconditions.java:91)
at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:622)
at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:497)
at org.apache.parquet.schema.Types$Builder.named(Types.java:286)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:535)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:321)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertField$1.apply(ParquetSchemaConverter.scala:534)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertField$1.apply(ParquetSchemaConverter.scala:533)

I would like to understand whether this is a bug or intended behavior. I assume it is related to the empty structs. Any help would be appreciated!
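
One way to check the empty-struct assumption is to list every struct in the schema that has no fields. The following is a minimal sketch, not part of the original report: it is plain Python with no Spark dependency, and it assumes the schema is available in Spark's JSON layout (the nested dictionary form produced by, for example, `df.schema.jsonValue()` in PySpark). The helper names `field`, `struct`, and `find_empty_structs` are hypothetical, and the schema is retyped by hand from the listing above.

```python
# Sketch: list the empty structs in a Spark schema given in its JSON (dict) form.
# The schema below is retyped by hand from the issue text; it is illustrative only.

def field(name, dtype):
    """Build one struct field entry in Spark's schema JSON layout."""
    return {"name": name, "type": dtype, "nullable": True, "metadata": {}}

def struct(*fields):
    """Build a struct type; calling it with no arguments gives an empty struct."""
    return {"type": "struct", "fields": list(fields)}

def find_empty_structs(dtype, path=""):
    """Return the dotted path of every struct type that has no fields."""
    if not isinstance(dtype, dict):
        return []  # primitive types are plain strings such as "float"
    kind = dtype.get("type")
    if kind == "struct":
        if not dtype["fields"]:
            return [path or "<root>"]
        found = []
        for f in dtype["fields"]:
            child = f"{path}.{f['name']}" if path else f["name"]
            found.extend(find_empty_structs(f["type"], child))
        return found
    if kind == "array":
        child = f"{path}.element" if path else "element"
        return find_empty_structs(dtype["elementType"], child)
    if kind == "map":
        child = f"{path}.value" if path else "value"
        return find_empty_structs(dtype["valueType"], child)
    return []

muon = struct(
    field("reco::Candidate", struct()),
    field("qx3_", "integer"),
    field("pt_", "float"),
    field("eta_", "float"),
    field("phi_", "float"),
    field("mass_", "float"),
    field("vertex_", struct(
        field("fCoordinates", struct(
            field("fX", "float"),
            field("fY", "float"),
            field("fZ", "float"),
        )),
    )),
    field("pdgId_", "integer"),
    field("status_", "integer"),
    field("cachePolarFixed_", struct()),
    field("cacheCartesianFixed_", struct()),
)
schema = struct(
    field("muons", {"type": "array", "elementType": muon, "containsNull": True})
)

empty_paths = find_empty_structs(schema)
print(empty_paths)
```

For the schema above this reports three paths, matching the three empty structs mentioned in the description. Whether those structs are what triggers the exception is still an assumption; the `Types$BaseGroupBuilder.build` frame in the trace only suggests that a group without fields is being rejected.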

I quickly created a stripped-down version, and that one works without any issues.
For reference, here is a link to the original question on Stack Overflow [1].
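
The report does not say how the stripped-down version was built. As one possible interpretation, the sketch below removes every empty struct from a schema description, along with any parent that becomes empty as a result. It is plain Python working on Spark's schema JSON layout (nested dictionaries); the helper names `field`, `struct`, and `prune_empty_structs` are hypothetical. It only rewrites the schema description; selecting the matching columns from the actual dataset before writing is a separate step that is not shown here.

```python
# Sketch: drop empty structs from a Spark schema given in its JSON (dict) form.
# This is an assumed workaround, not the method used in the original report.

def field(name, dtype):
    """Build one struct field entry in Spark's schema JSON layout."""
    return {"name": name, "type": dtype, "nullable": True, "metadata": {}}

def struct(*fields):
    """Build a struct type; calling it with no arguments gives an empty struct."""
    return {"type": "struct", "fields": list(fields)}

def prune_empty_structs(dtype):
    """Return a copy of dtype without empty structs, or None if nothing is left."""
    if not isinstance(dtype, dict):
        return dtype  # primitive types are plain strings and are kept as-is
    kind = dtype.get("type")
    if kind == "struct":
        kept = []
        for f in dtype["fields"]:
            pruned = prune_empty_structs(f["type"])
            if pruned is not None:
                kept.append({**f, "type": pruned})
        # A struct whose fields were all removed is itself empty, so drop it too.
        return {**dtype, "fields": kept} if kept else None
    if kind == "array":
        elem = prune_empty_structs(dtype["elementType"])
        return {**dtype, "elementType": elem} if elem is not None else None
    if kind == "map":
        value = prune_empty_structs(dtype["valueType"])
        return {**dtype, "valueType": value} if value is not None else None
    return dtype

# A small schema using field names from the issue, for illustration only.
schema = struct(
    field("pt_", "float"),
    field("cachePolarFixed_", struct()),
    field("vertex_", struct(
        field("fCoordinates", struct(field("fX", "float"))),
    )),
)

pruned = prune_empty_structs(schema)
kept_names = [f["name"] for f in pruned["fields"]]
print(kept_names)
```

Returning `None` for a type that ends up with no content lets the caller drop the enclosing field, so an array whose element struct contains only empty structs disappears as well instead of leaving a new empty group behind.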

VK

[1] http://stackoverflow.com/questions/43767358/apache-spark-parquet-cannot-build-an-empty-group

  was:
Hi,

This is my first ticket, so I apologize if I'm doing anything improperly.

I have a dataset with the following schema:

root
|-- muons: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- reco::Candidate: struct (nullable = true)
|    |    |-- qx3_: integer (nullable = true)
|    |    |-- pt_: float (nullable = true)
|    |    |-- eta_: float (nullable = true)
|    |    |-- phi_: float (nullable = true)
|    |    |-- mass_: float (nullable = true)
|    |    |-- vertex_: struct (nullable = true)
|    |    |    |-- fCoordinates: struct (nullable = true)
|    |    |    |    |-- fX: float (nullable = true)
|    |    |    |    |-- fY: float (nullable = true)
|    |    |    |    |-- fZ: float (nullable = true)
|    |    |-- pdgId_: integer (nullable = true)
|    |    |-- status_: integer (nullable = true)
|    |    |-- cachePolarFixed_: struct (nullable = true)
|    |    |-- cacheCartesianFixed_: struct (nullable = true)

As you can see, there are 3 empty structs in this schema (reco::Candidate, cachePolarFixed_, and cacheCartesianFixed_). I am certain that I can read and manipulate this dataset without problems. However, when I try to write it to disk as Parquet, I get the following exception:

ds.write.format("parquet").save(outputPathName):

java.lang.IllegalStateException: Cannot build an empty group
at org.apache.parquet.Preconditions.checkState(Preconditions.java:91)
at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:622)
at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:497)
at org.apache.parquet.schema.Types$Builder.named(Types.java:286)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:535)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:321)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertField$1.apply(ParquetSchemaConverter.scala:534)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertField$1.apply(ParquetSchemaConverter.scala:533)

I would like to understand whether this is a bug or intended behavior. I assume it is related to the empty structs. Any help would be appreciated!

I quickly created a stripped-down version, and that one works without any issues.
For reference, here is a link to the original question on Stack Overflow [1].

VK

[1] http://stackoverflow.com/questions/43767358/apache-spark-parquet-cannot-build-an-empty-group


> Writing Parquet: Cannot build an empty group
> --------------------------------------------
>
>                 Key: SPARK-20593
>                 URL: https://issues.apache.org/jira/browse/SPARK-20593
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core, Spark Shell
>    Affects Versions: 2.1.1
>         Environment: Apache Spark 2.1.1 (same behavior on 2.1.0; switched today). Tested only on Mac.
>            Reporter: Viktor Khristenko
>            Priority: Minor


