Posted to dev@parquet.apache.org by "Yuming Wang (JIRA)" <ji...@apache.org> on 2019/04/16 06:44:00 UTC

[jira] [Commented] (PARQUET-1563) cannot read 'date' datatype which write by spark

    [ https://issues.apache.org/jira/browse/PARQUET-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818693#comment-16818693 ] 

Yuming Wang commented on PARQUET-1563:
--------------------------------------

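It's not a bug. Spark writes a date as an INT32 holding the number of days since 1970-01-01 (the Parquet DATE representation), so you need to convert the value to a date when reading it: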

https://github.com/apache/spark/blob/21a7bfd5c324e6c82152229f1394f26afeae771c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetWriteSupport.scala#L145-L147
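
As a rough illustration of that conversion, here is a minimal sketch that assumes the raw value has already been read as an Int. The object and method names (ParquetDateDecoding, daysToLocalDate) are made up for this example and are not part of Parquet or Spark; only java.time.LocalDate.ofEpochDay is a real library call.

```scala
import java.time.LocalDate

// Hypothetical helper, not part of parquet-column or Spark.
object ParquetDateDecoding {
  // Interpret an INT32 read from a DATE column as days since 1970-01-01.
  def daysToLocalDate(daysSinceEpoch: Int): LocalDate =
    LocalDate.ofEpochDay(daysSinceEpoch.toLong)

  def main(args: Array[String]): Unit = {
    // 115 is the value from the report below; it prints 1970-04-26.
    println(daysToLocalDate(115))
  }
}
```

With this, the '115' from the report maps back to the original input date, which suggests the file itself is fine and only the reader-side interpretation is missing.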

> cannot read 'date' datatype which write by spark
> ------------------------------------------------
>
>                 Key: PARQUET-1563
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1563
>             Project: Parquet
>          Issue Type: Bug
>         Environment: jdk: 1.8
> macOS Mojave 10.14.4
>            Reporter: Fan Mo
>            Priority: Major
>
> I'm using Spark 2.4.0 to write a Parquet file and trying to read the data with parquet-column-1.10.jar. All the primitive data types work; however, the date data type comes back as a meaningless number. For example, the input date is '1970-04-26' and the output is '115'. If I use Spark to read the data, I get the correct date.
> The following is my reader code:
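> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.Path
> import org.apache.parquet.column.page.PageReadStore
> import org.apache.parquet.example.data.simple.convert.GroupRecordConverter
> import org.apache.parquet.hadoop.ParquetFileReader
> import org.apache.parquet.hadoop.util.HadoopInputFile
> import org.apache.parquet.io.ColumnIOFactory
>
> val reader = ParquetFileReader.open(HadoopInputFile.fromPath(new Path("testfile.snappy.parquet"), new Configuration()))
> val schema = reader.getFooter.getFileMetaData.getSchema
> // Scala assignments evaluate to Unit, so read the row group before testing for null.
> var pages: PageReadStore = reader.readNextRowGroup()
> while (pages != null) {
>   val rows = pages.getRowCount
>   val columnIO = new ColumnIOFactory().getColumnIO(schema)
>   val recordReader = columnIO.getRecordReader(pages, new GroupRecordConverter(schema))
>   (0L until rows).foreach { _ =>
>     val simpleGroup = recordReader.read()
>     println(simpleGroup)
>   }
>   pages = reader.readNextRowGroup()
> }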



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)