Posted to user@spark.apache.org by "Mendelson, Assaf" <As...@rsa.com> on 2017/05/28 09:21:55 UTC

[WARN] org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

Hi,
I am getting the following warning:

[WARN] org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

This seems to occur every time I read a nested data structure from Parquet.

Example to reproduce:


// Runs as-is in spark-shell; the imports below are only needed in a standalone application
import org.apache.spark.sql.DataFrame
import spark.implicits._

case class A(v: Int)
case class B(v: A)
val filename = "test"
val a = A(1)
val b = B(a)
val df1: DataFrame = Seq[B](b).toDF
df1.write.parquet(filename)
val df2 = spark.read.parquet(filename)
df2.show()


I also found https://issues.apache.org/jira/browse/SPARK-18660, but it has no additional information and no resolution (it does note that the issue is fixed in Parquet 1.10, while Spark still uses Parquet 1.8).
For now I have added the following to log4j.properties:
log4j.logger.org.apache.parquet.hadoop.ParquetRecordReader=ERROR
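
(I believe the same suppression could be done programmatically instead; a minimal sketch, assuming the Log4j 1.x API that Spark currently bundles:)

import org.apache.log4j.{Level, Logger}

// Raise only the noisy ParquetRecordReader logger to ERROR,
// leaving the rest of the Parquet/Spark logging untouched.
Logger.getLogger("org.apache.parquet.hadoop.ParquetRecordReader").setLevel(Level.ERROR)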

However, this only hides the warning and seems like a poor solution to me.
Is there a better way to solve this?


Thanks,
              Assaf.