Posted to dev@spark.apache.org by Zahid Rahman <za...@gmail.com> on 2020/03/26 08:11:42 UTC

Programmatic: parquet file corruption error

Hi,

When I run the code for a user-defined data type Dataset (using a Scala
case class) in the interactive spark-shell against a parquet file, the
results are as expected.
However, when I run the same code programmatically from the IntelliJ IDE,
Spark gives a file corruption error.

Steps I have taken to determine the source of the error:
I have tested the file permissions and made sure to chmod 777, just in
case.
I tried a fresh copy of the same parquet file.
I ran both programs before and after the fresh copy.
I also rebooted, then ran programmatically against a fresh parquet file.
The corruption error was consistent in all cases.
Below I have copied and pasted the spark-shell session, the error message,
the code in the IDE, the pom.xml, and the IntelliJ java classpath command
line.

Perhaps the libraries used by spark-shell are different from the ones
used when the code is run programmatically.
I don't believe it is an error on my part.
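
One quick way to test that hypothesis would be to print which jar a Parquet
class is actually loaded from in each environment (a diagnostic sketch, not
part of the original program; paste it into the spark-shell and into main()):

// prints the jar that provides parquet-mr, so the spark-shell and
// IDE classpaths can be compared directly
println(classOf[org.apache.parquet.VersionParser]
  .getProtectionDomain.getCodeSource.getLocation)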
<------------------------------------------------------------------------------------------------------

07:28:45 WARN  CorruptStatistics:117 - Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr (build 32c46643845ea8a705c35d4ec8fc654cc8ff816d)
org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr (build 32c46643845ea8a705c35d4ec8fc654cc8ff816d) using format: (.*?)\s+version\s*(?:([^(]*?)\s*(?:\(\s*build\s*([^)]*?)\s*\))?)?
    at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
    at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:72)
    at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatisticsInternal(ParquetMetadataConverter.java:435)
    at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:454)
    at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:914)
    at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:885)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:532)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:505)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:499)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:448)
    at org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase.initialize(SpecificParquetRecordReaderBase.java:105)
    at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:131)
    at org.apache.spark.sql.execution.datasources.v2.parquet.ParquetPartitionReaderFactory.buildReaderBase(ParquetPartitionReaderFactory.scala:174)
    at org.apache.spark.sql.execution.datasources.v2.parquet.ParquetPartitionReaderFactory.createVectorizedReader(ParquetPartitionReaderFactory.scala:205)
    at org.apache.spark.sql.execution.datasources.v2.parquet.ParquetPartitionReaderFactory.buildColumnarReader(ParquetPartitionReaderFactory.scala:103)
    at org.apache.spark.sql.execution.datasources.v2.FilePartitionReaderFactory.$anonfun$createColumnarReader$1(FilePartitionReaderFactory.scala:38)
    at org.apache.spark.sql.execution.datasources.v2.FilePartitionReaderFactory$$Lambda$2018/0000000000000000.apply(Unknown Source)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
    at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.getNextReader(FilePartitionReader.scala:109)
    at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.next(FilePartitionReader.scala:42)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:62)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:726)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:321)
    at org.apache.spark.sql.execution.SparkPlan$$Lambda$1879/0000000000000000.apply(Unknown Source)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:872)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:872)
    at org.apache.spark.rdd.RDD$$Lambda$1875/0000000000000000.apply(Unknown Source)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:127)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:441)
    at org.apache.spark.executor.Executor$TaskRunner$$Lambda$855/0000000000000000.apply(Unknown Source)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:444)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:821)


<--------------------------------------------------------------------------------------------------------------
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.0-preview2
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_242)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val flightDf = spark.read.parquet("/data/flight-data/parquet/2010-summary.parquet/")
flightDf: org.apache.spark.sql.DataFrame = [DEST_COUNTRY_NAME: string, ORIGIN_COUNTRY_NAME: string ... 1 more field]

scala> case class flight (DEST_COUNTRY_NAME: String,
     |                      ORIGIN_COUNTRY_NAME:String,
     |                      count: BigInt)
defined class flight

scala> val flightDf = spark.read.parquet("/data/flight-data/parquet/2010-summary.parquet/")
flightDf: org.apache.spark.sql.DataFrame = [DEST_COUNTRY_NAME: string, ORIGIN_COUNTRY_NAME: string ... 1 more field]

scala> val flights = flightDf.as[flight]
flights: org.apache.spark.sql.Dataset[flight] = [DEST_COUNTRY_NAME: string, ORIGIN_COUNTRY_NAME: string ... 1 more field]

scala> flights.filter(flight_row => flight_row.ORIGIN_COUNTRY_NAME != "Canada").map(flight_row => flight_row).take(3)
res0: Array[flight] = Array(flight(United States,Romania,1), flight(United States,Ireland,264), flight(United States,India,69))

<---------------------------------------------------------------------------------------------------------------------

import org.apache.spark.sql.SparkSession

object chapter2 {

  // define a specific data type class, then manipulate it using the
  // filter and map functions
  case class flight(DEST_COUNTRY_NAME: String,
                    ORIGIN_COUNTRY_NAME: String,
                    count: BigInt)

  def main(args: Array[String]): Unit = {

    // unlike in the interactive shell, the SparkSession must be created
    // explicitly here
    val spark = SparkSession.builder.master("local[*]").appName("chapter2").getOrCreate

    import spark.implicits._
    val flightDf = spark.read.parquet("/data/flight-data/parquet/2010-summary.parquet/")
    val flights = flightDf.as[flight]
    flights.filter(flight_row => flight_row.ORIGIN_COUNTRY_NAME != "Canada").map(flight_row => flight_row).take(3)
    spark.stop()
  }
}
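
Note that the shell prints res0 automatically, while main() discards the
Array returned by take(3); to see the rows when running programmatically,
print them explicitly, e.g.:

    // make the programmatic run show the same output as the shell
    flights.filter(flight_row => flight_row.ORIGIN_COUNTRY_NAME != "Canada")
      .map(flight_row => flight_row)
      .take(3)
      .foreach(println)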

<---------------------------------------------------------------------------------------------------------------------------------

<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>3.0.0-preview2</version>
</dependency>

<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.0.0-preview2</version>
</dependency>
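
For comparison, a common way to keep the pom but still match the jars that
spark-submit uses is to mark the Spark artifacts as provided, so the Spark
distribution supplies them at run time (a sketch, not taken from this thread):

<!-- hypothetical: let the Spark distribution supply this jar at run time -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.0.0-preview2</version>
    <scope>provided</scope>
</dependency>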

<!--------------------- IntelliJ command line ------------------------------------->
/home/kub19/jdk8u242-b08/bin/java
-javaagent:/snap/intellij-idea-community/216/lib/idea_rt.jar=42207:/snap/intellij-idea-community/216/bin
-Dfile.encoding=UTF-8 -classpath
/home/kub19/jdk8u242-b08/jre/lib/charsets.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/cldrdata.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/dnsns.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/dtfj.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/dtfjview.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/jaccess.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/localedata.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/nashorn.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/sunec.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/sunjce_provider.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/sunpkcs11.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/traceformat.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/zipfs.jar:/home/kub19/jdk8u242-b08/jre/lib/jce.jar:/home/kub19/jdk8u242-b08/jre/lib/jsse.jar:/home/kub19/jdk8u242-b08/jre/lib/management-agent.jar:/home/kub19/jdk8u242-b08/jre/lib/resources.jar:/home/kub19/jdk8u242-b08/jre/lib/rt.jar:/home/kub19/spark-2.4.5-bin-hadoop2.7/projects/SparkDefinitiveGuide/target/classes:/home/kub19/.m2/repository/org/apache/spark/spark-core_2.12/3.0.0-preview2/spark-core_2.12-3.0.0-preview2.jar:/home/kub19/.m2/repository/com/thoughtworks/paranamer/paranamer/2.8/paranamer-2.8.jar:/home/kub19/.m2/repository/org/apache/avro/avro/1.8.2/avro-1.8.2.jar:/home/kub19/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.9.13/jackson-core-asl-1.9.13.jar:/home/kub19/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.9.13/jackson-mapper-asl-1.9.13.jar:/home/kub19/.m2/repository/org/apache/commons/commons-compress/1.8.1/commons-compress-1.8.1.jar:/home/kub19/.m2/repository/org/tukaani/xz/1.5/xz-1.5.jar:/home/kub19/.m2/repository/org/apache/avro/avro-mapred/1.8.2/avro-mapred-1.8.2-hadoop2.jar:/home/kub19/.m2/repository/org/apache/avro/avro-ipc/1.8.2/avro-ipc-1.8.2.jar:/home/kub19/.m2/repository/commons-codec/commons-codec/1.9/commons-codec-1.9.jar:/home/kub19/.m2/repository/com/twitter/chill_2.12/0.9.3/chill_2.12-0.9.3.jar:/home/kub19/.m2/repository/com/esotericsoftware/kryo-shaded/4.0.2/kryo-shaded-4.0.2.jar:/home/kub19/.m2/repository/com/esotericsoftware/minlog/1.3.0/minlog-1.3.0.jar:/home/kub19/.m2/repository/org/objenesis/objenesis/2.5.1/objenesis-2.5.1.jar:/home/kub19/.m2/repository/com/twitter/chill-java/0.9.3/chill-java-0.9.3.jar:/home/kub19/.m2/repository/org/apache/xbean/xbean-asm7-shaded/4.15/xbean-asm7-shaded-4.15.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-client/2.7.4/hadoop-client-2.7.4.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-common/2.7.4/hadoop-common-2.7.4.jar:/home/kub19/.m2/repository/commons-cli/commons-cli/1.2/commons-cli-1.2.jar:/home/kub19/.m2/repository/xmlenc/xmlenc/0.52/xmlenc-0.52.jar:/home/kub19/.m2/repository/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar:/home/kub19/.m2/repository/commons-io/commons-io/2.4/commons-io-2.4.jar:/home/kub19/.m2/repository/commons-collections/commons-collections/3.2.2/commons-collections-3.2.2.jar:/home/kub19/.m2/repository/org/mortbay/jetty/jetty-sslengine/6.1.26/jetty-sslengine-6.1.26.jar:/home/kub19/.m2/repository/javax/servlet/jsp/jsp-api/2.1/jsp-api-2.1.jar:/home/kub19/.m2/repository/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar:/home/kub19/.m2/repository/commons-digester/commons-digester/1.8/commons-digester-1.8.jar:/home/kub19/.m2/repository/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar:/home/kub19/.m2/repository/com/google/code/gson/gson/2.2.4/gson-2.2.4.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-auth/2.7.4/hadoop-auth-2.7.4.jar:/home/kub19/.m2/repository/org/apache
/httpcomponents/httpclient/4.2.5/httpclient-4.2.5.jar:/home/kub19/.m2/repository/org/apache/httpcomponents/httpcore/4.2.4/httpcore-4.2.4.jar:/home/kub19/.m2/repository/org/apache/directory/server/apacheds-kerberos-codec/2.0.0-M15/apacheds-kerberos-codec-2.0.0-M15.jar:/home/kub19/.m2/repository/org/apache/directory/server/apacheds-i18n/2.0.0-M15/apacheds-i18n-2.0.0-M15.jar:/home/kub19/.m2/repository/org/apache/directory/api/api-asn1-api/1.0.0-M20/api-asn1-api-1.0.0-M20.jar:/home/kub19/.m2/repository/org/apache/directory/api/api-util/1.0.0-M20/api-util-1.0.0-M20.jar:/home/kub19/.m2/repository/org/apache/curator/curator-client/2.7.1/curator-client-2.7.1.jar:/home/kub19/.m2/repository/org/apache/htrace/htrace-core/3.1.0-incubating/htrace-core-3.1.0-incubating.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-hdfs/2.7.4/hadoop-hdfs-2.7.4.jar:/home/kub19/.m2/repository/org/mortbay/jetty/jetty-util/6.1.26/jetty-util-6.1.26.jar:/home/kub19/.m2/repository/xerces/xercesImpl/2.9.1/xercesImpl-2.9.1.jar:/home/kub19/.m2/repository/xml-apis/xml-apis/1.3.04/xml-apis-1.3.04.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-app/2.7.4/hadoop-mapreduce-client-app-2.7.4.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-common/2.7.4/hadoop-mapreduce-client-common-2.7.4.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-yarn-client/2.7.4/hadoop-yarn-client-2.7.4.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-yarn-server-common/2.7.4/hadoop-yarn-server-common-2.7.4.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-shuffle/2.7.4/hadoop-mapreduce-client-shuffle-2.7.4.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-yarn-api/2.7.4/hadoop-yarn-api-2.7.4.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-core/2.7.4/hadoop-mapreduce-client-core-2.7.4.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-yarn-common/2.7.4/hadoop-yarn-common-2.7.4.jar:/home/kub19/.m2/repository/javax/xml/bind/jaxb-api/2.2.2/jaxb-api-2.2.2.jar:/home/kub19/.m2/repository/javax/xml/stream/stax-api/1.0-2/stax-api-1.0-2.jar:/home/kub19/.m2/repository/org/codehaus/jackson/jackson-jaxrs/1.9.13/jackson-jaxrs-1.9.13.jar:/home/kub19/.m2/repository/org/codehaus/jackson/jackson-xc/1.9.13/jackson-xc-1.9.13.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-jobclient/2.7.4/hadoop-mapreduce-client-jobclient-2.7.4.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-annotations/2.7.4/hadoop-annotations-2.7.4.jar:/home/kub19/.m2/repository/org/apache/spark/spark-launcher_2.12/3.0.0-preview2/spark-launcher_2.12-3.0.0-preview2.jar:/home/kub19/.m2/repository/org/apache/spark/spark-kvstore_2.12/3.0.0-preview2/spark-kvstore_2.12-3.0.0-preview2.jar:/home/kub19/.m2/repository/org/fusesource/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.jar:/home/kub19/.m2/repository/com/fasterxml/jackson/core/jackson-core/2.10.0/jackson-core-2.10.0.jar:/home/kub19/.m2/repository/com/fasterxml/jackson/core/jackson-annotations/2.10.0/jackson-annotations-2.10.0.jar:/home/kub19/.m2/repository/org/apache/spark/spark-network-common_2.12/3.0.0-preview2/spark-network-common_2.12-3.0.0-preview2.jar:/home/kub19/.m2/repository/org/apache/spark/spark-network-shuffle_2.12/3.0.0-preview2/spark-network-shuffle_2.12-3.0.0-preview2.jar:/home/kub19/.m2/repository/org/apache/spark/spark-unsafe_2.12/3.0.0-preview2/spark-unsafe_2.12-3.0.0-preview2.jar:/home/kub19/.m2/repository/javax/activation/activation/1.1.1/activation-1.1.1.jar:/home/kub19/.m2/reposi
tory/org/apache/curator/curator-recipes/2.7.1/curator-recipes-2.7.1.jar:/home/kub19/.m2/repository/org/apache/curator/curator-framework/2.7.1/curator-framework-2.7.1.jar:/home/kub19/.m2/repository/com/google/guava/guava/16.0.1/guava-16.0.1.jar:/home/kub19/.m2/repository/org/apache/zookeeper/zookeeper/3.4.14/zookeeper-3.4.14.jar:/home/kub19/.m2/repository/org/apache/yetus/audience-annotations/0.5.0/audience-annotations-0.5.0.jar:/home/kub19/.m2/repository/javax/servlet/javax.servlet-api/3.1.0/javax.servlet-api-3.1.0.jar:/home/kub19/.m2/repository/org/apache/commons/commons-lang3/3.9/commons-lang3-3.9.jar:/home/kub19/.m2/repository/org/apache/commons/commons-math3/3.4.1/commons-math3-3.4.1.jar:/home/kub19/.m2/repository/org/apache/commons/commons-text/1.6/commons-text-1.6.jar:/home/kub19/.m2/repository/com/google/code/findbugs/jsr305/3.0.0/jsr305-3.0.0.jar:/home/kub19/.m2/repository/org/slf4j/slf4j-api/1.7.16/slf4j-api-1.7.16.jar:/home/kub19/.m2/repository/org/slf4j/jul-to-slf4j/1.7.16/jul-to-slf4j-1.7.16.jar:/home/kub19/.m2/repository/org/slf4j/jcl-over-slf4j/1.7.16/jcl-over-slf4j-1.7.16.jar:/home/kub19/.m2/repository/log4j/log4j/1.2.17/log4j-1.2.17.jar:/home/kub19/.m2/repository/org/slf4j/slf4j-log4j12/1.7.16/slf4j-log4j12-1.7.16.jar:/home/kub19/.m2/repository/com/ning/compress-lzf/1.0.3/compress-lzf-1.0.3.jar:/home/kub19/.m2/repository/org/xerial/snappy/snappy-java/
1.1.7.3/snappy-java-1.1.7.3.jar:/home/kub19/.m2/repository/org/lz4/lz4-java/1.7.0/lz4-java-1.7.0.jar:/home/kub19/.m2/repository/com/github/luben/zstd-jni/1.4.4-3/zstd-jni-1.4.4-3.jar:/home/kub19/.m2/repository/org/roaringbitmap/RoaringBitmap/0.7.45/RoaringBitmap-0.7.45.jar:/home/kub19/.m2/repository/org/roaringbitmap/shims/0.7.45/shims-0.7.45.jar:/home/kub19/.m2/repository/commons-net/commons-net/3.1/commons-net-3.1.jar:/home/kub19/.m2/repository/org/scala-lang/modules/scala-xml_2.12/1.2.0/scala-xml_2.12-1.2.0.jar:/home/kub19/.m2/repository/org/scala-lang/scala-library/2.12.10/scala-library-2.12.10.jar:/home/kub19/.m2/repository/org/scala-lang/scala-reflect/2.12.10/scala-reflect-2.12.10.jar:/home/kub19/.m2/repository/org/json4s/json4s-jackson_2.12/3.6.6/json4s-jackson_2.12-3.6.6.jar:/home/kub19/.m2/repository/org/json4s/json4s-core_2.12/3.6.6/json4s-core_2.12-3.6.6.jar:/home/kub19/.m2/repository/org/json4s/json4s-ast_2.12/3.6.6/json4s-ast_2.12-3.6.6.jar:/home/kub19/.m2/repository/org/json4s/json4s-scalap_2.12/3.6.6/json4s-scalap_2.12-3.6.6.jar:/home/kub19/.m2/repository/org/glassfish/jersey/core/jersey-client/2.29.1/jersey-client-2.29.1.jar:/home/kub19/.m2/repository/jakarta/ws/rs/jakarta.ws.rs-api/2.1.6/jakarta.ws.rs-api-2.1.6.jar:/home/kub19/.m2/repository/org/glassfish/hk2/external/jakarta.inject/2.6.1/jakarta.inject-2.6.1.jar:/home/kub19/.m2/repository/org/glassfish/jersey/core/jersey-common/2.29.1/jersey-common-2.29.1.jar:/home/kub19/.m2/repository/jakarta/annotation/jakarta.annotation-api/1.3.5/jakarta.annotation-api-1.3.5.jar:/home/kub19/.m2/repository/org/glassfish/hk2/osgi-resource-locator/1.0.3/osgi-resource-locator-1.0.3.jar:/home/kub19/.m2/repository/org/glassfish/jersey/core/jersey-server/2.29.1/jersey-server-2.29.1.jar:/home/kub19/.m2/repository/org/glassfish/jersey/media/jersey-media-jaxb/2.29.1/jersey-media-jaxb-2.29.1.jar:/home/kub19/.m2/repository/jakarta/validation/jakarta.validation-api/2.0.2/jakarta.validation-api-2.0.2.jar:/home/kub19/.m2/repository/org/glassfish/jersey/containers/jersey-container-servlet/2.29.1/jersey-container-servlet-2.29.1.jar:/home/kub19/.m2/repository/org/glassfish/jersey/containers/jersey-container-servlet-core/2.29.1/jersey-container-servlet-core-2.29.1.jar:/home/kub19/.m2/repository/org/glassfish/jersey/inject/jersey-hk2/2.29.1/jersey-hk2-2.29.1.jar:/home/kub19/.m2/repository/org/glassfish/hk2/hk2-locator/2.6.1/hk2-locator-2.6.1.jar:/home/kub19/.m2/repository/org/glassfish/hk2/external/aopalliance-repackaged/2.6.1/aopalliance-repackaged-2.6.1.jar:/home/kub19/.m2/repository/org/glassfish/hk2/hk2-api/2.6.1/hk2-api-2.6.1.jar:/home/kub19/.m2/repository/org/glassfish/hk2/hk2-utils/2.6.1/hk2-utils-2.6.1.jar:/home/kub19/.m2/repository/org/javassist/javassist/3.22.0-CR2/javassist-3.22.0-CR2.jar:/home/kub19/.m2/repository/io/netty/netty-all/4.1.42.Final/netty-all-4.1.42.Final.jar:/home/kub19/.m2/repository/com/clearspring/analytics/stream/2.9.6/stream-2.9.6.jar:/home/kub19/.m2/repository/io/dropwizard/metrics/metrics-core/4.1.1/metrics-core-4.1.1.jar:/home/kub19/.m2/repository/io/dropwizard/metrics/metrics-jvm/4.1.1/metrics-jvm-4.1.1.jar:/home/kub19/.m2/repository/io/dropwizard/metrics/metrics-json/4.1.1/metrics-json-4.1.1.jar:/home/kub19/.m2/repository/io/dropwizard/metrics/metrics-graphite/4.1.1/metrics-graphite-4.1.1.jar:/home/kub19/.m2/repository/io/dropwizard/metrics/metrics-jmx/4.1.1/metrics-jmx-4.1.1.jar:/home/kub19/.m2/repository/com/fasterxml/jackson/core/jackson-databind/2.10.0/jackson-databind-2.10.0.jar:/home/kub19/.m2/repository/com/faster
xml/jackson/module/jackson-module-scala_2.12/2.10.0/jackson-module-scala_2.12-2.10.0.jar:/home/kub19/.m2/repository/com/fasterxml/jackson/module/jackson-module-paranamer/2.10.0/jackson-module-paranamer-2.10.0.jar:/home/kub19/.m2/repository/org/apache/ivy/ivy/2.4.0/ivy-2.4.0.jar:/home/kub19/.m2/repository/oro/oro/2.0.8/oro-2.0.8.jar:/home/kub19/.m2/repository/net/razorvine/pyrolite/4.30/pyrolite-4.30.jar:/home/kub19/.m2/repository/net/sf/py4j/py4j/0.10.8.1/py4j-0.10.8.1.jar:/home/kub19/.m2/repository/org/apache/spark/spark-tags_2.12/3.0.0-preview2/spark-tags_2.12-3.0.0-preview2.jar:/home/kub19/.m2/repository/org/apache/commons/commons-crypto/1.0.0/commons-crypto-1.0.0.jar:/home/kub19/.m2/repository/org/spark-project/spark/unused/1.0.0/unused-1.0.0.jar:/home/kub19/.m2/repository/org/apache/spark/spark-sql_2.12/3.0.0-preview2/spark-sql_2.12-3.0.0-preview2.jar:/home/kub19/.m2/repository/com/univocity/univocity-parsers/2.8.3/univocity-parsers-2.8.3.jar:/home/kub19/.m2/repository/org/apache/spark/spark-sketch_2.12/3.0.0-preview2/spark-sketch_2.12-3.0.0-preview2.jar:/home/kub19/.m2/repository/org/apache/spark/spark-catalyst_2.12/3.0.0-preview2/spark-catalyst_2.12-3.0.0-preview2.jar:/home/kub19/.m2/repository/org/scala-lang/modules/scala-parser-combinators_2.12/1.1.2/scala-parser-combinators_2.12-1.1.2.jar:/home/kub19/.m2/repository/org/codehaus/janino/janino/3.0.15/janino-3.0.15.jar:/home/kub19/.m2/repository/org/codehaus/janino/commons-compiler/3.0.15/commons-compiler-3.0.15.jar:/home/kub19/.m2/repository/org/antlr/antlr4-runtime/4.7.1/antlr4-runtime-4.7.1.jar:/home/kub19/.m2/repository/org/apache/arrow/arrow-vector/0.15.1/arrow-vector-0.15.1.jar:/home/kub19/.m2/repository/org/apache/arrow/arrow-format/0.15.1/arrow-format-0.15.1.jar:/home/kub19/.m2/repository/org/apache/arrow/arrow-memory/0.15.1/arrow-memory-0.15.1.jar:/home/kub19/.m2/repository/com/google/flatbuffers/flatbuffers-java/1.9.0/flatbuffers-java-1.9.0.jar:/home/kub19/.m2/repository/org/apache/orc/orc-core/1.5.8/orc-core-1.5.8.jar:/home/kub19/.m2/repository/org/apache/orc/orc-shims/1.5.8/orc-shims-1.5.8.jar:/home/kub19/.m2/repository/com/google/protobuf/protobuf-java/2.5.0/protobuf-java-2.5.0.jar:/home/kub19/.m2/repository/commons-lang/commons-lang/2.6/commons-lang-2.6.jar:/home/kub19/.m2/repository/io/airlift/aircompressor/0.10/aircompressor-0.10.jar:/home/kub19/.m2/repository/org/apache/orc/orc-mapreduce/1.5.8/orc-mapreduce-1.5.8.jar:/home/kub19/.m2/repository/org/apache/hive/hive-storage-api/2.6.0/hive-storage-api-2.6.0.jar:/home/kub19/.m2/repository/org/apache/parquet/parquet-column/1.10.1/parquet-column-1.10.1.jar:/home/kub19/.m2/repository/org/apache/parquet/parquet-common/1.10.1/parquet-common-1.10.1.jar:/home/kub19/.m2/repository/org/apache/parquet/parquet-encoding/1.10.1/parquet-encoding-1.10.1.jar:/home/kub19/.m2/repository/org/apache/parquet/parquet-hadoop/1.10.1/parquet-hadoop-1.10.1.jar:/home/kub19/.m2/repository/org/apache/parquet/parquet-format/2.4.0/parquet-format-2.4.0.jar:/home/kub19/.m2/repository/org/apache/parquet/parquet-jackson/1.10.1/parquet-jackson-1.10.1.jar:/home/kub19/.m2/repository/org/scala-lang/scala-library/2.12.8/scala-library-2.12.8.jar:/home/kub19/.m2/repository/org/scala-lang/scala-reflect/2.12.8/scala-reflect-2.12.8.jar
chapter2









Backbutton.co.uk
¯\_(ツ)_/¯
♡۶Java♡۶RMI ♡۶
Make Use Method {MUM}
makeuse.org
<http://www.backbutton.co.uk>

Re: Programmatic: parquet file corruption error

Posted by Zahid Rahman <za...@gmail.com>.
Thanks Wenchen.  SOLVED! KINDA!

I removed all dependencies from the pom.xml in my IDE so I wouldn't be
picking up any libraries from the maven repository.
I *instead* included the libraries (jars) from the *spark download*,
*spark-3.0.0-preview2-bin-hadoop2.7*.
This way I am using the *same libraries* that are used when running
*spark-submit scripts*.

I believe I managed to trace the issue.
I copied the log4j.properties.template into IntelliJ's resources
directory in my project, renaming it to log4j.properties.
So now I am also using the *same log4j.properties* as when running the
*spark-submit script*.

I noticed the values *log4j.logger.org.apache.parquet=ERROR* and
*log4j.logger.parquet=ERROR*.
It appears that this parquet corruption warning is an *outstanding bug*
(see PARQUET-251) and the *workaround* is to quieten the warning.
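
The same effect can also be had programmatically with the log4j 1.x API
(log4j-1.2.17.jar is on the classpath above) -- a minimal sketch, in case
editing log4j.properties is not convenient:

import org.apache.log4j.{Level, Logger}

// quieten the noisy parquet-mr statistics warning for this JVM only
Logger.getLogger("org.apache.parquet").setLevel(Level.ERROR)
Logger.getLogger("parquet").setLevel(Level.ERROR)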


#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Set everything to be logged to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN

# Settings to quiet third party logs that are too verbose
log4j.logger.org.sparkproject.jetty=WARN
log4j.logger.org.sparkproject.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=WARN
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent
# UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR




Backbutton.co.uk
¯\_(ツ)_/¯
♡۶Java♡۶RMI ♡۶
Make Use Method {MUM}
makeuse.org
<http://www.backbutton.co.uk>


On Fri, 27 Mar 2020 at 07:44, Wenchen Fan <cl...@gmail.com> wrote:

> Running a Spark application from an IDE is not officially supported. It may
> work in some cases, but there is no guarantee at all. The official way is
> to run interactive queries with spark-shell, or to package your application
> into a jar and use spark-submit.
>
> On Thu, Mar 26, 2020 at 4:12 PM Zahid Rahman <za...@gmail.com> wrote:
>
>> [quoted original message trimmed; see the full text above]

Re: Programmatic: parquet file corruption error

Posted by Wenchen Fan <cl...@gmail.com>.
Running a Spark application from an IDE is not officially supported. It may
work in some cases, but there is no guarantee at all. The official way is
to run interactive queries with spark-shell, or to package your application
into a jar and use spark-submit.
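
For reference, the packaged route looks roughly like the following (the jar
name is hypothetical and depends on the project's pom):

    # build the jar, then hand it to spark-submit instead of the IDE runner
    mvn package
    spark-submit --class chapter2 --master "local[*]" target/SparkDefinitiveGuide-1.0.jar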

On Thu, Mar 26, 2020 at 4:12 PM Zahid Rahman <za...@gmail.com> wrote:

> Hi,
>
> When I run the code for a user defined data type dataset using case class
> in scala  and run the code in the interactive spark-shell against parquet
> file. The results are as expected.
> However I then the same code programmatically in IntelliJ IDE then spark
> is give a file corruption error.
>
> Steps I have taken to determine the source of error are :
> I have tested for file permission and made sure to chmod 777 , just in
> case.
> I tried a fresh copy of same parquet file.
> I ran both programme before and after the fresh copy.
> I also rebooted then ran programmatically against a fresh parquet file.
> The corruption error was consistent in all cases.
> I have copy and pasted the spark-shell , the error message and the code in
> the IDE and the pom.xml, IntelliJ java  classpath command line.
>
> Perhaps the code in the libraries are different than the ones  used by
> spark-shell from that when run programmatically.
> I don't believe it is an error on my part.
>
> <------------------------------------------------------------------------------------------------------
>
> 07:28:45 WARN  CorruptStatistics:117 - Ignoring statistics because
> created_by could not be parsed (see PARQUET-251): parquet-mr (build
> 32c46643845ea8a705c35d4ec8fc654cc8ff816d)
> org.apache.parquet.VersionParser$VersionParseException: Could not parse
> created_by: parquet-mr (build 32c46643845ea8a705c35d4ec8fc654cc8ff816d)
> using format:
> (.*?)\s+version\s*(?:([^(]*?)\s*(?:\(\s*build\s*([^)]*?)\s*\))?)?
> at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
> at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:72)
> at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatisticsInternal(ParquetMetadataConverter.java:435)
> at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:454)
> at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:914)
> at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:885)
> at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:532)
> at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:505)
> at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:499)
> at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:448)
> at org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase.initialize(SpecificParquetRecordReaderBase.java:105)
> at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:131)
> at org.apache.spark.sql.execution.datasources.v2.parquet.ParquetPartitionReaderFactory.buildReaderBase(ParquetPartitionReaderFactory.scala:174)
> at org.apache.spark.sql.execution.datasources.v2.parquet.ParquetPartitionReaderFactory.createVectorizedReader(ParquetPartitionReaderFactory.scala:205)
> at org.apache.spark.sql.execution.datasources.v2.parquet.ParquetPartitionReaderFactory.buildColumnarReader(ParquetPartitionReaderFactory.scala:103)
> at org.apache.spark.sql.execution.datasources.v2.FilePartitionReaderFactory.$anonfun$createColumnarReader$1(FilePartitionReaderFactory.scala:38)
> at org.apache.spark.sql.execution.datasources.v2.FilePartitionReaderFactory$$Lambda$2018/0000000000000000.apply(Unknown Source)
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
> at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.getNextReader(FilePartitionReader.scala:109)
> at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.next(FilePartitionReader.scala:42)
> at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:62)
> at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
> at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
> at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:726)
> at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:321)
> at org.apache.spark.sql.execution.SparkPlan$$Lambda$1879/0000000000000000.apply(Unknown Source)
> at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:872)
> at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:872)
> at org.apache.spark.rdd.RDD$$Lambda$1875/0000000000000000.apply(Unknown Source)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> at org.apache.spark.scheduler.Task.run(Task.scala:127)
> at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:441)
> at org.apache.spark.executor.Executor$TaskRunner$$Lambda$855/0000000000000000.apply(Unknown Source)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:444)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:821)
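>
> A minimal sketch of why the parser complains, using the created_by
> string and regex taken from the warning above (paste into spark-shell):
> the string contains no "version" token, so the pattern cannot match, and
> per the log message the statistics are merely ignored.
>
> val createdBy =
>   "parquet-mr (build 32c46643845ea8a705c35d4ec8fc654cc8ff816d)"
> val pattern =
>   """(.*?)\s+version\s*(?:([^(]*?)\s*(?:\(\s*build\s*([^)]*?)\s*\))?)?""".r
> // Prints false: the regex requires the literal word "version",
> // which this created_by string does not contain.
> println(pattern.pattern.matcher(createdBy).matches())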
>
>
>
> <--------------------------------------------------------------------------------------------------------------
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.0.0-preview2
>       /_/
>
> Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_242)
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala>       val flightDf =
> spark.read.parquet("/data/flight-data/parquet/2010-summary.parquet/")
> flightDf: org.apache.spark.sql.DataFrame = [DEST_COUNTRY_NAME: string,
> ORIGIN_COUNTRY_NAME: string ... 1 more field]
>
> scala> case class flight (DEST_COUNTRY_NAME: String,
>      |                      ORIGIN_COUNTRY_NAME:String,
>      |                      count: BigInt)
> defined class flight
>
> scala>      val flightDf =
> spark.read.parquet("/data/flight-data/parquet/2010-summary.parquet/")
> flightDf: org.apache.spark.sql.DataFrame = [DEST_COUNTRY_NAME: string,
> ORIGIN_COUNTRY_NAME: string ... 1 more field]
>
> scala>     val flights = flightDf.as[flight]
> flights: org.apache.spark.sql.Dataset[flight] = [DEST_COUNTRY_NAME:
> string, ORIGIN_COUNTRY_NAME: string ... 1 more field]
>
> scala>   flights.filter(flight_row => flight_row.ORIGIN_COUNTRY_NAME !=
> "Canada").map(flight_row => flight_row).take(3)
> res0: Array[flight] = Array(flight(United States,Romania,1), flight(United
> States,Ireland,264), flight(United States,India,69))
>
> <---------------------------------------------------------------------------------------------------------------------
>
> import org.apache.spark.sql.SparkSession
>
> object chapter2 {
>
>   // Define a specific data type as a case class, then manipulate it with the filter and map functions.
>   case class flight (DEST_COUNTRY_NAME: String,
>                      ORIGIN_COUNTRY_NAME:String,
>                      count: BigInt)
>
>     def main(args: Array[String]): Unit = {
>
>       // Unlike in the interactive shell, a SparkSession must be built explicitly here.
>       val spark = SparkSession.builder.master("local[*]").appName(" chapter2").getOrCreate
>
>       import spark.implicits._
>       val flightDf = spark.read.parquet("/data/flight-data/parquet/2010-summary.parquet/")
>       val flights = flightDf.as[flight]
>       val rows = flights.filter(flight_row => flight_row.ORIGIN_COUNTRY_NAME != "Canada").map(flight_row => flight_row).take(3)
>       rows.foreach(println)  // print the sampled rows so the programmatic run shows output
>       spark.stop()
>     }
> }
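>
> Note that the message above is logged at WARN by CorruptStatistics, and
> the log text says the statistics are merely ignored; if the read still
> returns correct rows, the file itself may be fine. Assuming the default
> log4j 1.x configuration these Spark builds ship with, the noise could be
> reduced with a line like this in log4j.properties (a sketch, not a fix
> for any underlying version mismatch):
>
> # Raise the threshold for the chatty Parquet statistics logger only.
> log4j.logger.org.apache.parquet.CorruptStatistics=ERROR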
>
>
> <---------------------------------------------------------------------------------------------------------------------------------
>
> <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
> <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-core_2.12</artifactId>
>     <version>3.0.0-preview2</version>
> </dependency>
>
> <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
> <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-sql_2.12</artifactId>
>     <version>3.0.0-preview2</version>
> </dependency>
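>
> To keep the two Spark artifacts (and any later additions) on a single
> version, a shared Maven property is one option; spark.version below is
> just a name chosen for this sketch:
>
> <properties>
>     <spark.version>3.0.0-preview2</spark.version>
> </properties>
>
> <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-sql_2.12</artifactId>
>     <version>${spark.version}</version>
> </dependency>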
>
> <--------------------- IntelliJ command line ------------------------------------->
> /home/kub19/jdk8u242-b08/bin/java
> -javaagent:/snap/intellij-idea-community/216/lib/idea_rt.jar=42207:/snap/intellij-idea-community/216/bin
> -Dfile.encoding=UTF-8 -classpath
> /home/kub19/jdk8u242-b08/jre/lib/charsets.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/cldrdata.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/dnsns.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/dtfj.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/dtfjview.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/jaccess.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/localedata.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/nashorn.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/sunec.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/sunjce_provider.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/sunpkcs11.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/traceformat.jar:/home/kub19/jdk8u242-b08/jre/lib/ext/zipfs.jar:/home/kub19/jdk8u242-b08/jre/lib/jce.jar:/home/kub19/jdk8u242-b08/jre/lib/jsse.jar:/home/kub19/jdk8u242-b08/jre/lib/management-agent.jar:/home/kub19/jdk8u242-b08/jre/lib/resources.jar:/home/kub19/jdk8u242-b08/jre/lib/rt.jar:/home/kub19/spark-2.4.5-bin-hadoop2.7/projects/SparkDefinitiveGuide/target/classes:/home/kub19/.m2/repository/org/apache/spark/spark-core_2.12/3.0.0-preview2/spark-core_2.12-3.0.0-preview2.jar:/home/kub19/.m2/repository/com/thoughtworks/paranamer/paranamer/2.8/paranamer-2.8.jar:/home/kub19/.m2/repository/org/apache/avro/avro/1.8.2/avro-1.8.2.jar:/home/kub19/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.9.13/jackson-core-asl-1.9.13.jar:/home/kub19/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.9.13/jackson-mapper-asl-1.9.13.jar:/home/kub19/.m2/repository/org/apache/commons/commons-compress/1.8.1/commons-compress-1.8.1.jar:/home/kub19/.m2/repository/org/tukaani/xz/1.5/xz-1.5.jar:/home/kub19/.m2/repository/org/apache/avro/avro-mapred/1.8.2/avro-mapred-1.8.2-hadoop2.jar:/home/kub19/.m2/repository/org/apache/avro/avro-ipc/1.8.2/avro-ipc-1.8.2.jar:/home/kub19/.m2/repository/commons-codec/commons-codec/1.9/commons-codec-1.9.jar:/home/kub19/.m2/repository/com/twitter/chill_2.12/0.9.3/chill_2.12-0.9.3.jar:/home/kub19/.m2/repository/com/esotericsoftware/kryo-shaded/4.0.2/kryo-shaded-4.0.2.jar:/home/kub19/.m2/repository/com/esotericsoftware/minlog/1.3.0/minlog-1.3.0.jar:/home/kub19/.m2/repository/org/objenesis/objenesis/2.5.1/objenesis-2.5.1.jar:/home/kub19/.m2/repository/com/twitter/chill-java/0.9.3/chill-java-0.9.3.jar:/home/kub19/.m2/repository/org/apache/xbean/xbean-asm7-shaded/4.15/xbean-asm7-shaded-4.15.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-client/2.7.4/hadoop-client-2.7.4.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-common/2.7.4/hadoop-common-2.7.4.jar:/home/kub19/.m2/repository/commons-cli/commons-cli/1.2/commons-cli-1.2.jar:/home/kub19/.m2/repository/xmlenc/xmlenc/0.52/xmlenc-0.52.jar:/home/kub19/.m2/repository/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar:/home/kub19/.m2/repository/commons-io/commons-io/2.4/commons-io-2.4.jar:/home/kub19/.m2/repository/commons-collections/commons-collections/3.2.2/commons-collections-3.2.2.jar:/home/kub19/.m2/repository/org/mortbay/jetty/jetty-sslengine/6.1.26/jetty-sslengine-6.1.26.jar:/home/kub19/.m2/repository/javax/servlet/jsp/jsp-api/2.1/jsp-api-2.1.jar:/home/kub19/.m2/repository/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar:/home/kub19/.m2/repository/commons-digester/commons-digester/1.8/commons-digester-1.8.jar:/home/kub19/.m2/repository/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar:/home/kub19/.m2/repository/com/google/code/gson/gson/2.2.4/gson-2.2.4.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-auth/2.7.4/hadoop-auth-2.7.4.jar:/home/kub19/.m2/repository/org/apac
he/httpcomponents/httpclient/4.2.5/httpclient-4.2.5.jar:/home/kub19/.m2/repository/org/apache/httpcomponents/httpcore/4.2.4/httpcore-4.2.4.jar:/home/kub19/.m2/repository/org/apache/directory/server/apacheds-kerberos-codec/2.0.0-M15/apacheds-kerberos-codec-2.0.0-M15.jar:/home/kub19/.m2/repository/org/apache/directory/server/apacheds-i18n/2.0.0-M15/apacheds-i18n-2.0.0-M15.jar:/home/kub19/.m2/repository/org/apache/directory/api/api-asn1-api/1.0.0-M20/api-asn1-api-1.0.0-M20.jar:/home/kub19/.m2/repository/org/apache/directory/api/api-util/1.0.0-M20/api-util-1.0.0-M20.jar:/home/kub19/.m2/repository/org/apache/curator/curator-client/2.7.1/curator-client-2.7.1.jar:/home/kub19/.m2/repository/org/apache/htrace/htrace-core/3.1.0-incubating/htrace-core-3.1.0-incubating.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-hdfs/2.7.4/hadoop-hdfs-2.7.4.jar:/home/kub19/.m2/repository/org/mortbay/jetty/jetty-util/6.1.26/jetty-util-6.1.26.jar:/home/kub19/.m2/repository/xerces/xercesImpl/2.9.1/xercesImpl-2.9.1.jar:/home/kub19/.m2/repository/xml-apis/xml-apis/1.3.04/xml-apis-1.3.04.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-app/2.7.4/hadoop-mapreduce-client-app-2.7.4.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-common/2.7.4/hadoop-mapreduce-client-common-2.7.4.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-yarn-client/2.7.4/hadoop-yarn-client-2.7.4.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-yarn-server-common/2.7.4/hadoop-yarn-server-common-2.7.4.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-shuffle/2.7.4/hadoop-mapreduce-client-shuffle-2.7.4.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-yarn-api/2.7.4/hadoop-yarn-api-2.7.4.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-core/2.7.4/hadoop-mapreduce-client-core-2.7.4.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-yarn-common/2.7.4/hadoop-yarn-common-2.7.4.jar:/home/kub19/.m2/repository/javax/xml/bind/jaxb-api/2.2.2/jaxb-api-2.2.2.jar:/home/kub19/.m2/repository/javax/xml/stream/stax-api/1.0-2/stax-api-1.0-2.jar:/home/kub19/.m2/repository/org/codehaus/jackson/jackson-jaxrs/1.9.13/jackson-jaxrs-1.9.13.jar:/home/kub19/.m2/repository/org/codehaus/jackson/jackson-xc/1.9.13/jackson-xc-1.9.13.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-jobclient/2.7.4/hadoop-mapreduce-client-jobclient-2.7.4.jar:/home/kub19/.m2/repository/org/apache/hadoop/hadoop-annotations/2.7.4/hadoop-annotations-2.7.4.jar:/home/kub19/.m2/repository/org/apache/spark/spark-launcher_2.12/3.0.0-preview2/spark-launcher_2.12-3.0.0-preview2.jar:/home/kub19/.m2/repository/org/apache/spark/spark-kvstore_2.12/3.0.0-preview2/spark-kvstore_2.12-3.0.0-preview2.jar:/home/kub19/.m2/repository/org/fusesource/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.jar:/home/kub19/.m2/repository/com/fasterxml/jackson/core/jackson-core/2.10.0/jackson-core-2.10.0.jar:/home/kub19/.m2/repository/com/fasterxml/jackson/core/jackson-annotations/2.10.0/jackson-annotations-2.10.0.jar:/home/kub19/.m2/repository/org/apache/spark/spark-network-common_2.12/3.0.0-preview2/spark-network-common_2.12-3.0.0-preview2.jar:/home/kub19/.m2/repository/org/apache/spark/spark-network-shuffle_2.12/3.0.0-preview2/spark-network-shuffle_2.12-3.0.0-preview2.jar:/home/kub19/.m2/repository/org/apache/spark/spark-unsafe_2.12/3.0.0-preview2/spark-unsafe_2.12-3.0.0-preview2.jar:/home/kub19/.m2/repository/javax/activation/activation/1.1.1/activation-1.1.1.jar:/home/kub19/.m2/repo
sitory/org/apache/curator/curator-recipes/2.7.1/curator-recipes-2.7.1.jar:/home/kub19/.m2/repository/org/apache/curator/curator-framework/2.7.1/curator-framework-2.7.1.jar:/home/kub19/.m2/repository/com/google/guava/guava/16.0.1/guava-16.0.1.jar:/home/kub19/.m2/repository/org/apache/zookeeper/zookeeper/3.4.14/zookeeper-3.4.14.jar:/home/kub19/.m2/repository/org/apache/yetus/audience-annotations/0.5.0/audience-annotations-0.5.0.jar:/home/kub19/.m2/repository/javax/servlet/javax.servlet-api/3.1.0/javax.servlet-api-3.1.0.jar:/home/kub19/.m2/repository/org/apache/commons/commons-lang3/3.9/commons-lang3-3.9.jar:/home/kub19/.m2/repository/org/apache/commons/commons-math3/3.4.1/commons-math3-3.4.1.jar:/home/kub19/.m2/repository/org/apache/commons/commons-text/1.6/commons-text-1.6.jar:/home/kub19/.m2/repository/com/google/code/findbugs/jsr305/3.0.0/jsr305-3.0.0.jar:/home/kub19/.m2/repository/org/slf4j/slf4j-api/1.7.16/slf4j-api-1.7.16.jar:/home/kub19/.m2/repository/org/slf4j/jul-to-slf4j/1.7.16/jul-to-slf4j-1.7.16.jar:/home/kub19/.m2/repository/org/slf4j/jcl-over-slf4j/1.7.16/jcl-over-slf4j-1.7.16.jar:/home/kub19/.m2/repository/log4j/log4j/1.2.17/log4j-1.2.17.jar:/home/kub19/.m2/repository/org/slf4j/slf4j-log4j12/1.7.16/slf4j-log4j12-1.7.16.jar:/home/kub19/.m2/repository/com/ning/compress-lzf/1.0.3/compress-lzf-1.0.3.jar:/home/kub19/.m2/repository/org/xerial/snappy/snappy-java/
> 1.1.7.3/snappy-java-1.1.7.3.jar:/home/kub19/.m2/repository/org/lz4/lz4-java/1.7.0/lz4-java-1.7.0.jar:/home/kub19/.m2/repository/com/github/luben/zstd-jni/1.4.4-3/zstd-jni-1.4.4-3.jar:/home/kub19/.m2/repository/org/roaringbitmap/RoaringBitmap/0.7.45/RoaringBitmap-0.7.45.jar:/home/kub19/.m2/repository/org/roaringbitmap/shims/0.7.45/shims-0.7.45.jar:/home/kub19/.m2/repository/commons-net/commons-net/3.1/commons-net-3.1.jar:/home/kub19/.m2/repository/org/scala-lang/modules/scala-xml_2.12/1.2.0/scala-xml_2.12-1.2.0.jar:/home/kub19/.m2/repository/org/scala-lang/scala-library/2.12.10/scala-library-2.12.10.jar:/home/kub19/.m2/repository/org/scala-lang/scala-reflect/2.12.10/scala-reflect-2.12.10.jar:/home/kub19/.m2/repository/org/json4s/json4s-jackson_2.12/3.6.6/json4s-jackson_2.12-3.6.6.jar:/home/kub19/.m2/repository/org/json4s/json4s-core_2.12/3.6.6/json4s-core_2.12-3.6.6.jar:/home/kub19/.m2/repository/org/json4s/json4s-ast_2.12/3.6.6/json4s-ast_2.12-3.6.6.jar:/home/kub19/.m2/repository/org/json4s/json4s-scalap_2.12/3.6.6/json4s-scalap_2.12-3.6.6.jar:/home/kub19/.m2/repository/org/glassfish/jersey/core/jersey-client/2.29.1/jersey-client-2.29.1.jar:/home/kub19/.m2/repository/jakarta/ws/rs/jakarta.ws.rs-api/2.1.6/jakarta.ws.rs-api-2.1.6.jar:/home/kub19/.m2/repository/org/glassfish/hk2/external/jakarta.inject/2.6.1/jakarta.inject-2.6.1.jar:/home/kub19/.m2/repository/org/glassfish/jersey/core/jersey-common/2.29.1/jersey-common-2.29.1.jar:/home/kub19/.m2/repository/jakarta/annotation/jakarta.annotation-api/1.3.5/jakarta.annotation-api-1.3.5.jar:/home/kub19/.m2/repository/org/glassfish/hk2/osgi-resource-locator/1.0.3/osgi-resource-locator-1.0.3.jar:/home/kub19/.m2/repository/org/glassfish/jersey/core/jersey-server/2.29.1/jersey-server-2.29.1.jar:/home/kub19/.m2/repository/org/glassfish/jersey/media/jersey-media-jaxb/2.29.1/jersey-media-jaxb-2.29.1.jar:/home/kub19/.m2/repository/jakarta/validation/jakarta.validation-api/2.0.2/jakarta.validation-api-2.0.2.jar:/home/kub19/.m2/repository/org/glassfish/jersey/containers/jersey-container-servlet/2.29.1/jersey-container-servlet-2.29.1.jar:/home/kub19/.m2/repository/org/glassfish/jersey/containers/jersey-container-servlet-core/2.29.1/jersey-container-servlet-core-2.29.1.jar:/home/kub19/.m2/repository/org/glassfish/jersey/inject/jersey-hk2/2.29.1/jersey-hk2-2.29.1.jar:/home/kub19/.m2/repository/org/glassfish/hk2/hk2-locator/2.6.1/hk2-locator-2.6.1.jar:/home/kub19/.m2/repository/org/glassfish/hk2/external/aopalliance-repackaged/2.6.1/aopalliance-repackaged-2.6.1.jar:/home/kub19/.m2/repository/org/glassfish/hk2/hk2-api/2.6.1/hk2-api-2.6.1.jar:/home/kub19/.m2/repository/org/glassfish/hk2/hk2-utils/2.6.1/hk2-utils-2.6.1.jar:/home/kub19/.m2/repository/org/javassist/javassist/3.22.0-CR2/javassist-3.22.0-CR2.jar:/home/kub19/.m2/repository/io/netty/netty-all/4.1.42.Final/netty-all-4.1.42.Final.jar:/home/kub19/.m2/repository/com/clearspring/analytics/stream/2.9.6/stream-2.9.6.jar:/home/kub19/.m2/repository/io/dropwizard/metrics/metrics-core/4.1.1/metrics-core-4.1.1.jar:/home/kub19/.m2/repository/io/dropwizard/metrics/metrics-jvm/4.1.1/metrics-jvm-4.1.1.jar:/home/kub19/.m2/repository/io/dropwizard/metrics/metrics-json/4.1.1/metrics-json-4.1.1.jar:/home/kub19/.m2/repository/io/dropwizard/metrics/metrics-graphite/4.1.1/metrics-graphite-4.1.1.jar:/home/kub19/.m2/repository/io/dropwizard/metrics/metrics-jmx/4.1.1/metrics-jmx-4.1.1.jar:/home/kub19/.m2/repository/com/fasterxml/jackson/core/jackson-databind/2.10.0/jackson-databind-2.10.0.jar:/home/kub19/.m2/repository/com/fast
erxml/jackson/module/jackson-module-scala_2.12/2.10.0/jackson-module-scala_2.12-2.10.0.jar:/home/kub19/.m2/repository/com/fasterxml/jackson/module/jackson-module-paranamer/2.10.0/jackson-module-paranamer-2.10.0.jar:/home/kub19/.m2/repository/org/apache/ivy/ivy/2.4.0/ivy-2.4.0.jar:/home/kub19/.m2/repository/oro/oro/2.0.8/oro-2.0.8.jar:/home/kub19/.m2/repository/net/razorvine/pyrolite/4.30/pyrolite-4.30.jar:/home/kub19/.m2/repository/net/sf/py4j/py4j/0.10.8.1/py4j-0.10.8.1.jar:/home/kub19/.m2/repository/org/apache/spark/spark-tags_2.12/3.0.0-preview2/spark-tags_2.12-3.0.0-preview2.jar:/home/kub19/.m2/repository/org/apache/commons/commons-crypto/1.0.0/commons-crypto-1.0.0.jar:/home/kub19/.m2/repository/org/spark-project/spark/unused/1.0.0/unused-1.0.0.jar:/home/kub19/.m2/repository/org/apache/spark/spark-sql_2.12/3.0.0-preview2/spark-sql_2.12-3.0.0-preview2.jar:/home/kub19/.m2/repository/com/univocity/univocity-parsers/2.8.3/univocity-parsers-2.8.3.jar:/home/kub19/.m2/repository/org/apache/spark/spark-sketch_2.12/3.0.0-preview2/spark-sketch_2.12-3.0.0-preview2.jar:/home/kub19/.m2/repository/org/apache/spark/spark-catalyst_2.12/3.0.0-preview2/spark-catalyst_2.12-3.0.0-preview2.jar:/home/kub19/.m2/repository/org/scala-lang/modules/scala-parser-combinators_2.12/1.1.2/scala-parser-combinators_2.12-1.1.2.jar:/home/kub19/.m2/repository/org/codehaus/janino/janino/3.0.15/janino-3.0.15.jar:/home/kub19/.m2/repository/org/codehaus/janino/commons-compiler/3.0.15/commons-compiler-3.0.15.jar:/home/kub19/.m2/repository/org/antlr/antlr4-runtime/4.7.1/antlr4-runtime-4.7.1.jar:/home/kub19/.m2/repository/org/apache/arrow/arrow-vector/0.15.1/arrow-vector-0.15.1.jar:/home/kub19/.m2/repository/org/apache/arrow/arrow-format/0.15.1/arrow-format-0.15.1.jar:/home/kub19/.m2/repository/org/apache/arrow/arrow-memory/0.15.1/arrow-memory-0.15.1.jar:/home/kub19/.m2/repository/com/google/flatbuffers/flatbuffers-java/1.9.0/flatbuffers-java-1.9.0.jar:/home/kub19/.m2/repository/org/apache/orc/orc-core/1.5.8/orc-core-1.5.8.jar:/home/kub19/.m2/repository/org/apache/orc/orc-shims/1.5.8/orc-shims-1.5.8.jar:/home/kub19/.m2/repository/com/google/protobuf/protobuf-java/2.5.0/protobuf-java-2.5.0.jar:/home/kub19/.m2/repository/commons-lang/commons-lang/2.6/commons-lang-2.6.jar:/home/kub19/.m2/repository/io/airlift/aircompressor/0.10/aircompressor-0.10.jar:/home/kub19/.m2/repository/org/apache/orc/orc-mapreduce/1.5.8/orc-mapreduce-1.5.8.jar:/home/kub19/.m2/repository/org/apache/hive/hive-storage-api/2.6.0/hive-storage-api-2.6.0.jar:/home/kub19/.m2/repository/org/apache/parquet/parquet-column/1.10.1/parquet-column-1.10.1.jar:/home/kub19/.m2/repository/org/apache/parquet/parquet-common/1.10.1/parquet-common-1.10.1.jar:/home/kub19/.m2/repository/org/apache/parquet/parquet-encoding/1.10.1/parquet-encoding-1.10.1.jar:/home/kub19/.m2/repository/org/apache/parquet/parquet-hadoop/1.10.1/parquet-hadoop-1.10.1.jar:/home/kub19/.m2/repository/org/apache/parquet/parquet-format/2.4.0/parquet-format-2.4.0.jar:/home/kub19/.m2/repository/org/apache/parquet/parquet-jackson/1.10.1/parquet-jackson-1.10.1.jar:/home/kub19/.m2/repository/org/scala-lang/scala-library/2.12.8/scala-library-2.12.8.jar:/home/kub19/.m2/repository/org/scala-lang/scala-reflect/2.12.8/scala-reflect-2.12.8.jar
> chapter2
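>
> Incidentally, the classpath above appears to include two copies of the
> Scala standard library (scala-library-2.12.10.jar and
> scala-library-2.12.8.jar). Whichever copy the IDE puts first wins, which
> is one way a programmatic run can diverge from spark-shell. Maven's
> dependency tree (a standard maven-dependency-plugin goal) shows where
> each copy comes from:
>
> mvn dependency:tree -Dincludes=org.scala-lang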
>
> Backbutton.co.uk
> ¯\_(ツ)_/¯
> ♡۶Java♡۶RMI ♡۶
> Make Use Method {MUM}
> makeuse.org
> <http://www.backbutton.co.uk>
>