Posted to user@spark.apache.org by Reynold Xin <rx...@databricks.com> on 2020/03/26 19:18:08 UTC

Re: results of taken(3) not appearing in console window

bcc dev, +user

You need to print out the result. Take itself doesn't print. You only got the results printed to the console because the Scala REPL automatically prints the returned value from take.
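
For example, in a standalone application you print the returned array yourself.
A minimal sketch, assuming the same flights Dataset as in the code below:

    flights.filter(flight_row => flight_row.ORIGIN_COUNTRY_NAME != "Canada")
      .map(flight_row => flight_row)
      .take(3)              // returns Array[flight] to the driver; prints nothing itself
      .foreach(println)     // explicitly write each element to the console

Alternatively, flights.show(3) formats the first rows as a table and prints it for you.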

On Thu, Mar 26, 2020 at 12:15 PM, Zahid Rahman <zahidr1000@gmail.com> wrote:

> 
> I am running the same code with the same libraries but am not getting the
> same output.
> scala>  case class flight (DEST_COUNTRY_NAME: String,
>      |                      ORIGIN_COUNTRY_NAME:String,
>      |                      count: BigInt)
> defined class flight
> 
> scala>     val flightDf = spark.read.parquet("/data/flight-data/parquet/2010-summary.parquet/")
> flightDf: org.apache.spark.sql.DataFrame = [DEST_COUNTRY_NAME: string,
> ORIGIN_COUNTRY_NAME: string ... 1 more field]
> 
> scala> val flights = flightDf.as[flight]
> flights: org.apache.spark.sql.Dataset[flight] = [DEST_COUNTRY_NAME:
> string, ORIGIN_COUNTRY_NAME: string ... 1 more field]
> 
> scala> flights.filter(flight_row => flight_row.ORIGIN_COUNTRY_NAME !=
> "Canada").map(flight_row => flight_row).take(3)
> res0: Array[flight] = Array(flight(United States,Romania,1),
> flight(United States,Ireland,264), flight(United States,India,69))
> 
> 
> <!------------------------------------------------------------------------------------------------------------------------------
> 
> 20/03/26 19:09:00 INFO SparkContext: Running Spark version 3.0.0-preview2
> 20/03/26 19:09:00 INFO ResourceUtils:
> ==============================================================
> 20/03/26 19:09:00 INFO ResourceUtils: Resources for spark.driver:
> 
> 20/03/26 19:09:00 INFO ResourceUtils:
> ==============================================================
> 20/03/26 19:09:00 INFO SparkContext: Submitted application:  chapter2
> 20/03/26 19:09:00 INFO SecurityManager: Changing view acls to: kub19
> 20/03/26 19:09:00 INFO SecurityManager: Changing modify acls to: kub19
> 20/03/26 19:09:00 INFO SecurityManager: Changing view acls groups to:
> 20/03/26 19:09:00 INFO SecurityManager: Changing modify acls groups to:
> 20/03/26 19:09:00 INFO SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users  with view permissions: Set(kub19);
> groups with view permissions: Set(); users  with modify permissions:
> Set(kub19); groups with modify permissions: Set()
> 20/03/26 19:09:00 INFO Utils: Successfully started service 'sparkDriver'
> on port 46817.
> 20/03/26 19:09:00 INFO SparkEnv: Registering MapOutputTracker
> 20/03/26 19:09:00 INFO SparkEnv: Registering BlockManagerMaster
> 20/03/26 19:09:00 INFO BlockManagerMasterEndpoint: Using
> org.apache.spark.storage.DefaultTopologyMapper for getting
> topology information
> 20/03/26 19:09:00 INFO BlockManagerMasterEndpoint:
> BlockManagerMasterEndpoint up
> 20/03/26 19:09:00 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
> 20/03/26 19:09:00 INFO DiskBlockManager: Created local directory at
> /tmp/blockmgr-0baf5097-2595-4542-99e2-a192d7baf37c
> 20/03/26 19:09:00 INFO MemoryStore: MemoryStore started with capacity
> 127.2 MiB
> 20/03/26 19:09:01 INFO SparkEnv: Registering OutputCommitCoordinator
> 20/03/26 19:09:01 WARN Utils: Service 'SparkUI' could not bind on port
> 4040. Attempting port 4041.
> 20/03/26 19:09:01 INFO Utils: Successfully started service 'SparkUI' on
> port 4041.
> 20/03/26 19:09:01 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at
> http://localhost:4041
> 20/03/26 19:09:01 INFO Executor: Starting executor ID driver on host
> localhost
> 20/03/26 19:09:01 INFO Utils: Successfully started service
> 'org.apache.spark.network.netty.NettyBlockTransferService' on
> port 38135.
> 20/03/26 19:09:01 INFO NettyBlockTransferService: Server created on
> localhost:38135
> 20/03/26 19:09:01 INFO BlockManager: Using
> org.apache.spark.storage.RandomBlockReplicationPolicy for block
> replication policy
> 20/03/26 19:09:01 INFO BlockManagerMaster: Registering BlockManager
> BlockManagerId(driver, localhost, 38135, None)
> 20/03/26 19:09:01 INFO BlockManagerMasterEndpoint: Registering block
> manager localhost:38135 with 127.2 MiB RAM, BlockManagerId(driver,
> localhost, 38135, None)
> 20/03/26 19:09:01 INFO BlockManagerMaster: Registered BlockManager
> BlockManagerId(driver, localhost, 38135, None)
> 20/03/26 19:09:01 INFO BlockManager: Initialized BlockManager:
> BlockManagerId(driver, localhost, 38135, None)
> 20/03/26 19:09:01 INFO SharedState: Setting hive.metastore.warehouse.dir
> ('null') to the value of spark.sql.warehouse.dir
> ('file:/home/kub19/spark-3.0.0-preview2-bin-hadoop2.7/projects/spark-warehouse').
> 
> 20/03/26 19:09:01 INFO SharedState: Warehouse path is
> 'file:/home/kub19/spark-3.0.0-preview2-bin-hadoop2.7/projects/spark-warehouse'.
> 
> 20/03/26 19:09:02 INFO SparkContext: Starting job: parquet at
> chapter2.scala:18
> 20/03/26 19:09:02 INFO DAGScheduler: Got job 0 (parquet at
> chapter2.scala:18) with 1 output partitions
> 20/03/26 19:09:02 INFO DAGScheduler: Final stage: ResultStage 0 (parquet
> at chapter2.scala:18)
> 20/03/26 19:09:02 INFO DAGScheduler: Parents of final stage: List()
> 20/03/26 19:09:02 INFO DAGScheduler: Missing parents: List()
> 20/03/26 19:09:02 INFO DAGScheduler: Submitting ResultStage 0
> (MapPartitionsRDD[1] at parquet at chapter2.scala:18), which has no
> missing parents
> 20/03/26 19:09:02 INFO MemoryStore: Block broadcast_0 stored as values in
> memory (estimated size 72.8 KiB, free 127.1 MiB)
> 20/03/26 19:09:02 INFO MemoryStore: Block broadcast_0_piece0 stored as
> bytes in memory (estimated size 25.9 KiB, free 127.1 MiB)
> 20/03/26 19:09:02 INFO BlockManagerInfo: Added broadcast_0_piece0 in
> memory on localhost:38135 (size: 25.9 KiB, free: 127.2 MiB)
> 20/03/26 19:09:02 INFO SparkContext: Created broadcast 0 from broadcast at
> DAGScheduler.scala:1206
> 20/03/26 19:09:02 INFO DAGScheduler: Submitting 1 missing tasks from
> ResultStage 0 (MapPartitionsRDD[1] at parquet at chapter2.scala:18) (first
> 15 tasks are for partitions Vector(0))
> 20/03/26 19:09:02 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
> 
> 20/03/26 19:09:02 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID
> 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7560 bytes)
> 20/03/26 19:09:02 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
> 20/03/26 19:09:02 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0).
> 1840 bytes result sent to driver
> 20/03/26 19:09:02 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID
> 0) in 204 ms on localhost (executor driver) (1/1)
> 20/03/26 19:09:02 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks
> have all completed, from pool
> 20/03/26 19:09:02 INFO DAGScheduler: ResultStage 0 (parquet at
> chapter2.scala:18) finished in 0.304 s
> 20/03/26 19:09:02 INFO DAGScheduler: Job 0 is finished. Cancelling
> potential speculative or zombie tasks for this job
> 20/03/26 19:09:02 INFO TaskSchedulerImpl: Killing all running tasks in
> stage 0: Stage finished
> 20/03/26 19:09:02 INFO DAGScheduler: Job 0 finished: parquet at
> chapter2.scala:18, took 0.332643 s
> 20/03/26 19:09:03 INFO BlockManagerInfo: Removed broadcast_0_piece0 on
> localhost:38135 in memory (size: 25.9 KiB, free: 127.2 MiB)
> 20/03/26 19:09:04 INFO V2ScanRelationPushDown:
> Pushing operators to parquet
> file:/data/flight-data/parquet/2010-summary.parquet
> Pushed Filters:
> Post-Scan Filters:
> Output: DEST_COUNTRY_NAME#0, ORIGIN_COUNTRY_NAME#1, count#2L
>          
> 20/03/26 19:09:04 INFO MemoryStore: Block broadcast_1 stored as values in
> memory (estimated size 290.0 KiB, free 126.9 MiB)
> 20/03/26 19:09:04 INFO MemoryStore: Block broadcast_1_piece0 stored as
> bytes in memory (estimated size 24.3 KiB, free 126.9 MiB)
> 20/03/26 19:09:04 INFO BlockManagerInfo: Added broadcast_1_piece0 in
> memory on localhost:38135 (size: 24.3 KiB, free: 127.2 MiB)
> 20/03/26 19:09:04 INFO SparkContext: Created broadcast 1 from take at
> chapter2.scala:20
> 20/03/26 19:09:04 INFO MemoryStore: Block broadcast_2 stored as values in
> memory (estimated size 290.1 KiB, free 126.6 MiB)
> 20/03/26 19:09:04 INFO MemoryStore: Block broadcast_2_piece0 stored as
> bytes in memory (estimated size 24.3 KiB, free 126.6 MiB)
> 20/03/26 19:09:04 INFO BlockManagerInfo: Added broadcast_2_piece0 in
> memory on localhost:38135 (size: 24.3 KiB, free: 127.2 MiB)
> 20/03/26 19:09:04 INFO SparkContext: Created broadcast 2 from take at
> chapter2.scala:20
> 20/03/26 19:09:04 INFO CodeGenerator: Code generated in 159.155401 ms
> 20/03/26 19:09:04 INFO SparkContext: Starting job: take at
> chapter2.scala:20
> 20/03/26 19:09:04 INFO DAGScheduler: Got job 1 (take at chapter2.scala:20)
> with 1 output partitions
> 20/03/26 19:09:04 INFO DAGScheduler: Final stage: ResultStage 1 (take at
> chapter2.scala:20)
> 20/03/26 19:09:04 INFO DAGScheduler: Parents of final stage: List()
> 20/03/26 19:09:04 INFO DAGScheduler: Missing parents: List()
> 20/03/26 19:09:04 INFO DAGScheduler: Submitting ResultStage 1
> (MapPartitionsRDD[5] at take at chapter2.scala:20), which has no missing
> parents
> 20/03/26 19:09:04 INFO MemoryStore: Block broadcast_3 stored as values in
> memory (estimated size 22.7 KiB, free 126.6 MiB)
> 20/03/26 19:09:04 INFO MemoryStore: Block broadcast_3_piece0 stored as
> bytes in memory (estimated size 8.1 KiB, free 126.6 MiB)
> 20/03/26 19:09:04 INFO BlockManagerInfo: Added broadcast_3_piece0 in
> memory on localhost:38135 (size: 8.1 KiB, free: 127.1 MiB)
> 20/03/26 19:09:04 INFO SparkContext: Created broadcast 3 from broadcast at
> DAGScheduler.scala:1206
> 20/03/26 19:09:04 INFO DAGScheduler: Submitting 1 missing tasks from
> ResultStage 1 (MapPartitionsRDD[5] at take at chapter2.scala:20) (first 15
> tasks are for partitions Vector(0))
> 20/03/26 19:09:04 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
> 
> 20/03/26 19:09:04 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID
> 1, localhost, executor driver, partition 0, PROCESS_LOCAL, 7980 bytes)
> 20/03/26 19:09:04 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
> 20/03/26 19:09:05 INFO FilePartitionReader: Reading file path:
> file:///data/flight-data/parquet/2010-summary.parquet/part-r-00000-1a9822ba-b8fb-4d8e-844a-ea30d0801b9e.gz.parquet,
> range: 0-3921, partition values: [empty row]
> 20/03/26 19:09:05 INFO ZlibFactory: Successfully loaded & initialized
> native-zlib library
> 20/03/26 19:09:05 INFO CodecPool: Got brand-new decompressor [.gz]
> 20/03/26 19:09:05 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1).
> 1762 bytes result sent to driver
> 20/03/26 19:09:05 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID
> 1) in 219 ms on localhost (executor driver) (1/1)
> 20/03/26 19:09:05 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks
> have all completed, from pool
> 20/03/26 19:09:05 INFO DAGScheduler: ResultStage 1 (take at
> chapter2.scala:20) finished in 0.235 s
> 20/03/26 19:09:05 INFO DAGScheduler: Job 1 is finished. Cancelling
> potential speculative or zombie tasks for this job
> 20/03/26 19:09:05 INFO TaskSchedulerImpl: Killing all running tasks in
> stage 1: Stage finished
> 20/03/26 19:09:05 INFO DAGScheduler: Job 1 finished: take at
> chapter2.scala:20, took 0.238010 s
> 20/03/26 19:09:05 INFO CodeGenerator: Code generated in 17.77886 ms
> 20/03/26 19:09:05 INFO SparkUI: Stopped Spark web UI at
> http://localhost:4041
> 20/03/26 19:09:05 INFO MapOutputTrackerMasterEndpoint:
> MapOutputTrackerMasterEndpoint stopped!
> 20/03/26 19:09:05 INFO MemoryStore: MemoryStore cleared
> 20/03/26 19:09:05 INFO BlockManager: BlockManager stopped
> 20/03/26 19:09:05 INFO BlockManagerMaster: BlockManagerMaster stopped
> 20/03/26 19:09:05 INFO
> OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
> OutputCommitCoordinator stopped!
> 20/03/26 19:09:05 INFO SparkContext: Successfully stopped SparkContext
> 20/03/26 19:09:05 INFO ShutdownHookManager: Shutdown hook called
> 20/03/26 19:09:05 INFO ShutdownHookManager: Deleting directory
> /tmp/spark-6d99677e-ae1b-4894-aa32-3a79fb0b4307
> 
> Process finished with exit code 0
> <!----------------------------------------------------------------------------------------------------------
> 
> import org.apache.spark.sql.SparkSession
> 
> object chapter2 {
> 
>   // define a specific data type (case class), then manipulate it using the
>   // filter and map functions; this is also known as an Encoder
>   case class flight(DEST_COUNTRY_NAME: String,
>                     ORIGIN_COUNTRY_NAME: String,
>                     count: BigInt)
> 
>   def main(args: Array[String]): Unit = {
> 
>     // unlike in the interactive shell, the SparkSession must be created
>     // explicitly here (avoids IntelliJ errors)
>     val spark = SparkSession.builder.master("local[*]").appName(" chapter2").getOrCreate
> 
>     // needed for the Dataset encoders used by .as[flight] and .map below
>     import spark.implicits._
>     val flightDf = spark.read.parquet("/data/flight-data/parquet/2010-summary.parquet/")
>     val flights = flightDf.as[flight]
>     flights.filter(flight_row => flight_row.ORIGIN_COUNTRY_NAME != "Canada")
>       .map(flight_row => flight_row)
>       .take(3)
> 
>     spark.stop()
>   }
> }
>
> Backbutton.co.uk
> ¯\_(ツ)_/¯
> ♡۶Java♡۶RMI ♡۶
> Make Use Method {MUM}
> makeuse.org
>

Re: results of taken(3) not appearing in console window

Posted by Zahid Rahman <za...@gmail.com>.
Thanks. I added that as well.
I also needed to add the import spark.implicits._ before creating the
DataFrame:

val flightDf =
    spark.read.parquet("/data/flight-data/parquet/2010-summary.parquet/")

flights.show(3)

+-----------------+-------------------+-----+
|DEST_COUNTRY_NAME|ORIGIN_COUNTRY_NAME|count|
+-----------------+-------------------+-----+
|    United States|            Romania|    1|
|    United States|            Ireland|  264|
|    United States|              India|   69|
+-----------------+-------------------+-----+
only showing top 3 rows


Process finished with exit code 0
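
The spark.implicits._ import is what brings the Encoders needed by .as[flight]
and the typed map into scope, and it has to appear after the SparkSession is
created. A minimal sketch of the ordering, assuming the same flight case class
as above:

    val spark = SparkSession.builder.master("local[*]").appName("chapter2").getOrCreate
    import spark.implicits._     // must follow the session; supplies Encoder[flight]
    val flights = spark.read
      .parquet("/data/flight-data/parquet/2010-summary.parquet/")
      .as[flight]
    flights.show(3)              // show() formats and prints the rows itself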

Backbutton.co.uk
¯\_(ツ)_/¯
♡۶Java♡۶RMI ♡۶
Make Use Method {MUM}
makeuse.org
<http://www.backbutton.co.uk>


On Thu, 26 Mar 2020 at 19:18, Reynold Xin <rx...@databricks.com> wrote:

> bcc dev, +user
>
> You need to print out the result. Take itself doesn't print. You only got
> the results printed to the console because the Scala REPL automatically
> prints the returned value from take.

Re: results of taken(3) not appearing in console window

Posted by Zahid Rahman <za...@gmail.com>.
Thank you.

Backbutton.co.uk
¯\_(ツ)_/¯
♡۶Java♡۶RMI ♡۶
Make Use Method {MUM}
makeuse.org
<http://www.backbutton.co.uk>


On Thu, 26 Mar 2020 at 19:18, Reynold Xin <rx...@databricks.com> wrote:

> bcc dev, +user
>
> You need to print out the result. Take itself doesn't print. You only got
> the results printed to the console because the Scala REPL automatically
> prints the returned value from take.