Posted to user@spark.apache.org by Mahender Sarangam <Ma...@outlook.com> on 2016/06/09 20:11:24 UTC

How to store results of a Scala query in text format or tab-delimited

Hi,

We are newbies learning Spark. We are running Scala queries against our 
Parquet table. Whenever we fire a query in Jupyter, only part of the 
results is shown in the UI, so we are trying to store the full results 
in a table, which is in Parquet format. By default, Spark stores all 
tables in Parquet format, so the query results are saved into a Parquet 
table, but we see a performance impact from loading the data into the 
table/file. We compared the time it takes to execute the same query in 
Spark vs. Hive: Spark is faster whenever we don't store the results 
into a file/table, but whenever we run the query and store the results 
into a Parquet table, it takes longer than Hive's total execution time. 
So, to improve the data-export timing, we would like to save the 
results in plain-text tab-delimited or CSV format. Is that possible in 
Spark today? We are using Spark version 1.6.
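For what it's worth, Spark 1.6 has no built-in CSV writer on DataFrameWriter (that arrived in 2.0), but two common approaches are to map each Row to a delimited string and save it as plain text, or to use the external spark-csv package. A minimal sketch, assuming a hypothetical table and output path (the core row-to-line step is plain string joining, shown runnable at the bottom):

```scala
// Sketch for Spark 1.6 -- table name and paths are placeholders.
//
// Option A: join each Row's fields with tabs and write plain text files:
//   val df = sqlContext.sql("SELECT * FROM your_parquet_table")
//   df.rdd.map(_.mkString("\t")).saveAsTextFile("/output/results_tsv")
//
// Option B: the external com.databricks:spark-csv package for Spark 1.x:
//   df.write
//     .format("com.databricks.spark.csv")
//     .option("delimiter", "\t")
//     .option("header", "true")
//     .save("/output/results_tsv")

object TsvDemo {
  // The row-to-line step used in Option A, as a plain function.
  def toTsvLine(fields: Seq[Any]): String = fields.mkString("\t")

  def main(args: Array[String]): Unit = {
    println(toTsvLine(Seq("id", "name", "amount")))
    println(toTsvLine(Seq(1, "alice", 12.5)))
  }
}
```

Writing text/CSV should skip Parquet's encoding and compression work on save, though it gives up columnar compression, so the output files may be larger.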

In Hive, we have Ambari to configure the HiveServer2 settings. Is 
there any UI for Spark configuration?
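For context: Spark itself ships no configuration UI (its web UI on driver port 4040, or the history server, is for monitoring only). On an HDP cluster Ambari can also manage Spark's configs; otherwise settings live in conf/spark-defaults.conf or are passed as --conf flags. A sketch of the file, with placeholder values rather than recommendations:

```
# conf/spark-defaults.conf -- example entries (placeholder values)
spark.executor.memory         8g
spark.executor.cores          4
spark.driver.memory           4g
spark.sql.shuffle.partitions  200
```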

One more difference we have identified: whenever a Hive-on-Tez query is 
executed, it uses all of the cluster's available RAM, whereas Spark 
takes only about 30% of the available memory, say 30 GB out of 100 GB. 
Is it possible to increase Spark's memory usage so that the Spark query 
runs faster than Hive?
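Spark does not grab the whole cluster by default; executor count, cores, and memory are bounded by what the job requests. One way to raise the ceiling is through spark-submit flags, sketched here with placeholder values that would need tuning against YARN container limits on the actual cluster:

```
# Sketch: request more of the cluster at submit time (values are placeholders)
spark-submit \
  --master yarn \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 8G \
  your-app.jar

# Alternatively, let Spark scale executors with the workload:
#   --conf spark.dynamicAllocation.enabled=true \
#   --conf spark.shuffle.service.enabled=true
```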

/Mahender