Posted to user@spark.apache.org by Mahender Sarangam <Ma...@outlook.com> on 2016/06/09 20:11:24 UTC
how to store results of Scala Query in Text format or tab delimiter
Hi,
We are newbies learning Spark. We are running Scala queries against our
Parquet table. Whenever we fire a query in Jupyter, only part of the
results is shown in the UI, so we are trying to store the results into
a table in Parquet format. (By default, Spark stores all tables as
Parquet.) The query results do get saved into a Parquet table, but we
see a performance impact from loading the data into the table/file. We
compared the time it takes to execute the same query in Spark vs. Hive:
Spark is faster whenever we don't store the results into a file/table,
but whenever we run the query and store the results into a Parquet
table, it takes longer than Hive's total execution. So, to speed up the
data export, we would like to save the results in plain-text
tab-delimited or CSV format. Is that possible today in Spark? We are
using Spark 1.6.
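For reference, here is a sketch of two common ways to do this in Spark 1.6 (the DataFrame name `results` and the output path are placeholders, not from this thread):

```scala
// Sketch: export a DataFrame as tab-delimited text in Spark 1.6.
// `results` and the output path are assumed names.

// Helper: join a row's fields with tabs (testable without a cluster).
def toTsvLine(fields: Seq[Any]): String = fields.map {
  case null => ""          // write nulls as empty fields
  case v    => v.toString
}.mkString("\t")

// Option 1: map each Row to a line and write plain text files.
// results.rdd.map(r => toTsvLine(r.toSeq)).saveAsTextFile("/out/results_tsv")

// Option 2: the com.databricks:spark-csv package (a separate artifact
// in 1.6; built into Spark from 2.0 onward):
// results.write
//   .format("com.databricks.spark.csv")
//   .option("delimiter", "\t")
//   .save("/out/results_tsv")
```

Either way the output lands as a directory of part-files on HDFS, one per partition, which can be concatenated afterwards if a single file is needed.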
In Hive, we have Ambari to configure the HiveServer2 settings. Is
there any UI for Spark configuration?
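As far as I know there is no Ambari-style configuration UI bundled with Spark itself; the Spark web UI (port 4040 for a running application) shows the active settings under its Environment tab, but changes are made in `conf/spark-defaults.conf` or via `spark-submit` flags. A minimal sketch of the file (values are examples only):

```
# conf/spark-defaults.conf -- example values, adjust per cluster
spark.executor.memory   8g
spark.executor.cores    4
```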
One more difference we have identified: whenever a Hive-on-Tez query is
executed, it takes all of the cluster's available RAM, whereas Spark
takes only about 30% of the available memory, say 30 GB out of 100 GB.
Is it possible to increase the memory usage so that the Spark query
runs faster than Hive?
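This is typically because Spark 1.6 asks YARN for a fixed, modest set of executors by default, while Tez grabs containers more aggressively. A sketch of raising the request (all values are placeholders to size against your cluster):

```shell
# Sketch: request more of the cluster for a Spark 1.6 job on YARN.
spark-submit \
  --master yarn \
  --num-executors 10 \
  --executor-memory 8g \
  --executor-cores 4 \
  your-app.jar
```

Alternatively, dynamic allocation (`spark.dynamicAllocation.enabled=true`, which also requires the YARN external shuffle service) lets Spark grow and shrink its executor count with the workload.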
/Mahender