Posted to dev@hive.apache.org by Dasun Hegoda <da...@icta.lk> on 2015/11/20 10:39:26 UTC
Hive on Spark - Hadoop 2 - Installation - Ubuntu
Hi,
What I'm planning to do is develop a reporting platform using existing data. I have an existing RDBMS with a large number of records, so I'm planning to use the following stack (see http://stackoverflow.com/questions/33635234/hadoop-2-7-spark-hive-jasperreports-scoop-architecuture):
- Sqoop - Extract data from the RDBMS into Hadoop
- Hadoop - Storage platform -> *Deployment Completed*
- Hive - Datawarehouse
- Spark - Real-time processing -> *Deployment Completed*
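For context, the Sqoop extraction step I have in mind would look roughly like this (the JDBC URL, table name, and credentials below are placeholders, not my real setup):

```shell
# Hypothetical Sqoop import: pull one RDBMS table into HDFS
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/reports \
  --username report_user -P \
  --table sales_records \
  --target-dir /user/hive/warehouse/sales_records \
  --num-mappers 4
```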
I'm planning to deploy Hive on Spark, but I can't find complete installation steps. I tried to follow the official '[Hive on Spark][1]' guide, but it has gaps. For example, under 'Configuring YARN' it says to set `yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler`, but it does not say where this should be set. Also, as far as I know, the configuration properties the guide sets in the Hive runtime shell are not permanent.
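To show what I mean, my understanding (please correct me if I'm wrong) is that the scheduler property would go in yarn-site.xml on the ResourceManager host, and that Hive settings only survive across sessions when placed in hive-site.xml rather than set in the shell:

```xml
<!-- yarn-site.xml (ResourceManager host): enable the Fair Scheduler -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>

<!-- hive-site.xml: make the execution engine setting permanent -->
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
```

Is that the right place for these, or is something else expected for a production setup?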
I have also read [this][2], but it does not give concrete steps either.
Could you please provide the steps to run Hive on Spark on Ubuntu as a production system?
[1]: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
[2]: http://stackoverflow.com/questions/26018306/how-to-configure-hive-to-use-spark
Regards,
Dasun Hegoda
Senior Software Engineer @ ICTA
dasunhegoda.com