Posted to user@spark.apache.org by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com on 2015/03/27 07:34:31 UTC

Can Spark SQL read existing tables created in Hive?

I have a few tables that were created in Hive. I want to transform the data
stored in these Hive tables using Spark SQL. Is this even possible?

So far I have seen that I can create new tables using the Spark SQL dialect.
However, when I run show tables or do desc hive_table, it says the table is
not found.

I am now wondering whether this support is present in Spark SQL or not.
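
For reference, this is roughly what reading an existing Hive table looks
like in Spark 1.3 — a minimal sketch, assuming a spark-shell session (so a
SparkContext named sc already exists), hive-site.xml in conf/, and a
placeholder table name:

import org.apache.spark.sql.hive.HiveContext

// A HiveContext (not the plain SQLContext) is what talks to the Hive
// metastore; hive_table below is a placeholder name.
val hiveContext = new HiveContext(sc)
hiveContext.sql("SHOW TABLES").collect().foreach(println)
val df = hiveContext.sql("SELECT * FROM hive_table LIMIT 10")
df.show()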

-- 
Deepak

Re: Can Spark SQL read existing tables created in Hive?

Posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com.
I have raised a JIRA, https://issues.apache.org/jira/browse/SPARK-6622, in
order to track this issue and to determine whether it requires a fix in Spark.
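
One workaround sometimes suggested for "No suitable driver found" — a hedged
sketch, not verified against this setup — is to force the MySQL driver class
(the one named in the hive-site.xml quoted below) to load on the driver JVM
before the HiveContext is created, so it registers itself with
java.sql.DriverManager:

// Untested sketch: JDBC drivers self-register with DriverManager in a
// static initializer, so loading the class up front can avoid the error.
Class.forName("com.mysql.jdbc.Driver")
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)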

On Tue, Mar 31, 2015 at 9:31 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <de...@gmail.com> wrote:

> Hello Lian,
> This blog talks about how to install the Hive metastore. One thing I took
> from it was the mysql-connector-java jar that needs to be used, and it
> suggests 5.1.35 (mysql-connector-java-5.1.35-bin.jar).
>
> When I use that:
>
> ./bin/spark-submit -v --master yarn-cluster --driver-class-path
> /apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar
> --jars /apache/hadoop/lib/hadoop-lzo-0.6.0.jar,
> */home/dvasthimal/spark1.3/mysql-connector-java-5.1.35-bin.jar*,/home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar,$SPARK_HOME/conf/hive-site.xml
> --num-executors 1 --driver-memory 4g --driver-java-options
> "-XX:MaxPermSize=2G" --executor-memory 2g --executor-cores 1 --queue
> hdmi-express --class com.ebay.ep.poc.spark.reporting.SparkApp
> spark_reporting-1.0-SNAPSHOT.jar startDate=2015-02-16 endDate=2015-02-16
> input=/user/dvasthimal/epdatasets/successdetail1/part-r-00000.avro
> subcommand=successevents2 output=/user/dvasthimal/epdatasets/successdetail2
>
> I still get the same error.
>
>
> org.datanucleus.exceptions.NucleusDataStoreException: Unable to open a
> test connection to the given database. JDBC url =
> jdbc:mysql://hostname.vip.company.com:3306/HDB, username = hiveuser.
> Terminating connection pool (set lazyInit to true if you expect to start
> your database after your app). Original Exception: ------
>
> java.sql.SQLException: No suitable driver found for
> jdbc:mysql://hostname.vip.company.com:3306/HDB
>
> at java.sql.DriverManager.getConnection(DriverManager.java:596)
>
> Attached are the full stack trace and logs, in case they reveal some
> insights.
>
> Michael,
> Could you please take some time and look into this?
>
> Regards,
> Deepak
>
>
> On Mon, Mar 30, 2015 at 10:04 PM, Cheng Lian <li...@gmail.com>
> wrote:
>
>>  Ah, sorry, my bad...
>> http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html
>>
>>
>> On 3/30/15 10:24 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
>>
>>  Hello Lian
>> Can you share the URL ?
>>
>> On Mon, Mar 30, 2015 at 6:12 PM, Cheng Lian <li...@gmail.com>
>> wrote:
>>
>>>  The "mysql" command line doesn't use JDBC to talk to MySQL server, so
>>> this doesn't verify anything.
>>>
>>> I think this Hive metastore installation guide from Cloudera may be
>>> helpful. Although this document is for CDH4, the general steps are the
>>> same, and should help you to figure out the relationships here.
>>>
>>> Cheng
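
A test that does go through JDBC would look like the sketch below (plain JDK
calls only, with the host, database, and credentials taken from the session
that follows; untested here):

import java.sql.DriverManager

// Exercises the same JDBC code path the metastore layer uses,
// unlike the mysql command-line client.
Class.forName("com.mysql.jdbc.Driver")
val conn = DriverManager.getConnection(
  "jdbc:mysql://hostname.vip.company.com:3306/HDB", "hiveuser", "pass")
println(conn.getMetaData.getDatabaseProductVersion)
conn.close()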
>>>
>>>
>>> On 3/30/15 3:33 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
>>>
>>>  I am able to connect to the MySQL Hive metastore from the client
>>> cluster machine.
>>>
>>>  -sh-4.1$ mysql --user=hiveuser --password=pass --host=
>>> hostname.vip.company.com
>>> Welcome to the MySQL monitor.  Commands end with ; or \g.
>>> Your MySQL connection id is 9417286
>>> Server version: 5.5.12-eb-5.5.12-log MySQL-eb 5.5.12, Revision 3492
>>> Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights
>>> reserved.
>>>  Oracle is a registered trademark of Oracle Corporation and/or its
>>>  affiliates. Other names may be trademarks of their respective
>>> owners.
>>> Type 'help;' or '\h' for help. Type '\c' to clear the current input
>>> statement.
>>>  mysql> use eBayHDB;
>>>  Reading table information for completion of table and column names
>>> You can turn off this feature to get a quicker startup with -A
>>>
>>>  Database changed
>>> mysql> show tables;
>>> +---------------------------+
>>> | Tables_in_HDB             |
>>> +---------------------------+
>>>
>>>
>>>  Regards,
>>> Deepak
>>>
>>>
>>> On Sat, Mar 28, 2015 at 12:35 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <de...@gmail.com>
>>> wrote:
>>>
>>>> Yes, I am using yarn-cluster and I did add it via --files. I still get
>>>> the "No suitable driver found" error.
>>>>
>>>>  Please share a spark-submit command that shows the MySQL jar (containing
>>>> the driver class) being used to connect to the Hive MySQL metastore.
>>>>
>>>>  Even after including it through
>>>>
>>>>   --driver-class-path
>>>> /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
>>>>  OR (AND)
>>>>  --jars /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
>>>>
>>>>  I keep getting "No suitable driver found for".
>>>>
>>>>
>>>>  Command
>>>> ========
>>>>
>>>> ./bin/spark-submit -v --master yarn-cluster --driver-class-path
>>>> */home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar*:/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar
>>>> --jars
>>>> /home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar,
>>>> */home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar* --files
>>>> $SPARK_HOME/conf/hive-site.xml  --num-executors 1 --driver-memory 4g
>>>> --driver-java-options "-XX:MaxPermSize=2G" --executor-memory 2g
>>>> --executor-cores 1 --queue hdmi-express --class
>>>> com.ebay.ep.poc.spark.reporting.SparkApp spark_reporting-1.0-SNAPSHOT.jar
>>>> startDate=2015-02-16 endDate=2015-02-16
>>>> input=/user/dvasthimal/epdatasets/successdetail1/part-r-00000.avro
>>>> subcommand=successevents2 output=/user/dvasthimal/epdatasets/successdetail2
>>>>  Logs
>>>> ====
>>>>
>>>>  Caused by: java.sql.SQLException: No suitable driver found for
>>>> jdbc:mysql://hostname:3306/HDB
>>>>  at java.sql.DriverManager.getConnection(DriverManager.java:596)
>>>>  at java.sql.DriverManager.getConnection(DriverManager.java:187)
>>>>  at
>>>> com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:361)
>>>>  at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
>>>>  ... 68 more
>>>>  ...
>>>> ...
>>>>
>>>> 15/03/27 23:56:08 INFO yarn.Client: Uploading resource
>>>> file:/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar ->
>>>> hdfs://apollo-NN:8020/user/dvasthimal/.sparkStaging/application_1426715280024_119815/mysql-connector-java-5.1.34.jar
>>>>
>>>> ...
>>>>
>>>> ...
>>>>
>>>>
>>>>
>>>>  -sh-4.1$ jar -tvf ../mysql-connector-java-5.1.34.jar | grep Driver
>>>>     61 Fri Oct 17 08:05:36 GMT-07:00 2014
>>>> META-INF/services/java.sql.Driver
>>>>   3396 Fri Oct 17 08:05:22 GMT-07:00 2014
>>>> com/mysql/fabric/jdbc/FabricMySQLDriver.class
>>>> *   692 Fri Oct 17 08:05:22 GMT-07:00 2014 com/mysql/jdbc/Driver.class*
>>>>   1562 Fri Oct 17 08:05:20 GMT-07:00 2014
>>>> com/mysql/jdbc/NonRegisteringDriver$ConnectionPhantomReference.class
>>>>  17817 Fri Oct 17 08:05:20 GMT-07:00 2014
>>>> com/mysql/jdbc/NonRegisteringDriver.class
>>>>    690 Fri Oct 17 08:05:24 GMT-07:00 2014
>>>> com/mysql/jdbc/NonRegisteringReplicationDriver.class
>>>>    731 Fri Oct 17 08:05:24 GMT-07:00 2014
>>>> com/mysql/jdbc/ReplicationDriver.class
>>>>    336 Fri Oct 17 08:05:24 GMT-07:00 2014 org/gjt/mm/mysql/Driver.class
>>>> You have new mail in /var/spool/mail/dvasthimal
>>>> -sh-4.1$ cat conf/hive-site.xml | grep Driver
>>>>    <name>javax.jdo.option.ConnectionDriverName</name>
>>>> *  <value>com.mysql.jdbc.Driver</value>*
>>>>   <description>Driver class name for a JDBC metastore</description>
>>>>  -sh-4.1$
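
The jar listing shows the driver class is packaged; whether it actually
registers at runtime can be checked from the driver JVM — a hedged diagnostic
sketch for a spark-shell session:

import java.sql.DriverManager
import scala.collection.JavaConverters._

// Lists every JDBC driver registered with DriverManager in this JVM; if
// com.mysql.jdbc.Driver is absent, "No suitable driver found" is expected.
DriverManager.getDrivers.asScala.foreach(d => println(d.getClass.getName))
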
>>>>
>>>>  --
>>>>  Deepak
>>>>
>>>>
>>>> On Sat, Mar 28, 2015 at 1:06 AM, Michael Armbrust <
>>>> michael@databricks.com> wrote:
>>>>
>>>>> Are you running on yarn?
>>>>>
>>>>>   - If you are running in yarn-client mode, set HADOOP_CONF_DIR to
>>>>> /etc/hive/conf/ (or the directory where your hive-site.xml is located).
>>>>>  - If you are running in yarn-cluster mode, the easiest thing to do is
>>>>> to add --files=/etc/hive/conf/hive-site.xml (or the path for your
>>>>> hive-site.xml) to your spark-submit script.
>>>>>
>>>>> On Fri, Mar 27, 2015 at 5:42 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <de...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I can recreate the tables, but what about the data? It looks like this
>>>>>> is an obvious feature that Spark SQL must have. People will want to
>>>>>> transform tons of data stored in HDFS through Hive from Spark SQL.
>>>>>>
>>>>>>  The Spark programming guide suggests it is possible.
>>>>>>
>>>>>>
>>>>>>  Spark SQL also supports reading and writing data stored in Apache
>>>>>> Hive <http://hive.apache.org/>.  .... Configuration of Hive is done
>>>>>> by placing your hive-site.xml file in conf/.
>>>>>>
>>>>>> https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#hive-tables
>>>>>>
>>>>>>  For some reason it is not working.
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 27, 2015 at 3:35 PM, Arush Kharbanda <
>>>>>> arush@sigmoidanalytics.com> wrote:
>>>>>>
>>>>>>>  It seems Spark SQL accesses some more columns apart from those
>>>>>>> created by Hive.
>>>>>>>
>>>>>>>  You can always recreate the tables; you would need to execute the
>>>>>>> table creation scripts, but it would be good to avoid recreation.
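
If recreation does turn out to be necessary, the DDL can at least be issued
from Spark SQL itself — a sketch with a made-up table name and schema:

// Hypothetical table name and schema, purely illustrative:
// HiveQL DDL issued through an existing HiveContext.
hiveContext.sql(
  "CREATE TABLE IF NOT EXISTS hive_table (id INT, name STRING) " +
  "STORED AS TEXTFILE")
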
>>>>>>>
>>>>>>> On Fri, Mar 27, 2015 at 3:20 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <de...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I did copy hive-site.xml from the Hive installation into
>>>>>>>> spark-home/conf. It does have all the metastore connection details: host,
>>>>>>>> username, password, driver and others.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>  Snippet
>>>>>>>> ======
>>>>>>>>
>>>>>>>>
>>>>>>>>  <configuration>
>>>>>>>>
>>>>>>>>  <property>
>>>>>>>>   <name>javax.jdo.option.ConnectionURL</name>
>>>>>>>>   <value>jdbc:mysql://host.vip.company.com:3306/HDB</value>
>>>>>>>> </property>
>>>>>>>>
>>>>>>>>  <property>
>>>>>>>>   <name>javax.jdo.option.ConnectionDriverName</name>
>>>>>>>>   <value>com.mysql.jdbc.Driver</value>
>>>>>>>>   <description>Driver class name for a JDBC metastore</description>
>>>>>>>> </property>
>>>>>>>>
>>>>>>>>  <property>
>>>>>>>>   <name>javax.jdo.option.ConnectionUserName</name>
>>>>>>>>   <value>hiveuser</value>
>>>>>>>>   <description>username to use against metastore
>>>>>>>> database</description>
>>>>>>>> </property>
>>>>>>>>
>>>>>>>>  <property>
>>>>>>>>   <name>javax.jdo.option.ConnectionPassword</name>
>>>>>>>>   <value>some-password</value>
>>>>>>>>   <description>password to use against metastore
>>>>>>>> database</description>
>>>>>>>> </property>
>>>>>>>>
>>>>>>>>  <property>
>>>>>>>>   <name>hive.metastore.local</name>
>>>>>>>>   <value>false</value>
>>>>>>>>   <description>controls whether to connect to remote metastore
>>>>>>>> server or open a new metastore server in Hive Client JVM</description>
>>>>>>>> </property>
>>>>>>>>
>>>>>>>>  <property>
>>>>>>>>   <name>hive.metastore.warehouse.dir</name>
>>>>>>>>   <value>/user/hive/warehouse</value>
>>>>>>>>   <description>location of default database for the
>>>>>>>> warehouse</description>
>>>>>>>> </property>
>>>>>>>>
>>>>>>>>  ......
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>  When I attempt to read a Hive table, it does not work; dw_bid does
>>>>>>>> not exist.
>>>>>>>>
>>>>>>>>  I am sure there is a way to read tables stored in HDFS (Hive)
>>>>>>>> from Spark SQL. Otherwise, how would anyone do analytics, since the
>>>>>>>> source tables are always persisted either directly on HDFS or through
>>>>>>>> Hive?
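
Once the metastore connection works, the kind of transformation being asked
for would look like this — a sketch; dw_bid is the table named above, but the
columns are invented:

// item_id and bid_amount are hypothetical columns on dw_bid.
val bids = hiveContext.sql("SELECT item_id, bid_amount FROM dw_bid")
val perItem = bids.groupBy("item_id").count()
perItem.show()
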
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Mar 27, 2015 at 1:15 PM, Arush Kharbanda <
>>>>>>>> arush@sigmoidanalytics.com> wrote:
>>>>>>>>
>>>>>>>>> Hive and Spark SQL internally use HDFS and the Hive metastore; the
>>>>>>>>> only thing you want to change is the processing engine. You can try to
>>>>>>>>> bring your hive-site.xml to %SPARK_HOME%/conf/hive-site.xml. (Ensure that
>>>>>>>>> the hive-site.xml captures the metastore connection details.)
>>>>>>>>>
>>>>>>>>>  It's a hack and I haven't tried it, but I have played around with the
>>>>>>>>> metastore and it should work.
>>>>>>>>>
>>>>>>>>> On Fri, Mar 27, 2015 at 12:04 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <
>>>>>>>>> deepujain@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I have a few tables that were created in Hive. I want to transform
>>>>>>>>>> data stored in these Hive tables using Spark SQL. Is this even
>>>>>>>>>> possible?
>>>>>>>>>>
>>>>>>>>>>  So far I have seen that I can create new tables using the Spark SQL
>>>>>>>>>> dialect. However, when I run show tables or do desc hive_table, it
>>>>>>>>>> says the table is not found.
>>>>>>>>>>
>>>>>>>>>>  I am now wondering whether this support is present in Spark SQL or
>>>>>>>>>> not.
>>>>>>>>>>
>>>>>>>>>>  --
>>>>>>>>>>  Deepak
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>   --
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *Arush Kharbanda* || Technical Teamlead
>>>>>>>>>
>>>>>>>>> arush@sigmoidanalytics.com || www.sigmoidanalytics.com
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>   --
>>>>>>>>  Deepak
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  --
>>>>>>>
>>>>>>>
>>>>>>> *Arush Kharbanda* || Technical Teamlead
>>>>>>>
>>>>>>> arush@sigmoidanalytics.com || www.sigmoidanalytics.com
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>   --
>>>>>>  Deepak
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>   --
>>>>  Deepak
>>>>
>>>>
>>>
>>>
>>>  --
>>>  Deepak
>>>
>>>
>>>
>>
>>
>>  --
>>  Deepak
>>
>>
>>
>
>
> --
> Deepak
>
>


-- 
Deepak

Re: Can spark sql read existing tables created in hive

Posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com.
Hello Lian,
This blog talks about how to install Hive meta store. I thing that i took
from it was the mysql-connector-java jar that needs to be used and it
suggests 5.1.35 (mysql-connector-java-5.1.35-bin.jar
).

When i use that.

./bin/spark-submit -v --master yarn-cluster --driver-class-path
/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar
--jars /apache/hadoop/lib/hadoop-lzo-0.6.0.jar,
*/home/dvasthimal/spark1.3/mysql-connector-java-5.1.35-bin.jar*,/home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar,$SPARK_HOME/conf/hive-site.xml
--num-executors 1 --driver-memory 4g --driver-java-options
"-XX:MaxPermSize=2G" --executor-memory 2g --executor-cores 1 --queue
hdmi-express --class com.ebay.ep.poc.spark.reporting.SparkApp
spark_reporting-1.0-SNAPSHOT.jar startDate=2015-02-16 endDate=2015-02-16
input=/user/dvasthimal/epdatasets/successdetail1/part-r-00000.avro
subcommand=successevents2 output=/user/dvasthimal/epdatasets/successdetail2

I still get the same error.


org.datanucleus.exceptions.NucleusDataStoreException: Unable to open a test
connection to the given database. JDBC url = jdbc:mysql://
hostname.vip.company.com:3306/HDB, username = hiveuser. Terminating
connection pool (set lazyInit to true if you expect to start your database
after your app). Original Exception: ------

java.sql.SQLException: No suitable driver found for
jdbc:mysql://hostname.vip. company.com:3306/HDB

at java.sql.DriverManager.getConnection(DriverManager.java:596)

Attached is the full stack trace & logs, if it can reveal some insights.

Michael,
Could you please take time and look into it.

Regards,
Deepak


On Mon, Mar 30, 2015 at 10:04 PM, Cheng Lian <li...@gmail.com> wrote:

>  Ah, sorry, my bad...
> http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html
>
>
> On 3/30/15 10:24 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
>
>  Hello Lian
> Can you share the URL ?
>
> On Mon, Mar 30, 2015 at 6:12 PM, Cheng Lian <li...@gmail.com> wrote:
>
>>  The "mysql" command line doesn't use JDBC to talk to MySQL server, so
>> this doesn't verify anything.
>>
>> I think this Hive metastore installation guide from Cloudera may be
>> helpful. Although this document is for CDH4, the general steps are the
>> same, and should help you to figure out the relationships here.
>>
>> Cheng
>>
>>
>> On 3/30/15 3:33 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
>>
>>  I am able to connect to MySQL Hive metastore from the client cluster
>> machine.
>>
>>  -sh-4.1$ mysql --user=hiveuser --password=pass --host=
>> hostname.vip.company.com
>> Welcome to the MySQL monitor.  Commands end with ; or \g.
>> Your MySQL connection id is 9417286
>> Server version: 5.5.12-eb-5.5.12-log MySQL-eb 5.5.12, Revision 3492
>> Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights
>> reserved.
>>  Oracle is a registered trademark of Oracle Corporation and/or its
>>  affiliates. Other names may be trademarks of their respective
>> owners.
>> Type 'help;' or '\h' for help. Type '\c' to clear the current input
>> statement.
>>  mysql> use eBayHDB;
>>  Reading table information for completion of table and column names
>> You can turn off this feature to get a quicker startup with -A
>>
>>  Database changed
>> mysql> show tables;
>> +---------------------------+
>> | Tables_in_HDB         |
>>
>>  +---------------------------+
>>
>>
>>  Regards,
>> Deepak
>>
>>
>> On Sat, Mar 28, 2015 at 12:35 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <de...@gmail.com>
>> wrote:
>>
>>> Yes am using yarn-cluster and i did add it via --files. I get "Suitable
>>> error not found error"
>>>
>>>  Please share the spark-submit command that shows mysql jar containing
>>> driver class used to connect to Hive MySQL meta store.
>>>
>>>  Even after including it through
>>>
>>>   --driver-class-path
>>> /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
>>>  OR (AND)
>>>  --jars /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
>>>
>>>  I keep getting "Suitable driver not found for"
>>>
>>>
>>>  Command
>>> ========
>>>
>>> ./bin/spark-submit -v --master yarn-cluster --driver-class-path
>>> */home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar*:/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar
>>> --jars
>>> /home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar,
>>> */home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.ja*r --files
>>> $SPARK_HOME/conf/hive-site.xml  --num-executors 1 --driver-memory 4g
>>> --driver-java-options "-XX:MaxPermSize=2G" --executor-memory 2g
>>> --executor-cores 1 --queue hdmi-express --class
>>> com.ebay.ep.poc.spark.reporting.SparkApp spark_reporting-1.0-SNAPSHOT.jar
>>> startDate=2015-02-16 endDate=2015-02-16
>>> input=/user/dvasthimal/epdatasets/successdetail1/part-r-00000.avro
>>> subcommand=successevents2 output=/user/dvasthimal/epdatasets/successdetail2
>>>  Logs
>>> ====
>>>
>>>  Caused by: java.sql.SQLException: No suitable driver found for
>>> jdbc:mysql://hostname:3306/HDB
>>>  at java.sql.DriverManager.getConnection(DriverManager.java:596)
>>>  at java.sql.DriverManager.getConnection(DriverManager.java:187)
>>>  at
>>> com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:361)
>>>  at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
>>>  ... 68 more
>>>  ...
>>> ...
>>>
>>> 15/03/27 23:56:08 INFO yarn.Client: Uploading resource
>>> file:/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar ->
>>> hdfs://apollo-NN:8020/user/dvasthimal/.sparkStaging/application_1426715280024_119815/mysql-connector-java-5.1.34.jar
>>>
>>> ...
>>>
>>> ...
>>>
>>>
>>>
>>>  -sh-4.1$ jar -tvf ../mysql-connector-java-5.1.34.jar | grep Driver
>>>     61 Fri Oct 17 08:05:36 GMT-07:00 2014
>>> META-INF/services/java.sql.Driver
>>>   3396 Fri Oct 17 08:05:22 GMT-07:00 2014
>>> com/mysql/fabric/jdbc/FabricMySQLDriver.class
>>> *   692 Fri Oct 17 08:05:22 GMT-07:00 2014 com/mysql/jdbc/Driver.class*
>>>   1562 Fri Oct 17 08:05:20 GMT-07:00 2014
>>> com/mysql/jdbc/NonRegisteringDriver$ConnectionPhantomReference.class
>>>  17817 Fri Oct 17 08:05:20 GMT-07:00 2014
>>> com/mysql/jdbc/NonRegisteringDriver.class
>>>    690 Fri Oct 17 08:05:24 GMT-07:00 2014
>>> com/mysql/jdbc/NonRegisteringReplicationDriver.class
>>>    731 Fri Oct 17 08:05:24 GMT-07:00 2014
>>> com/mysql/jdbc/ReplicationDriver.class
>>>    336 Fri Oct 17 08:05:24 GMT-07:00 2014 org/gjt/mm/mysql/Driver.class
>>> You have new mail in /var/spool/mail/dvasthimal
>>> -sh-4.1$ cat conf/hive-site.xml | grep Driver
>>>    <name>javax.jdo.option.ConnectionDriverName</name>
>>> *  <value>com.mysql.jdbc.Driver</value>*
>>>   <description>Driver class name for a JDBC metastore</description>
>>>  -sh-4.1$
>>>
>>>  --
>>>  Deepak
>>>
>>>
>>> On Sat, Mar 28, 2015 at 1:06 AM, Michael Armbrust <
>>> michael@databricks.com> wrote:
>>>
>>>> Are you running on yarn?
>>>>
>>>>   - If you are running in yarn-client mode, set HADOOP_CONF_DIR to
>>>> /etc/hive/conf/ (or the directory where your hive-site.xml is located).
>>>>  - If you are running in yarn-cluster mode, the easiest thing to do is
>>>> to add--files=/etc/hive/conf/hive-site.xml (or the path for your
>>>> hive-site.xml) to your spark-submit script.
>>>>
>>>> On Fri, Mar 27, 2015 at 5:42 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <de...@gmail.com>
>>>> wrote:
>>>>
>>>>> I can recreate tables but what about data. It looks like this is a
>>>>> obvious feature that Spark SQL must be having. People will want to
>>>>> transform tons of data stored in HDFS through Hive from Spark SQL.
>>>>>
>>>>>  Spark programming guide suggests its possible.
>>>>>
>>>>>
>>>>>  Spark SQL also supports reading and writing data stored in Apache
>>>>> Hive <http://hive.apache.org/>.  .... Configuration of Hive is done
>>>>> by placing your hive-site.xml file in conf/.
>>>>>
>>>>> https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#hive-tables
>>>>>
>>>>>  For some reason its not working.
>>>>>
>>>>>
>>>>> On Fri, Mar 27, 2015 at 3:35 PM, Arush Kharbanda <
>>>>> arush@sigmoidanalytics.com> wrote:
>>>>>
>>>>>>  Seems Spark SQL accesses some more columns apart from those created
>>>>>> by hive.
>>>>>>
>>>>>>  You can always recreate the tables, you would need to execute the
>>>>>> table creation scripts but it would be good to avoid recreation.
>>>>>>
>>>>>> On Fri, Mar 27, 2015 at 3:20 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <de...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I did copy hive-conf.xml form Hive installation into
>>>>>>> spark-home/conf. IT does have all the meta store connection details, host,
>>>>>>> username, passwd, driver and others.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  Snippet
>>>>>>> ======
>>>>>>>
>>>>>>>
>>>>>>>  <configuration>
>>>>>>>
>>>>>>>  <property>
>>>>>>>   <name>javax.jdo.option.ConnectionURL</name>
>>>>>>>   <value>jdbc:mysql://host.vip.company.com:3306/HDB</value>
>>>>>>> </property>
>>>>>>>
>>>>>>>  <property>
>>>>>>>   <name>javax.jdo.option.ConnectionDriverName</name>
>>>>>>>   <value>com.mysql.jdbc.Driver</value>
>>>>>>>   <description>Driver class name for a JDBC metastore</description>
>>>>>>> </property>
>>>>>>>
>>>>>>>  <property>
>>>>>>>   <name>javax.jdo.option.ConnectionUserName</name>
>>>>>>>   <value>hiveuser</value>
>>>>>>>   <description>username to use against metastore
>>>>>>> database</description>
>>>>>>> </property>
>>>>>>>
>>>>>>>  <property>
>>>>>>>   <name>javax.jdo.option.ConnectionPassword</name>
>>>>>>>   <value>some-password</value>
>>>>>>>   <description>password to use against metastore
>>>>>>> database</description>
>>>>>>> </property>
>>>>>>>
>>>>>>>  <property>
>>>>>>>   <name>hive.metastore.local</name>
>>>>>>>   <value>false</value>
>>>>>>>   <description>controls whether to connect to remove metastore
>>>>>>> server or open a new metastore server in Hive Client JVM</description>
>>>>>>> </property>
>>>>>>>
>>>>>>>  <property>
>>>>>>>   <name>hive.metastore.warehouse.dir</name>
>>>>>>>   <value>/user/hive/warehouse</value>
>>>>>>>   <description>location of default database for the
>>>>>>> warehouse</description>
>>>>>>> </property>
>>>>>>>
>>>>>>>  ......
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  When i attempt to read hive table, it does not work. dw_bid does
>>>>>>> not exists.
>>>>>>>
>>>>>>>  I am sure there is a way to read tables stored in HDFS (Hive) from
>>>>>>> Spark SQL. Otherwise how would anyone do analytics since the source tables
>>>>>>> are always either persisted directly on HDFS or through Hive.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Mar 27, 2015 at 1:15 PM, Arush Kharbanda <
>>>>>>> arush@sigmoidanalytics.com> wrote:
>>>>>>>
>>>>>>>> Since hive and spark SQL internally use HDFS and Hive metastore.
>>>>>>>> The only thing you want to change is the processing engine. You can try to
>>>>>>>> bring your hive-site.xml to %SPARK_HOME%/conf/hive-site.xml.(Ensure that
>>>>>>>> the hive site xml captures the metastore connection details).
>>>>>>>>
>>>>>>>>  Its a hack,  i havnt tried it. I have played around with the
>>>>>>>> metastore and it should work.
>>>>>>>>
>>>>>>>> On Fri, Mar 27, 2015 at 12:04 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <
>>>>>>>> deepujain@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I have few tables that are created in Hive. I wan to transform
>>>>>>>>> data stored in these Hive tables using Spark SQL. Is this even possible ?
>>>>>>>>>
>>>>>>>>>  So far i have seen that i can create new tables using Spark SQL
>>>>>>>>> dialect. However when i run show tables or do desc hive_table it says table
>>>>>>>>> not found.
>>>>>>>>>
>>>>>>>>>  I am now wondering is this support present or not in Spark SQL ?
>>>>>>>>>
>>>>>>>>>  --
>>>>>>>>>  Deepak
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>   --
>>>>>>>>
>>>>>>>> [image: Sigmoid Analytics]
>>>>>>>> <http://htmlsig.com/www.sigmoidanalytics.com>
>>>>>>>>
>>>>>>>> *Arush Kharbanda* || Technical Teamlead
>>>>>>>>
>>>>>>>> arush@sigmoidanalytics.com || www.sigmoidanalytics.com
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>   --
>>>>>>>  Deepak
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>  --
>>>>>>
>>>>>> [image: Sigmoid Analytics]
>>>>>> <http://htmlsig.com/www.sigmoidanalytics.com>
>>>>>>
>>>>>> *Arush Kharbanda* || Technical Teamlead
>>>>>>
>>>>>> arush@sigmoidanalytics.com || www.sigmoidanalytics.com
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>   --
>>>>>  Deepak
>>>>>
>>>>>
>>>>
>>>
>>>
>>>   --
>>>  Deepak
>>>
>>>
>>
>>
>>  --
>>  Deepak
>>
>>
>>
>
>
>  --
>  Deepak
>
>
>


-- 
Deepak

Re: Can spark sql read existing tables created in hive

Posted by Cheng Lian <li...@gmail.com>.
Ah, sorry, my bad... 
http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html

On 3/30/15 10:24 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
> Hello Lian
> Can you share the URL ?
>
> On Mon, Mar 30, 2015 at 6:12 PM, Cheng Lian <lian.cs.zju@gmail.com 
> <ma...@gmail.com>> wrote:
>
>     The "mysql" command line doesn't use JDBC to talk to MySQL server,
>     so this doesn't verify anything.
>
>     I think this Hive metastore installation guide from Cloudera may
>     be helpful. Although this document is for CDH4, the general steps
>     are the same, and should help you to figure out the relationships
>     here.
>
>     Cheng
>
>
>     On 3/30/15 3:33 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
>>     I am able to connect to MySQL Hive metastore from the client
>>     cluster machine.
>>
>>     -sh-4.1$ mysql --user=hiveuser --password=pass
>>     --host=hostname.vip.company.com <http://hostname.vip.company.com>
>>     Welcome to the MySQL monitor.  Commands end with ; or \g.
>>     Your MySQL connection id is 9417286
>>     Server version: 5.5.12-eb-5.5.12-log MySQL-eb 5.5.12, Revision 3492
>>     Copyright (c) 2000, 2011, Oracle and/or its affiliates. All
>>     rights reserved.
>>     Oracle is a registered trademark of Oracle Corporation and/or its
>>     affiliates. Other names may be trademarks of their respective
>>     owners.
>>     Type 'help;' or '\h' for help. Type '\c' to clear the current
>>     input statement.
>>     mysql> use eBayHDB;
>>     Reading table information for completion of table and column names
>>     You can turn off this feature to get a quicker startup with -A
>>
>>     Database changed
>>     mysql> show tables;
>>     +---------------------------+
>>     | Tables_in_HDB         |
>>
>>     +---------------------------+
>>
>>
>>     Regards,
>>     Deepak
>>
>>
>>     On Sat, Mar 28, 2015 at 12:35 PM, ÐΞ€ρ@Ҝ (๏̯͡๏)
>>     <deepujain@gmail.com <ma...@gmail.com>> wrote:
>>
>>         Yes am using yarn-cluster and i did add it via --files. I get
>>         "Suitable error not found error"
>>
>>         Please share the spark-submit command that shows mysql jar
>>         containing driver class used to connect to Hive MySQL meta
>>         store.
>>
>>         Even after including it through
>>
>>          --driver-class-path
>>         /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
>>         OR (AND)
>>          --jars /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
>>
>>         I keep getting "Suitable driver not found for"
>>
>>
>>         Command
>>         ========
>>
>>         ./bin/spark-submit -v --master yarn-cluster
>>         --driver-class-path
>>         */home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar*:/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar
>>         --jars
>>         /home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar,*/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.ja*r
>>         --files $SPARK_HOME/conf/hive-site.xml  --num-executors 1
>>         --driver-memory 4g --driver-java-options "-XX:MaxPermSize=2G"
>>         --executor-memory 2g --executor-cores 1 --queue hdmi-express
>>         --class com.ebay.ep.poc.spark.reporting.SparkApp
>>         spark_reporting-1.0-SNAPSHOT.jar startDate=2015-02-16
>>         endDate=2015-02-16
>>         input=/user/dvasthimal/epdatasets/successdetail1/part-r-00000.avro
>>         subcommand=successevents2
>>         output=/user/dvasthimal/epdatasets/successdetail2
>>
>>         Logs
>>         ====
>>
>>         Caused by: java.sql.SQLException: No suitable driver found
>>         for jdbc:mysql://hostname:3306/HDB
>>         at java.sql.DriverManager.getConnection(DriverManager.java:596)
>>         at java.sql.DriverManager.getConnection(DriverManager.java:187)
>>         at
>>         com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:361)
>>         at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
>>         ... 68 more
>>         ...
>>         ...
>>
>>         15/03/27 23:56:08 INFO yarn.Client: Uploading resource
>>         file:/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar ->
>>         hdfs://apollo-NN:8020/user/dvasthimal/.sparkStaging/application_1426715280024_119815/mysql-connector-java-5.1.34.jar
>>
>>         ...
>>
>>         ...
>>
>>
>>
>>
>>         -sh-4.1$ jar -tvf ../mysql-connector-java-5.1.34.jar | grep
>>         Driver
>>             61 Fri Oct 17 08:05:36 GMT-07:00 2014
>>         META-INF/services/java.sql.Driver
>>           3396 Fri Oct 17 08:05:22 GMT-07:00 2014
>>         com/mysql/fabric/jdbc/FabricMySQLDriver.class
>>         *   692 Fri Oct 17 08:05:22 GMT-07:00 2014
>>         com/mysql/jdbc/Driver.class*
>>           1562 Fri Oct 17 08:05:20 GMT-07:00 2014
>>         com/mysql/jdbc/NonRegisteringDriver$ConnectionPhantomReference.class
>>          17817 Fri Oct 17 08:05:20 GMT-07:00 2014
>>         com/mysql/jdbc/NonRegisteringDriver.class
>>            690 Fri Oct 17 08:05:24 GMT-07:00 2014
>>         com/mysql/jdbc/NonRegisteringReplicationDriver.class
>>            731 Fri Oct 17 08:05:24 GMT-07:00 2014
>>         com/mysql/jdbc/ReplicationDriver.class
>>            336 Fri Oct 17 08:05:24 GMT-07:00 2014
>>         org/gjt/mm/mysql/Driver.class
>>         You have new mail in /var/spool/mail/dvasthimal
>>         -sh-4.1$ cat conf/hive-site.xml | grep Driver
>>         <name>javax.jdo.option.ConnectionDriverName</name>
>>         *<value>com.mysql.jdbc.Driver</value>*
>>           <description>Driver class name for a JDBC
>>         metastore</description>
>>         -sh-4.1$
>>
>>         -- 
>>         Deepak
>>
>>
>>         On Sat, Mar 28, 2015 at 1:06 AM, Michael Armbrust
>>         <michael@databricks.com <ma...@databricks.com>> wrote:
>>
>>             Are you running on yarn?
>>
>>              - If you are running in yarn-client mode, set
>>             HADOOP_CONF_DIR to /etc/hive/conf/ (or the directory
>>             where your hive-site.xml is located).
>>              - If you are running in yarn-cluster mode, the easiest
>>             thing to do is to add--files=/etc/hive/conf/hive-site.xml
>>             (or the path for your hive-site.xml) to your spark-submit
>>             script.
>>
>>             On Fri, Mar 27, 2015 at 5:42 AM, ÐΞ€ρ@Ҝ (๏̯͡๏)
>>             <deepujain@gmail.com <ma...@gmail.com>> wrote:
>>
>>                 I can recreate tables but what about data. It looks
>>                 like this is a obvious feature that Spark SQL must be
>>                 having. People will want to transform tons of data
>>                 stored in HDFS through Hive from Spark SQL.
>>
>>                 Spark programming guide suggests its possible.
>>
>>
>>                 Spark SQL also supports reading and writing data
>>                 stored in Apache Hive <http://hive.apache.org/>. ....
>>                 Configuration of Hive is done by placing your
>>                 |hive-site.xml| file in |conf/|.
>>                 https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#hive-tables
>>
>>                 For some reason its not working.
>>
>>
>>                 On Fri, Mar 27, 2015 at 3:35 PM, Arush Kharbanda
>>                 <arush@sigmoidanalytics.com
>>                 <ma...@sigmoidanalytics.com>> wrote:
>>
>>                     Seems Spark SQL accesses some more columns apart
>>                     from those created by hive.
>>
>>                     You can always recreate the tables, you would
>>                     need to execute the table creation scripts but it
>>                     would be good to avoid recreation.
>>
>>                     On Fri, Mar 27, 2015 at 3:20 PM, ÐΞ€ρ@Ҝ (๏̯͡๏)
>>                     <deepujain@gmail.com
>>                     <ma...@gmail.com>> wrote:
>>
>>                         I did copy hive-conf.xml form Hive
>>                         installation into spark-home/conf. IT does
>>                         have all the meta store connection details,
>>                         host, username, passwd, driver and others.
>>
>>
>>
>>                         Snippet
>>                         ======
>>
>>
>>                         <configuration>
>>
>>                         <property>
>>                         <name>javax.jdo.option.ConnectionURL</name>
>>                         <value>jdbc:mysql://host.vip.company.com:3306/HDB
>>                         <http://host.vip.company.com:3306/HDB></value>
>>                         </property>
>>
>>                         <property>
>>                         <name>javax.jdo.option.ConnectionDriverName</name>
>>                         <value>com.mysql.jdbc.Driver</value>
>>                         <description>Driver class name for a JDBC
>>                         metastore</description>
>>                         </property>
>>
>>                         <property>
>>                         <name>javax.jdo.option.ConnectionUserName</name>
>>                         <value>hiveuser</value>
>>                         <description>username to use against
>>                         metastore database</description>
>>                         </property>
>>
>>                         <property>
>>                         <name>javax.jdo.option.ConnectionPassword</name>
>>                         <value>some-password</value>
>>                         <description>password to use against
>>                         metastore database</description>
>>                         </property>
>>
>>                         <property>
>>                         <name>hive.metastore.local</name>
>>                         <value>false</value>
>>                         <description>controls whether to connect to
>>                         remove metastore server or open a new
>>                         metastore server in Hive Client JVM</description>
>>                         </property>
>>
>>                         <property>
>>                         <name>hive.metastore.warehouse.dir</name>
>>                         <value>/user/hive/warehouse</value>
>>                         <description>location of default database for
>>                         the warehouse</description>
>>                         </property>
>>
>>                         ......
>>
>>
>>
>>                         When i attempt to read hive table, it does
>>                         not work. dw_bid does not exists.
>>
>>                         I am sure there is a way to read tables
>>                         stored in HDFS (Hive) from Spark SQL.
>>                         Otherwise how would anyone do analytics since
>>                         the source tables are always either persisted
>>                         directly on HDFS or through Hive.
>>
>>
>>                         On Fri, Mar 27, 2015 at 1:15 PM, Arush
>>                         Kharbanda <arush@sigmoidanalytics.com
>>                         <ma...@sigmoidanalytics.com>> wrote:
>>
>>                             Since hive and spark SQL internally use
>>                             HDFS and Hive metastore. The only thing
>>                             you want to change is the processing
>>                             engine. You can try to bring your
>>                             hive-site.xml to
>>                             %SPARK_HOME%/conf/hive-site.xml.(Ensure
>>                             that the hive site xml captures the
>>                             metastore connection details).
>>
>>                             Its a hack,  i havnt tried it. I have
>>                             played around with the metastore and it
>>                             should work.
>>
>>                             On Fri, Mar 27, 2015 at 12:04 PM, ÐΞ€ρ@Ҝ
>>                             (๏̯͡๏) <deepujain@gmail.com
>>                             <ma...@gmail.com>> wrote:
>>
>>                                 I have few tables that are created in
>>                                 Hive. I wan to transform data stored
>>                                 in these Hive tables using Spark SQL.
>>                                 Is this even possible ?
>>
>>                                 So far i have seen that i can create
>>                                 new tables using Spark SQL dialect.
>>                                 However when i run show tables or do
>>                                 desc hive_table it says table not found.
>>
>>                                 I am now wondering is this support
>>                                 present or not in Spark SQL ?
>>
>>                                 -- 
>>                                 Deepak
>>
>>
>>
>>
>>                             -- 
>>
>>                             Sigmoid Analytics
>>                             <http://htmlsig.com/www.sigmoidanalytics.com>
>>
>>                             *Arush Kharbanda* || Technical Teamlead
>>
>>                             arush@sigmoidanalytics.com
>>                             <ma...@sigmoidanalytics.com> ||
>>                             www.sigmoidanalytics.com
>>                             <http://www.sigmoidanalytics.com/>
>>
>>
>>
>>
>>                         -- 
>>                         Deepak
>>
>>
>>
>>
>>                     -- 
>>
>>                     Sigmoid Analytics
>>                     <http://htmlsig.com/www.sigmoidanalytics.com>
>>
>>                     *Arush Kharbanda* || Technical Teamlead
>>
>>                     arush@sigmoidanalytics.com
>>                     <ma...@sigmoidanalytics.com> ||
>>                     www.sigmoidanalytics.com
>>                     <http://www.sigmoidanalytics.com/>
>>
>>
>>
>>
>>                 -- 
>>                 Deepak
>>
>>
>>
>>
>>
>>         -- 
>>         Deepak
>>
>>
>>
>>
>>     -- 
>>     Deepak
>>
>
>
>
>
> -- 
> Deepak
>


Re: Can spark sql read existing tables created in hive

Posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com.
Hello Lian
Can you share the URL ?

On Mon, Mar 30, 2015 at 6:12 PM, Cheng Lian <li...@gmail.com> wrote:

>  The "mysql" command line doesn't use JDBC to talk to MySQL server, so
> this doesn't verify anything.
>
> I think this Hive metastore installation guide from Cloudera may be
> helpful. Although this document is for CDH4, the general steps are the
> same, and should help you to figure out the relationships here.
>
> Cheng
>
>
> On 3/30/15 3:33 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
>
>  I am able to connect to MySQL Hive metastore from the client cluster
> machine.
>
>  -sh-4.1$ mysql --user=hiveuser --password=pass --host=
> hostname.vip.company.com
> Welcome to the MySQL monitor.  Commands end with ; or \g.
> Your MySQL connection id is 9417286
> Server version: 5.5.12-eb-5.5.12-log MySQL-eb 5.5.12, Revision 3492
> Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights
> reserved.
>  Oracle is a registered trademark of Oracle Corporation and/or its
>  affiliates. Other names may be trademarks of their respective
> owners.
> Type 'help;' or '\h' for help. Type '\c' to clear the current input
> statement.
>  mysql> use eBayHDB;
>  Reading table information for completion of table and column names
> You can turn off this feature to get a quicker startup with -A
>
>  Database changed
> mysql> show tables;
> +---------------------------+
> | Tables_in_HDB         |
>
>  +---------------------------+
>
>
>  Regards,
> Deepak
>
>
> On Sat, Mar 28, 2015 at 12:35 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <de...@gmail.com>
> wrote:
>
>> Yes am using yarn-cluster and i did add it via --files. I get "Suitable
>> error not found error"
>>
>>  Please share the spark-submit command that shows mysql jar containing
>> driver class used to connect to Hive MySQL meta store.
>>
>>  Even after including it through
>>
>>   --driver-class-path
>> /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
>>  OR (AND)
>>  --jars /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
>>
>>  I keep getting "Suitable driver not found for"
>>
>>
>>  Command
>> ========
>>
>> ./bin/spark-submit -v --master yarn-cluster --driver-class-path
>> */home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar*:/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar
>> --jars
>> /home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar,
>> */home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.ja*r --files
>> $SPARK_HOME/conf/hive-site.xml  --num-executors 1 --driver-memory 4g
>> --driver-java-options "-XX:MaxPermSize=2G" --executor-memory 2g
>> --executor-cores 1 --queue hdmi-express --class
>> com.ebay.ep.poc.spark.reporting.SparkApp spark_reporting-1.0-SNAPSHOT.jar
>> startDate=2015-02-16 endDate=2015-02-16
>> input=/user/dvasthimal/epdatasets/successdetail1/part-r-00000.avro
>> subcommand=successevents2 output=/user/dvasthimal/epdatasets/successdetail2
>>  Logs
>> ====
>>
>>  Caused by: java.sql.SQLException: No suitable driver found for
>> jdbc:mysql://hostname:3306/HDB
>>  at java.sql.DriverManager.getConnection(DriverManager.java:596)
>>  at java.sql.DriverManager.getConnection(DriverManager.java:187)
>>  at com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:361)
>>  at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
>>  ... 68 more
>>  ...
>> ...
>>
>> 15/03/27 23:56:08 INFO yarn.Client: Uploading resource
>> file:/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar ->
>> hdfs://apollo-NN:8020/user/dvasthimal/.sparkStaging/application_1426715280024_119815/mysql-connector-java-5.1.34.jar
>>
>> ...
>>
>> ...
>>
>>
>>
>>  -sh-4.1$ jar -tvf ../mysql-connector-java-5.1.34.jar | grep Driver
>>     61 Fri Oct 17 08:05:36 GMT-07:00 2014
>> META-INF/services/java.sql.Driver
>>   3396 Fri Oct 17 08:05:22 GMT-07:00 2014
>> com/mysql/fabric/jdbc/FabricMySQLDriver.class
>> *   692 Fri Oct 17 08:05:22 GMT-07:00 2014 com/mysql/jdbc/Driver.class*
>>   1562 Fri Oct 17 08:05:20 GMT-07:00 2014
>> com/mysql/jdbc/NonRegisteringDriver$ConnectionPhantomReference.class
>>  17817 Fri Oct 17 08:05:20 GMT-07:00 2014
>> com/mysql/jdbc/NonRegisteringDriver.class
>>    690 Fri Oct 17 08:05:24 GMT-07:00 2014
>> com/mysql/jdbc/NonRegisteringReplicationDriver.class
>>    731 Fri Oct 17 08:05:24 GMT-07:00 2014
>> com/mysql/jdbc/ReplicationDriver.class
>>    336 Fri Oct 17 08:05:24 GMT-07:00 2014 org/gjt/mm/mysql/Driver.class
>> You have new mail in /var/spool/mail/dvasthimal
>> -sh-4.1$ cat conf/hive-site.xml | grep Driver
>>    <name>javax.jdo.option.ConnectionDriverName</name>
>> *  <value>com.mysql.jdbc.Driver</value>*
>>   <description>Driver class name for a JDBC metastore</description>
>>  -sh-4.1$
>>
>>  --
>>  Deepak
>>
>>
>> On Sat, Mar 28, 2015 at 1:06 AM, Michael Armbrust <michael@databricks.com
>> > wrote:
>>
>>> Are you running on yarn?
>>>
>>>   - If you are running in yarn-client mode, set HADOOP_CONF_DIR to
>>> /etc/hive/conf/ (or the directory where your hive-site.xml is located).
>>>  - If you are running in yarn-cluster mode, the easiest thing to do is
>>> to add--files=/etc/hive/conf/hive-site.xml (or the path for your
>>> hive-site.xml) to your spark-submit script.
>>>
>>> On Fri, Mar 27, 2015 at 5:42 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <de...@gmail.com>
>>> wrote:
>>>
>>>> I can recreate tables but what about data. It looks like this is a
>>>> obvious feature that Spark SQL must be having. People will want to
>>>> transform tons of data stored in HDFS through Hive from Spark SQL.
>>>>
>>>>  Spark programming guide suggests its possible.
>>>>
>>>>
>>>>  Spark SQL also supports reading and writing data stored in Apache Hive
>>>> <http://hive.apache.org/>.  .... Configuration of Hive is done by
>>>> placing your hive-site.xml file in conf/.
>>>>
>>>> https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#hive-tables
>>>>
>>>>  For some reason its not working.
>>>>
>>>>
>>>> On Fri, Mar 27, 2015 at 3:35 PM, Arush Kharbanda <
>>>> arush@sigmoidanalytics.com> wrote:
>>>>
>>>>>  Seems Spark SQL accesses some more columns apart from those created
>>>>> by hive.
>>>>>
>>>>>  You can always recreate the tables, you would need to execute the
>>>>> table creation scripts but it would be good to avoid recreation.
>>>>>
>>>>> On Fri, Mar 27, 2015 at 3:20 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <de...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I did copy hive-conf.xml form Hive installation into spark-home/conf.
>>>>>> IT does have all the meta store connection details, host, username, passwd,
>>>>>> driver and others.
>>>>>>
>>>>>>
>>>>>>
>>>>>>  Snippet
>>>>>> ======
>>>>>>
>>>>>>
>>>>>>  <configuration>
>>>>>>
>>>>>>  <property>
>>>>>>   <name>javax.jdo.option.ConnectionURL</name>
>>>>>>   <value>jdbc:mysql://host.vip.company.com:3306/HDB</value>
>>>>>> </property>
>>>>>>
>>>>>>  <property>
>>>>>>   <name>javax.jdo.option.ConnectionDriverName</name>
>>>>>>   <value>com.mysql.jdbc.Driver</value>
>>>>>>   <description>Driver class name for a JDBC metastore</description>
>>>>>> </property>
>>>>>>
>>>>>>  <property>
>>>>>>   <name>javax.jdo.option.ConnectionUserName</name>
>>>>>>   <value>hiveuser</value>
>>>>>>   <description>username to use against metastore
>>>>>> database</description>
>>>>>> </property>
>>>>>>
>>>>>>  <property>
>>>>>>   <name>javax.jdo.option.ConnectionPassword</name>
>>>>>>   <value>some-password</value>
>>>>>>   <description>password to use against metastore
>>>>>> database</description>
>>>>>> </property>
>>>>>>
>>>>>>  <property>
>>>>>>   <name>hive.metastore.local</name>
>>>>>>   <value>false</value>
>>>>>>   <description>controls whether to connect to remove metastore server
>>>>>> or open a new metastore server in Hive Client JVM</description>
>>>>>> </property>
>>>>>>
>>>>>>  <property>
>>>>>>   <name>hive.metastore.warehouse.dir</name>
>>>>>>   <value>/user/hive/warehouse</value>
>>>>>>   <description>location of default database for the
>>>>>> warehouse</description>
>>>>>> </property>
>>>>>>
>>>>>>  ......
>>>>>>
>>>>>>
>>>>>>
>>>>>>  When i attempt to read hive table, it does not work. dw_bid does
>>>>>> not exists.
>>>>>>
>>>>>>  I am sure there is a way to read tables stored in HDFS (Hive) from
>>>>>> Spark SQL. Otherwise how would anyone do analytics since the source tables
>>>>>> are always either persisted directly on HDFS or through Hive.
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 27, 2015 at 1:15 PM, Arush Kharbanda <
>>>>>> arush@sigmoidanalytics.com> wrote:
>>>>>>
>>>>>>> Since hive and spark SQL internally use HDFS and Hive metastore. The
>>>>>>> only thing you want to change is the processing engine. You can try to
>>>>>>> bring your hive-site.xml to %SPARK_HOME%/conf/hive-site.xml.(Ensure that
>>>>>>> the hive site xml captures the metastore connection details).
>>>>>>>
>>>>>>>  Its a hack,  i havnt tried it. I have played around with the
>>>>>>> metastore and it should work.
>>>>>>>
>>>>>>> On Fri, Mar 27, 2015 at 12:04 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> I have few tables that are created in Hive. I wan to transform data
>>>>>>>> stored in these Hive tables using Spark SQL. Is this even possible ?
>>>>>>>>
>>>>>>>>  So far i have seen that i can create new tables using Spark SQL
>>>>>>>> dialect. However when i run show tables or do desc hive_table it says table
>>>>>>>> not found.
>>>>>>>>
>>>>>>>>  I am now wondering is this support present or not in Spark SQL ?
>>>>>>>>
>>>>>>>>  --
>>>>>>>>  Deepak
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>   --
>>>>>>>
>>>>>>> [image: Sigmoid Analytics]
>>>>>>> <http://htmlsig.com/www.sigmoidanalytics.com>
>>>>>>>
>>>>>>> *Arush Kharbanda* || Technical Teamlead
>>>>>>>
>>>>>>> arush@sigmoidanalytics.com || www.sigmoidanalytics.com
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>   --
>>>>>>  Deepak
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>  --
>>>>>
>>>>> [image: Sigmoid Analytics]
>>>>> <http://htmlsig.com/www.sigmoidanalytics.com>
>>>>>
>>>>> *Arush Kharbanda* || Technical Teamlead
>>>>>
>>>>> arush@sigmoidanalytics.com || www.sigmoidanalytics.com
>>>>>
>>>>
>>>>
>>>>
>>>>   --
>>>>  Deepak
>>>>
>>>>
>>>
>>
>>
>>   --
>>  Deepak
>>
>>
>
>
>  --
>  Deepak
>
>
>


-- 
Deepak

Re: Can spark sql read existing tables created in hive

Posted by Cheng Lian <li...@gmail.com>.
The "mysql" command line doesn't use JDBC to talk to MySQL server, so 
this doesn't verify anything.

I think this Hive metastore installation guide from Cloudera may be 
helpful. Although this document is for CDH4, the general steps are the 
same, and should help you to figure out the relationships here.

Cheng

On 3/30/15 3:33 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
> I am able to connect to MySQL Hive metastore from the client cluster 
> machine.
>
> -sh-4.1$ mysql --user=hiveuser --password=pass 
> --host=hostname.vip.company.com <http://hostname.vip.company.com>
> Welcome to the MySQL monitor.  Commands end with ; or \g.
> Your MySQL connection id is 9417286
> Server version: 5.5.12-eb-5.5.12-log MySQL-eb 5.5.12, Revision 3492
> Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights 
> reserved.
> Oracle is a registered trademark of Oracle Corporation and/or its
> affiliates. Other names may be trademarks of their respective
> owners.


Re: Can spark sql read existing tables created in hive

Posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com.
I am able to connect to the MySQL Hive metastore from the client cluster
machine.

-sh-4.1$ mysql --user=hiveuser --password=pass --host=hostname.vip.company.com
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 9417286
Server version: 5.5.12-eb-5.5.12-log MySQL-eb 5.5.12, Revision 3492
Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input
statement.
mysql> use eBayHDB;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+---------------------------+
| Tables_in_HDB             |
+---------------------------+
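
The mysql command line does not go through JDBC, though, so a closer check is
a small probe from spark-shell launched with the connector jar on
--driver-class-path. A sketch only, not something I have run; the URL, user
and password mirror the hive-site.xml quoted later in this thread:

// Hypothetical JDBC probe; all values are placeholders from hive-site.xml.
Class.forName("com.mysql.jdbc.Driver")
val conn = java.sql.DriverManager.getConnection(
  "jdbc:mysql://hostname.vip.company.com:3306/HDB", "hiveuser", "pass")
println(conn.getMetaData.getDatabaseProductVersion) // prints the server version on success
conn.close()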


Regards,
Deepak


-- 
Deepak

Re: Can spark sql read existing tables created in hive

Posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com.
Yes, I am using yarn-cluster and I did add it via --files. I still get the
"No suitable driver found" error.

Please share a spark-submit command that shows the MySQL jar (containing the
driver class) being used to connect to the Hive MySQL metastore.

Even after including it through

 --driver-class-path /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
OR (AND)
 --jars /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar

I keep getting "No suitable driver found for".


Command
========

./bin/spark-submit -v --master yarn-cluster --driver-class-path
/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar:/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar
--jars
/home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar,/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
--files $SPARK_HOME/conf/hive-site.xml --num-executors 1 --driver-memory 4g
--driver-java-options "-XX:MaxPermSize=2G" --executor-memory 2g
--executor-cores 1 --queue hdmi-express --class
com.ebay.ep.poc.spark.reporting.SparkApp spark_reporting-1.0-SNAPSHOT.jar
startDate=2015-02-16 endDate=2015-02-16
input=/user/dvasthimal/epdatasets/successdetail1/part-r-00000.avro
subcommand=successevents2 output=/user/dvasthimal/epdatasets/successdetail2

Logs
====

Caused by: java.sql.SQLException: No suitable driver found for
jdbc:mysql://hostname:3306/HDB
at java.sql.DriverManager.getConnection(DriverManager.java:596)
at java.sql.DriverManager.getConnection(DriverManager.java:187)
at com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:361)
at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
... 68 more
...
...

15/03/27 23:56:08 INFO yarn.Client: Uploading resource
file:/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar ->
hdfs://apollo-NN:8020/user/dvasthimal/.sparkStaging/application_1426715280024_119815/mysql-connector-java-5.1.34.jar

...

...



-sh-4.1$ jar -tvf ../mysql-connector-java-5.1.34.jar | grep Driver
    61 Fri Oct 17 08:05:36 GMT-07:00 2014 META-INF/services/java.sql.Driver
  3396 Fri Oct 17 08:05:22 GMT-07:00 2014 com/mysql/fabric/jdbc/FabricMySQLDriver.class
   692 Fri Oct 17 08:05:22 GMT-07:00 2014 com/mysql/jdbc/Driver.class
  1562 Fri Oct 17 08:05:20 GMT-07:00 2014 com/mysql/jdbc/NonRegisteringDriver$ConnectionPhantomReference.class
 17817 Fri Oct 17 08:05:20 GMT-07:00 2014 com/mysql/jdbc/NonRegisteringDriver.class
   690 Fri Oct 17 08:05:24 GMT-07:00 2014 com/mysql/jdbc/NonRegisteringReplicationDriver.class
   731 Fri Oct 17 08:05:24 GMT-07:00 2014 com/mysql/jdbc/ReplicationDriver.class
   336 Fri Oct 17 08:05:24 GMT-07:00 2014 org/gjt/mm/mysql/Driver.class
-sh-4.1$ cat conf/hive-site.xml | grep Driver
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
-sh-4.1$
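
One more variant that may be worth trying (a sketch, not a verified fix): in
yarn-cluster mode the driver runs inside the YARN ApplicationMaster on a
cluster node, so a --driver-class-path entry has to exist at that path on
that node. The extraClassPath settings below make the same assumption
explicit; the node-local location is an assumption and presumes the connector
jar has been copied to every node:

./bin/spark-submit -v --master yarn-cluster \
  --conf spark.driver.extraClassPath=/apache/hadoop/lib/mysql-connector-java-5.1.34.jar \
  --conf spark.executor.extraClassPath=/apache/hadoop/lib/mysql-connector-java-5.1.34.jar \
  --files $SPARK_HOME/conf/hive-site.xml \
  ... (remaining options unchanged from the command above)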

-- 
Deepak



Re: Can spark sql read existing tables created in hive

Posted by Michael Armbrust <mi...@databricks.com>.
Are you running on yarn?

 - If you are running in yarn-client mode, set HADOOP_CONF_DIR to
/etc/hive/conf/ (or the directory where your hive-site.xml is located).
 - If you are running in yarn-cluster mode, the easiest thing to do is to
add --files=/etc/hive/conf/hive-site.xml (or the path for your
hive-site.xml) to your spark-submit script, as in the sketch below.
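
A minimal sketch of the cluster-mode variant (the application class and jar
are placeholders, not from this thread):

./bin/spark-submit --master yarn-cluster \
  --files /etc/hive/conf/hive-site.xml \
  --class com.example.MyApp my-app.jar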


Re: Can spark sql read existing tables created in hive

Posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com.
I can recreate tables, but what about the data? It looks like an obvious
feature that Spark SQL must have. People will want to transform tons of data
stored in HDFS through Hive from Spark SQL.

The Spark programming guide suggests it's possible.


Spark SQL also supports reading and writing data stored in Apache Hive
<http://hive.apache.org/>.  .... Configuration of Hive is done by placing
your hive-site.xml file in conf/.
https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#hive-tables

For some reason it's not working.
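
For reference, the usage that page documents looks roughly like this (a
sketch against the Spark 1.3 Scala API; dw_bid is the existing Hive table I
want to read):

import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc) // sc is the existing SparkContext
sqlContext.sql("SHOW TABLES").collect().foreach(println)
sqlContext.sql("SELECT COUNT(*) FROM dw_bid").collect().foreach(println)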


-- 
Deepak

Re: Can spark sql read existing tables created in hive

Posted by Arush Kharbanda <ar...@sigmoidanalytics.com>.
It seems Spark SQL accesses some additional columns beyond those created by
Hive.

You can always recreate the tables; you would need to execute the table
creation scripts, though it would be good to avoid recreation. If you do have
to recreate, see the sketch below for keeping the data in place.
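
A sketch of that: an external table whose LOCATION points at the existing
files re-registers the table without moving any data. The schema, table name
and path here are illustrative, not taken from this thread:

// sqlContext must be a HiveContext for Hive DDL to be accepted.
sqlContext.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS dw_bid_ext (bid_id BIGINT, bid_amount DOUBLE)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  LOCATION 'hdfs:///user/hive/warehouse/dw_bid'
""")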

-- 

[image: Sigmoid Analytics] <http://htmlsig.com/www.sigmoidanalytics.com>

*Arush Kharbanda* || Technical Teamlead

arush@sigmoidanalytics.com || www.sigmoidanalytics.com

Re: Can spark sql read existing tables created in hive

Posted by ๏̯͡๏ <ÐΞ€ρ@Ҝ>, de...@gmail.com.
I did copy hive-site.xml from the Hive installation into spark-home/conf. It
does have all the metastore connection details: host, username, password,
driver and others.



Snippet
======


<configuration>

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://host.vip.company.com:3306/HDB</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
  <description>username to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>some-password</value>
  <description>password to use against metastore database</description>
</property>

<property>
  <name>hive.metastore.local</name>
  <value>false</value>
  <description>controls whether to connect to remote metastore server or
open a new metastore server in Hive Client JVM</description>
</property>

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>

......



When I attempt to read a Hive table, it does not work: it says dw_bid does
not exist.

I am sure there is a way to read tables stored in HDFS (Hive) from Spark
SQL. Otherwise, how would anyone do analytics, since the source tables are
always persisted either directly on HDFS or through Hive?
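
The attempt itself is nothing exotic; it is roughly the following (a sketch
of what I run):

val hc = new org.apache.spark.sql.hive.HiveContext(sc)
hc.sql("desc dw_bid").collect() // this is the call that reports dw_bid does not exist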


-- 
Deepak

Re: Can spark sql read existing tables created in hive

Posted by Arush Kharbanda <ar...@sigmoidanalytics.com>.
Since Hive and Spark SQL internally use HDFS and the Hive metastore, the
only thing you want to change is the processing engine. You can try to bring
your hive-site.xml to %SPARK_HOME%/conf/hive-site.xml (ensure that the
hive-site.xml captures the metastore connection details), as sketched below.

It's a hack; I haven't tried it, but I have played around with the metastore
and it should work.
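
Concretely, something like this (the source path is an assumption; use
wherever your hive-site.xml actually lives):

cp /etc/hive/conf/hive-site.xml $SPARK_HOME/conf/
$SPARK_HOME/bin/spark-shell
# in the shell, sqlContext is a HiveContext in Hive-enabled builds:
#   sqlContext.sql("show tables").collect().foreach(println)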


-- 

[image: Sigmoid Analytics] <http://htmlsig.com/www.sigmoidanalytics.com>

*Arush Kharbanda* || Technical Teamlead

arush@sigmoidanalytics.com || www.sigmoidanalytics.com