Posted to user@spark.apache.org by Farhan Misarwala <fa...@gmail.com> on 2021/05/02 11:46:08 UTC

Re: Spark JDBC errors out

Thanks, Mich,

I have been using the JDBC source with MySQL and Postgres drivers in
production for almost 4 years now. The error looked a bit odd, and what I
meant to ask was whether I am doing it right. As you mentioned, I will also
check with the driver's developers to see if they have anything to say
about this. Thanks for looking into it :)

Regards,
Farhan.

On Fri, Apr 30, 2021 at 7:01 PM Mich Talebzadeh <mi...@gmail.com>
wrote:

> Hi Farhan,
>
> I have used it successfully and it works. The only thing that could
> potentially cause this issue is the JDBC driver itself. Have you tried
> another JDBC driver, like Progress DataDirect? Most of these defects are
> related to the driver itself.
>
> HTH,
>
> Mich
>
> On Fri, 30 Apr 2021 at 13:49, Farhan Misarwala <fa...@gmail.com>
> wrote:
>
>> Hi Mich,
>>
>> I have tried this already. I am using the same methods you are using in
>> my Java code, and I see the same error: 'dbtable' or 'query' gets added
>> as a connection property in the JDBC connection string for the source
>> database, which is AAS in my case.
>>
>>
>>
>> Thanks,
>>
>> Farhan.
>>
>>
>> On Fri, Apr 30, 2021 at 3:07 PM Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> If you are using a Spark JDBC connection, then you can use the following
>>> generic JDBC code, from PySpark say. Note that tableName can also be a
>>> SQL query, e.g. (select col1, col2 from <table>).
>>>
>>> import sys
>>>
>>> ## load from database
>>> def loadTableFromJDBC(spark, url, tableName, user, password, driver,
>>>                       fetchsize):
>>>     try:
>>>         house_df = spark.read. \
>>>             format("jdbc"). \
>>>             option("url", url). \
>>>             option("dbtable", tableName). \
>>>             option("user", user). \
>>>             option("password", password). \
>>>             option("driver", driver). \
>>>             option("fetchsize", fetchsize). \
>>>             load()
>>>         return house_df
>>>     except Exception as e:
>>>         print(f"""{e}, quitting""")
>>>         sys.exit(1)
>>>
>>> ## write to database
>>> def writeTableWithJDBC(dataFrame, url, tableName, user, password,
>>>                        driver, mode):
>>>     try:
>>>         dataFrame. \
>>>             write. \
>>>             format("jdbc"). \
>>>             option("url", url). \
>>>             option("dbtable", tableName). \
>>>             option("user", user). \
>>>             option("password", password). \
>>>             option("driver", driver). \
>>>             mode(mode). \
>>>             save()
>>>     except Exception as e:
>>>         print(f"""{e}, quitting""")
>>>         sys.exit(1)
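A side note on the tableName parameter mentioned above: Spark substitutes the 'dbtable' value into a SELECT statement, so a bare query has to be wrapped in parentheses and given an alias. A minimal sketch of that wrapping (the helper name and default alias are illustrative, not from this thread):

```python
def as_dbtable(query: str, alias: str = "t") -> str:
    """Wrap a SQL query so it can be passed as Spark's 'dbtable' option.

    Spark embeds the 'dbtable' value into its own SELECT, so a bare query
    must be parenthesised and aliased. The alias is written without the AS
    keyword, since some dialects (e.g. Oracle) reject AS in this position.
    """
    # Strip surrounding whitespace and any trailing semicolon, which would
    # be a syntax error inside a sub-query.
    return f"({query.strip().rstrip(';')}) {alias}"

# as_dbtable("select col1, col2 from house")
#   -> "(select col1, col2 from house) t"
```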
>>>
>>> I don't know about AAS, but the following will allow you to connect to
>>> Google BigQuery. Check the Azure documentation for an equivalent.
>>>
>>>
>>> def loadTableFromBQ(spark, dataset, tableName):
>>>     try:
>>>         df = spark.read. \
>>>             format("bigquery"). \
>>>             option("credentialsFile", config['GCPVariables']['jsonKeyFile']). \
>>>             option("dataset", dataset). \
>>>             option("table", tableName). \
>>>             load()
>>>         return df
>>>     except Exception as e:
>>>         print(f"""{e}, quitting""")
>>>         sys.exit(1)
>>>
>>> def writeTableToBQ(dataFrame, mode, dataset, tableName):
>>>     try:
>>>         dataFrame. \
>>>             write. \
>>>             format("bigquery"). \
>>>             option("credentialsFile", config['GCPVariables']['jsonKeyFile']). \
>>>             mode(mode). \
>>>             option("dataset", dataset). \
>>>             option("table", tableName). \
>>>             save()
>>>     except Exception as e:
>>>         print(f"""{e}, quitting""")
>>>         sys.exit(1)
>>>
>>> HTH
>>>
>>>
>>>    view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>> On Fri, 30 Apr 2021 at 09:32, Farhan Misarwala <
>>> farhan.misarwala@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I am trying to read data from 'Azure Analysis Services' using CData's
>>>> AAS JDBC driver
>>>> <https://cdn.cdata.com/help/OAF/jdbc/pg_JDBCconnectcode.htm>.
>>>>
>>>> I am doing spark.read().jdbc(jdbcURL, aasQuery, connectionProperties)
>>>>
>>>> This errors out because the driver throws an exception saying there is
>>>> no such connection property as 'dbtable'. It appears Spark is setting
>>>> 'dbtable' as a connection property on the driver, instead of creating a
>>>> connection and then executing the query I specify in the 'dbtable' or
>>>> 'query' option.
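For what it's worth, one thing worth double-checking (a sketch, not verified against the CData driver): whether the query is passed to Spark as a reader option rather than through the connectionProperties argument. Spark 2.4+ has a dedicated 'query' option; on older versions the query can instead be wrapped as a parenthesised sub-query in 'dbtable'. A small illustrative helper showing the two option shapes (names and alias are assumptions, not from this thread):

```python
def jdbc_read_options(url, query, user, password, driver,
                      use_query_option=True):
    """Build the dict for spark.read.format("jdbc").options(**opts).load().

    With use_query_option=True the dedicated 'query' option (Spark >= 2.4)
    carries the SQL; otherwise the query is wrapped as an aliased
    sub-query in 'dbtable'. Either way, these are Spark reader options,
    which Spark is expected to strip before forwarding the remaining
    properties to the driver's connection.
    """
    opts = {"url": url, "user": user, "password": password, "driver": driver}
    if use_query_option:
        opts["query"] = query
    else:
        # Parenthesised sub-query with a bare alias, as 'dbtable' requires.
        opts["dbtable"] = f"({query}) t"
    return opts
```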
>>>>
>>>> Usually, in my non-Spark/vanilla Java code, I use
>>>> java.sql.DriverManager to create a connection and then execute a query
>>>> successfully using execute() from java.sql.Statement.
>>>>
>>>> Does the JDBC data source not follow such a flow? If not, what would
>>>> be the best solution in this case? Please advise. It would be great if
>>>> I could query AAS through our beloved Apache Spark ;)
>>>>
>>>> TIA,
>>>> Farhan.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>>
>>> --
>
>
>