Posted to issues@spark.apache.org by "Gary Liu (Jira)" <ji...@apache.org> on 2020/02/10 14:15:00 UTC

[jira] [Comment Edited] (SPARK-30741) The data returned from SAS using JDBC reader contains column label

    [ https://issues.apache.org/jira/browse/SPARK-30741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17033632#comment-17033632 ] 

Gary Liu edited comment on SPARK-30741 at 2/10/20 2:14 PM:
-----------------------------------------------------------

Thanks, Hyukjin Kwon!

We tried 2.1.x, 2.3.x and 2.4.x, and all gave the same results.

I am a data analyst and not quite sure about the detailed settings, but I will try my best, and I will ask our IT experts to provide additional information if possible.

We have an in-house SAS Grid server and Spark clusters, and we run PySpark code in Zeppelin. There is a firewall between the Zeppelin server and the SAS Grid, but we have opened the route/port.

Based on the SAS JDBC Cookbook, there are two ways to connect to SAS using JDBC: SAS/SHARE and IOM. We tested SAS/SHARE. The testing code is shown below (you can also find it in the attached screenshot). Every row of the returned Spark dataframe is populated with the SAS column labels instead of the data.

We reported this to SAS Support; they said the data is returned correctly when the SAS JDBC driver is called from plain Java, so they think the problem is due to the Spark JDBC reader. The driver also worked fine with KNIME and with desktop JDBC SQL clients such as SQuirreL SQL and DBeaver (although those do show the column labels as the column names in the results), where we can see the correct data returned. In Spark, however, the results are not usable. We are wondering whether this is due to the Spark JDBC reader or to some other cause, such as a firewall setting; two checks we could run from the same notebook are sketched below, after the testing code and after the results.

 

Testing code:
{code:java}
# Read from SAS/SHARE over JDBC. server_path:port and the library
# path are placeholders. The subquery aliases two columns and gives
# each one a SAS label; obs=10 limits the read to 10 observations.
ja = spark.read.jdbc(
    "jdbc:sharenet://server_path:port?librefs=MyLib '/sas/server/path'",
    '''(select
           cust_id as cust label 'customer_ID',
           'test' as col label 'label_name'
        from MyLib.test_data(obs=10)
       ) t''',
    properties={"driver": "com.sas.net.sharenet.ShareNetDriver"})

ja.show(10, False){code}
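One thing worth noting here is our own guess, not something confirmed by SAS or by the Spark docs: Spark's JDBC reader picks a dialect based on the URL, and for an unrecognized URL like jdbc:sharenet:// it falls back to a default dialect that wraps column names in double quotes when it builds the SELECT list it sends to the server. In SAS SQL, a double-quoted token without a trailing n is a character literal, not a column name, so a generated query like SELECT "customer_ID","label_name" FROM (...) t would return the label text on every row, which is exactly what we see in the results below. The snippet shows which quoting Spark would use for this URL; spark._jvm is PySpark's internal py4j gateway, so this is a peek at internals rather than a supported API.
{code:java}
# Our assumption (not confirmed): for a URL Spark does not recognize,
# the JDBC reader falls back to a default dialect whose
# quoteIdentifier() wraps column names in double quotes. In SAS SQL,
# "customer_ID" (no trailing n) is a character literal rather than a
# column name, so SELECT "customer_ID","label_name" FROM ... would
# return the label text on every row. spark._jvm is PySpark's
# internal py4j gateway; this only inspects the quoting Spark would
# apply, it does not touch the SAS server.
url = "jdbc:sharenet://server_path:port"  # placeholder, as above
dialect = spark._jvm.org.apache.spark.sql.jdbc.JdbcDialects.get(url)
print(dialect.quoteIdentifier("customer_ID"))  # expect: "customer_ID"
{code}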
 

Results: 
{code:java}
+-----------+----------+
|customer_ID|label_name|
+-----------+----------+
|customer_ID|label_name|
|customer_ID|label_name|
|customer_ID|label_name|
|customer_ID|label_name|
|customer_ID|label_name|
|customer_ID|label_name|
|customer_ID|label_name|
|customer_ID|label_name|
|customer_ID|label_name|
|customer_ID|label_name|
+-----------+----------+{code}
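
To rule out the firewall and the driver itself, one check we can run from the same Zeppelin notebook is to call the SAS/SHARE driver directly through Spark's JVM, bypassing spark.read.jdbc entirely. This is a minimal sketch over PySpark's internal py4j gateway; it assumes the sharenet driver jar is already on the Spark driver's classpath, and server_path:port and the library path are the same placeholders as in the test above. If this prints real values, connectivity and the driver are fine and the problem is isolated to Spark's JDBC read path.
{code:java}
# Direct JDBC call through the JVM, bypassing Spark's JDBC reader.
# Assumes the sharenet jar is on the driver classpath; placeholders
# are the same as in the test above.
jvm = spark._jvm
jvm.java.lang.Class.forName("com.sas.net.sharenet.ShareNetDriver")
conn = jvm.java.sql.DriverManager.getConnection(
    "jdbc:sharenet://server_path:port?librefs=MyLib '/sas/server/path'")
stmt = conn.createStatement()
rs = stmt.executeQuery("select cust_id from MyLib.test_data(obs=10)")
while rs.next():
    print(rs.getString(1))  # real values here mean the driver is fine
rs.close()
stmt.close()
conn.close()
{code}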



> The data returned from SAS using JDBC reader contains column label
> ------------------------------------------------------------------
>
>                 Key: SPARK-30741
>                 URL: https://issues.apache.org/jira/browse/SPARK-30741
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output, PySpark
>    Affects Versions: 2.1.1
>            Reporter: Gary Liu
>            Priority: Major
>         Attachments: SparkBug.png
>
>
> When reading SAS data over JDBC with the SAS/SHARE driver, the returned data contains the column labels rather than the data. 
> According to testing results from SAS Support, the results are correct when the driver is called from Java, so they believe the issue is due to Spark's reading. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org