Posted to issues@spark.apache.org by "pin_zhang (JIRA)" <ji...@apache.org> on 2015/04/16 10:04:58 UTC

[jira] [Commented] (SPARK-6923) Get invalid hive table columns after save DataFrame to hive table

    [ https://issues.apache.org/jira/browse/SPARK-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497701#comment-14497701 ] 

pin_zhang commented on SPARK-6923:
----------------------------------

With a Spark 1.1.0 client, using the JDBC API to get the table schema returns:
age(bigint), id(string)
With Spark 1.3.0 it returns:
{name=col, type=array<string>}
That is not expected.

// cnn is an open java.sql.Connection; database and table are Strings.
List<Map<String, String>> results = new ArrayList<>();
DatabaseMetaData meta = cnn.getMetaData();
ResultSet rsColumns = meta.getColumns(database, null, table, null);
while (rsColumns.next()) {
	Map<String, String> col = new HashMap<>();
	col.put("name", rsColumns.getString("COLUMN_NAME"));
	String typeName = rsColumns.getString("TYPE_NAME");
	col.put("type", typeName);
	results.add(col);
}
rsColumns.close();
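
The issue description notes that the ResultSet metadata for the query "select * from test" still contains the fields id and age. As a rough sketch only, the same name/type maps could be built from that metadata instead of from DatabaseMetaData.getColumns. The class ColumnsFromQuery and the method describe are hypothetical names used for illustration; they are not part of Spark or Hive, and this has not been verified against a Spark 1.3.0 Thrift server.

```java
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColumnsFromQuery {
    // Hypothetical helper (an assumption, not a Spark or Hive API): copies
    // each column's name and type name out of standard JDBC ResultSetMetaData.
    static List<Map<String, String>> describe(ResultSetMetaData md) throws SQLException {
        List<Map<String, String>> results = new ArrayList<>();
        // JDBC column indexes are 1-based.
        for (int i = 1; i <= md.getColumnCount(); i++) {
            Map<String, String> col = new LinkedHashMap<>();
            col.put("name", md.getColumnName(i));
            col.put("type", md.getColumnTypeName(i));
            results.add(col);
        }
        return results;
    }
}
```

With an open connection, the metadata would come from running "select * from test" through a Statement and passing the resulting ResultSet's getMetaData() to describe. The type names reported this way depend on the driver and may differ from the TYPE_NAME values returned by getColumns.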


> Get invalid hive table columns after save DataFrame to hive table
> -----------------------------------------------------------------
>
>                 Key: SPARK-6923
>                 URL: https://issues.apache.org/jira/browse/SPARK-6923
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: pin_zhang
>
> HiveContext hctx = new HiveContext(sc);
> List<String> sample = new ArrayList<String>();
> sample.add( "{\"id\": \"id_1\", \"age\":1}" );
> RDD<String> sampleRDD = new JavaSparkContext(sc).parallelize(sample).rdd();	
> DataFrame df = hctx.jsonRDD(sampleRDD);
> String table="test";
> df.saveAsTable(table, "json",SaveMode.Overwrite);
> Table t = hctx.catalog().client().getTable(table);
> System.out.println( t.getCols());
> --------------------------------------------------------------
> Saving a DataFrame to a Hive table with the code above and then getting the table columns returns a single column named 'col':
> [FieldSchema(name:col, type:array<string>, comment:from deserializer)]
> The expected result is the field schema id, age.
> As a result, the JDBC API cannot retrieve the table columns via ResultSet DatabaseMetaData.getColumns(String catalog, String schemaPattern, String tableNamePattern, String columnNamePattern).
> However, the ResultSet metadata for the query "select * from test" contains the fields id and age.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
