Posted to user@spark.apache.org by 喜之郎 <25...@qq.com> on 2016/05/15 16:18:38 UTC

spark udf can not change a json string to a map

Hi, all. I want to implement a UDF which changes a JSON string into a map<string,string>.
But a problem occurs. My Spark version is 1.5.1.




my udf code:
####################
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hive.ql.exec.UDF;

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonToMap extends UDF {

	public Map<String, String> evaluate(final String s) {
		if (s == null)
			return null;
		return getString(s);
	}

	public static Map<String, String> getString(String s) {
		try {
			// The CSV cells hold URL-encoded JSON, so decode first.
			String str = URLDecoder.decode(s, "UTF-8");
			ObjectMapper mapper = new ObjectMapper();
			// TypeReference keeps the key/value types, avoiding the
			// unchecked cast that reading into Map.class would need.
			return mapper.readValue(str,
					new TypeReference<Map<String, String>>() {});
		} catch (Exception e) {
			// Fall back to an empty map on malformed input.
			return new HashMap<String, String>();
		}
	}
}

#############
exception infos:


16/05/14 21:05:22 ERROR CliDriver: org.apache.spark.sql.AnalysisException: Map type in java is unsupported because JVM type erasure makes spark fail to catch key and value types in Map<>; line 1 pos 352
	at org.apache.spark.sql.hive.HiveInspectors$class.javaClassToDataType(HiveInspectors.scala:230)
	at org.apache.spark.sql.hive.HiveSimpleUDF.javaClassToDataType(hiveUDFs.scala:107)
	at org.apache.spark.sql.hive.HiveSimpleUDF.<init>(hiveUDFs.scala:136)

################




I have seen that there is a test suite in Spark which says Spark does not support this kind of UDF.
But is there a way to implement it?
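
The analysis error happens because `HiveSimpleUDF` only looks at the erased runtime `Class` of `evaluate`'s return type, and after erasure a `Map<String,String>` is just `Map`, with no key/value types left to turn into a Catalyst map type. A minimal JDK-only sketch of what erasure leaves behind (the class name here is illustrative, not from Spark):

```java
import java.lang.reflect.Method;
import java.util.Map;

public class ErasureDemo {
    // Mirrors the UDF's evaluate signature.
    public Map<String, String> evaluate(String s) {
        return null;
    }

    public static void main(String[] args) throws Exception {
        Method m = ErasureDemo.class.getMethod("evaluate", String.class);
        // The erased class is all that a runtime Class lookup sees:
        System.out.println(m.getReturnType().getName());   // prints "java.util.Map"
        // The generic signature does survive in class-file metadata:
        System.out.println(m.getGenericReturnType());      // prints "java.util.Map<java.lang.String, java.lang.String>"
    }
}
```

So the key/value types are recoverable in principle, but Spark 1.5's simple-UDF path does not use them, hence the exception.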

Re: spark udf can not change a json string to a map

Posted by 喜之郎 <25...@qq.com>.
this is my use case:
   Another system uploads CSV files to my system. The CSV files contain complicated data types such as map. In order to express complicated data types, and ordinary strings that contain special characters, we put URL-encoded strings in the CSV files. So we use URL-encoded JSON strings to express map, string and array values.
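
As a concrete illustration of that encoding, here is a round trip with the JDK's `URLEncoder`/`URLDecoder` (the sample map value is made up):

```java
import java.net.URLDecoder;
import java.net.URLEncoder;

public class CsvCellDemo {
    public static void main(String[] args) throws Exception {
        // A map serialized as JSON; note the comma and quotes that
        // would break a raw CSV cell.
        String json = "{\"k1\":\"a,b\",\"k2\":\"c\"}";

        // URL-encode before writing the value into the CSV column...
        String cell = URLEncoder.encode(json, "UTF-8");
        System.out.println(cell);

        // ...and decode on the way back out, as the UDF does.
        String back = URLDecoder.decode(cell, "UTF-8");
        System.out.println(back.equals(json)); // prints "true"
    }
}
```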


Second stage:
  load the CSV files into a Spark text table.
###############
CREATE TABLE `a_text`(
  parameters  string
)
load data inpath 'XXX' into table a_text;
#############
Third stage:
 insert into the Spark parquet table by selecting from the text table. In order to take advantage of the complicated data types, we use a UDF to transform the JSON string into a map, and put the map into the table.


CREATE TABLE `a_parquet`(
  parameters   map<string,string>
)



insert into a_parquet select UDF(parameters) from a_text;


So do you have any suggestions?



------------------ Original Message ------------------
From: "Ted Yu" <yu...@gmail.com>
Sent: Monday, May 16, 2016, 00:44
To: "喜之郎" <25...@qq.com>
Cc: "user" <us...@spark.apache.org>
Subject: Re: spark udf can not change a json string to a map



Can you let us know more about your use case?

I wonder if you can structure your udf by not returning Map.


Cheers



Re: spark udf can not change a json string to a map

Posted by Ted Yu <yu...@gmail.com>.
Can you let us know more about your use case?

I wonder if you can structure your udf by not returning Map.

Cheers
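
One direction that keeps a map-returning function while sidestepping the reflection issue is to register it through the DataFrame UDF API instead of as a Hive simple UDF, since there the return `DataType` is declared explicitly rather than recovered from the Java signature. A non-runnable sketch against the Spark 1.5 Java API; the function name `json_to_map` is an assumption, and `JsonToMap.getString` stands for the decode-and-parse logic from the original post:

```java
// Sketch only: needs a live SQLContext and Spark 1.5 on the classpath.
import java.util.Map;

import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;

public class RegisterJsonToMap {
    public static void register(SQLContext sqlContext) {
        sqlContext.udf().register(
            "json_to_map",
            new UDF1<String, Map<String, String>>() {
                @Override
                public Map<String, String> call(String s) {
                    return s == null ? null : JsonToMap.getString(s);
                }
            },
            // The return type is stated here, so Spark never has to
            // reconstruct it from the erased Java method signature.
            DataTypes.createMapType(DataTypes.StringType, DataTypes.StringType));
    }
}
```

After registering, the original SQL should work unchanged: `insert into a_parquet select json_to_map(parameters) from a_text;`.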
