Posted to dev@parquet.apache.org by "Jerry Ylilammi (JIRA)" <ji...@apache.org> on 2015/12/10 15:23:11 UTC

[jira] [Resolved] (PARQUET-402) Apache Pig cannot store Map data type into Parquet format

     [ https://issues.apache.org/jira/browse/PARQUET-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry Ylilammi resolved PARQUET-402.
------------------------------------
    Resolution: Not A Problem

My bad, I didn't realize you had to specify the map type like so:
table_with_map_data = FOREACH random_data GENERATE TOMAP('123', 'hello', '456', 'world') as (my_map:map[chararray]);

I ran into the issue while reading an Avro file in which the map was properly defined to have string values. Apparently Pig didn't convert that schema properly. Manually enforcing the map type map[chararray] makes it work correctly.
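
For reference, a minimal sketch of the corrected script from the report, assuming the same random_data relation and the fully qualified ParquetStorer class name used with Parquet 1.8.1. The output path is left elided as in the original report, and DESCRIBE should now show a typed map instead of map[]:
{code}-- Declare the map's value type explicitly so the schema has exactly one field.
table_with_map_data = FOREACH random_data GENERATE TOMAP('123', 'hello', '456', 'world') AS (my_map:map[chararray]);
DESCRIBE table_with_map_data;
STORE table_with_map_data INTO '...' USING org.apache.parquet.pig.ParquetStorer();{code}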

> Apache Pig cannot store Map data type into Parquet format
> ---------------------------------------------------------
>
>                 Key: PARQUET-402
>                 URL: https://issues.apache.org/jira/browse/PARQUET-402
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-pig
>    Affects Versions: 1.6.0, 1.8.1
>            Reporter: Jerry Ylilammi
>
> Trying to store a simple map with two entries gives me the following exception:
> {code}table_with_map_data: {my_map: map[]}
> 2015-12-10 11:58:54,478 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
> 2015-12-10 11:58:54,498 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. Invalid map Schema, schema should contain exactly one field: my_map: map{code}
> For example, taking any input and doing this gives me the exception:
> {code}table_with_map_data = FOREACH random_data GENERATE TOMAP('123', 'hello', '456', 'world') as (my_map);
> DESCRIBE table_with_map_data;
> STORE table_with_map_data INTO '...' USING ParquetStorer();{code}
> I'm using the latest version of Pig: Apache Pig version 0.15.0 (r1682971) compiled Jun 01 2015, 11:44:35
> and Parquet: parquet-pig-bundle-1.6.0.jar
> EDIT: I noticed Parquet 1.8.1 is out. I switched to it and was forced to update the Pig script to use the fully qualified class name for ParquetStorer. However, this gives me the same error as 1.6.0.
> {code}STORE table_with_map_data INTO '/Users/jerry/tmp/parquet/output/parquet' USING org.apache.parquet.pig.ParquetStorer();{code}


