You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by BD <bi...@gmail.com> on 2017/09/24 01:03:01 UTC
Hive - Avro - Schema Manipulation
Hi ,
I have imported (using sqoop) departments table from retail_db in hdfs as
avro file. Have created an external table stored as hive and used the avro
schema generated by sqoop.
I want to modify the avro schema so that a column is non nullable and if
not specified in insert query then a default value is inserted into the
table. Have tried modifying the avro schema as following, but it does not
help.
Avro Schema
{
"type" : "record",
"name" : "departments",
"doc" : "Sqoop import of departments",
"fields" : [ {
"name" : "department_id",
"type" : "int" ,
"columnName" : "department_id",
"sqlType" : "4"
}, {
"name" : "department_name",
"type" : "string",
"default" : "default_dep_name" ,
"columnName" : "department_name",
"sqlType" : "12"
} ],
"tableName" : "departments"
}
If i do not provide the value for department name then hive gives error
stating that two columns expected. Is this a valid use case? if so any
suggestion?
hive> insert into dep_av values(8);
FAILED: SemanticException [Error 10044]: Line 1:12 Cannot insert into
target table because column number/types are different 'dep_av': Table
insclause-0 has 2 columns, but query has 1 columns.
regards
Re: Hive - Avro - Schema Manipulation
Posted by BD <bi...@gmail.com>.
Thanks.
Tried that, it inserts a new row with value as null rather than taking
default value of default_dep_name as specified in avro schema.
Error is Avro could not validate record against schema.
On Sep 24, 2017 2:09 AM, "Jörn Franke" <jo...@gmail.com> wrote:
insert into dep_av values(8,null) should do what you intent.
On 24. Sep 2017, at 03:03, BD <bi...@gmail.com> wrote:
Hi ,
I have imported (using sqoop) departments table from retail_db in hdfs as
avro file. Have created an external table stored as hive and used the avro
schema generated by sqoop.
I want to modify the avro schema so that a column is non nullable and if
not specified in insert query then a default value is inserted into the
table. Have tried modifying the avro schema as following, but it does not
help.
Avro Schema
{
"type" : "record",
"name" : "departments",
"doc" : "Sqoop import of departments",
"fields" : [ {
"name" : "department_id",
"type" : "int" ,
"columnName" : "department_id",
"sqlType" : "4"
}, {
"name" : "department_name",
"type" : "string",
"default" : "default_dep_name" ,
"columnName" : "department_name",
"sqlType" : "12"
} ],
"tableName" : "departments"
}
If i do not provide the value for department name then hive gives error
stating that two columns expected. Is this a valid use case? if so any
suggestion?
hive> insert into dep_av values(8);
FAILED: SemanticException [Error 10044]: Line 1:12 Cannot insert into
target table because column number/types are different 'dep_av': Table
insclause-0 has 2 columns, but query has 1 columns.
regards
Re: Hive - Avro - Schema Manipulation
Posted by Jörn Franke <jo...@gmail.com>.
insert into dep_av values(8,null) should do what you intent.
> On 24. Sep 2017, at 03:03, BD <bi...@gmail.com> wrote:
>
> Hi ,
>
> I have imported (using sqoop) departments table from retail_db in hdfs as avro file. Have created an external table stored as hive and used the avro schema generated by sqoop.
>
> I want to modify the avro schema so that a column is non nullable and if not specified in insert query then a default value is inserted into the table. Have tried modifying the avro schema as following, but it does not help.
>
> Avro Schema
>
> {
> "type" : "record",
> "name" : "departments",
> "doc" : "Sqoop import of departments",
> "fields" : [ {
> "name" : "department_id",
> "type" : "int" ,
> "columnName" : "department_id",
> "sqlType" : "4"
> }, {
> "name" : "department_name",
> "type" : "string",
> "default" : "default_dep_name" ,
> "columnName" : "department_name",
> "sqlType" : "12"
> } ],
> "tableName" : "departments"
> }
>
>
> If i do not provide the value for department name then hive gives error stating that two columns expected. Is this a valid use case? if so any suggestion?
>
>
> hive> insert into dep_av values(8);
> FAILED: SemanticException [Error 10044]: Line 1:12 Cannot insert into target table because column number/types are different 'dep_av': Table insclause-0 has 2 columns, but query has 1 columns.
>
>
> regards