You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by BD <bi...@gmail.com> on 2017/09/24 01:03:01 UTC

Hive - Avro - Schema Manipulation

Hi ,

I have imported (using sqoop) departments table from retail_db in hdfs as
avro file. Have created an external table stored as hive and used the avro
schema generated by sqoop.

I want to modify the avro schema so that a column is non nullable and if
not specified in insert query then a default value is inserted into the
table. Have tried modifying the avro schema as following, but it does not
help.

Avro Schema

{
  "type" : "record",
  "name" : "departments",
  "doc" : "Sqoop import of departments",
  "fields" : [ {
    "name" : "department_id",
    "type" :  "int" ,
    "columnName" : "department_id",
    "sqlType" : "4"
  }, {
    "name" : "department_name",
    "type" : "string",
    "default" : "default_dep_name" ,
    "columnName" : "department_name",
    "sqlType" : "12"
  } ],
  "tableName" : "departments"
}


If i do not provide the value for department name then hive gives error
stating that two columns expected. Is this a valid use case? if so any
suggestion?


hive> insert into dep_av values(8);
FAILED: SemanticException [Error 10044]: Line 1:12 Cannot insert into
target table because column number/types are different 'dep_av': Table
insclause-0 has 2 columns, but query has 1 columns.


regards

Re: Hive - Avro - Schema Manipulation

Posted by BD <bi...@gmail.com>.
Thanks.

Tried that, it inserts a new row with value as null rather than taking
default value of default_dep_name as specified in avro schema.

Error is Avro could not validate record against schema.



On Sep 24, 2017 2:09 AM, "Jörn Franke" <jo...@gmail.com> wrote:

insert into dep_av values(8,null) should do what you intent.

On 24. Sep 2017, at 03:03, BD <bi...@gmail.com> wrote:

Hi ,

I have imported (using sqoop) departments table from retail_db in hdfs as
avro file. Have created an external table stored as hive and used the avro
schema generated by sqoop.

I want to modify the avro schema so that a column is non nullable and if
not specified in insert query then a default value is inserted into the
table. Have tried modifying the avro schema as following, but it does not
help.

Avro Schema

{
  "type" : "record",
  "name" : "departments",
  "doc" : "Sqoop import of departments",
  "fields" : [ {
    "name" : "department_id",
    "type" :  "int" ,
    "columnName" : "department_id",
    "sqlType" : "4"
  }, {
    "name" : "department_name",
    "type" : "string",
    "default" : "default_dep_name" ,
    "columnName" : "department_name",
    "sqlType" : "12"
  } ],
  "tableName" : "departments"
}


If i do not provide the value for department name then hive gives error
stating that two columns expected. Is this a valid use case? if so any
suggestion?


hive> insert into dep_av values(8);
FAILED: SemanticException [Error 10044]: Line 1:12 Cannot insert into
target table because column number/types are different 'dep_av': Table
insclause-0 has 2 columns, but query has 1 columns.


regards

Re: Hive - Avro - Schema Manipulation

Posted by Jörn Franke <jo...@gmail.com>.
insert into dep_av values(8,null) should do what you intent.

> On 24. Sep 2017, at 03:03, BD <bi...@gmail.com> wrote:
> 
> Hi ,
> 
> I have imported (using sqoop) departments table from retail_db in hdfs as avro file. Have created an external table stored as hive and used the avro schema generated by sqoop. 
> 
> I want to modify the avro schema so that a column is non nullable and if not specified in insert query then a default value is inserted into the table. Have tried modifying the avro schema as following, but it does not help. 
> 
> Avro Schema
> 
> {
>   "type" : "record",
>   "name" : "departments",
>   "doc" : "Sqoop import of departments",
>   "fields" : [ {
>     "name" : "department_id",
>     "type" :  "int" ,
>     "columnName" : "department_id",
>     "sqlType" : "4"
>   }, {
>     "name" : "department_name",
>     "type" : "string",
>     "default" : "default_dep_name" ,
>     "columnName" : "department_name",
>     "sqlType" : "12"
>   } ],
>   "tableName" : "departments"
> }
> 
> 
> If i do not provide the value for department name then hive gives error stating that two columns expected. Is this a valid use case? if so any suggestion?
> 
> 
> hive> insert into dep_av values(8);
> FAILED: SemanticException [Error 10044]: Line 1:12 Cannot insert into target table because column number/types are different 'dep_av': Table insclause-0 has 2 columns, but query has 1 columns.
> 
> 
> regards