Posted to dev@hive.apache.org by Chris Teoh <ch...@gmail.com> on 2016/12/19 06:02:27 UTC

TBLPROPERTIES appears to be ignored by custom inputformats

Hi there,

Can anyone confirm whether TBLPROPERTIES in DDLs are ignored by custom
inputformats in the context of a UDAF?

I've written a custom input format and it works with a SELECT * but when I
do anything more like SELECT count(*) it returns 0.

INSERT INTO <table> SELECT * FROM <my custom inputformat table> doesn't
appear to insert anything into the table.

I found I had to use the "set" commands to make things work rather than use
DDL. This doesn't appear to make sense if the setting is specific to the
table and would cause problems if I was working with more than one table
and needed different values for the same setting.
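
The collision described above can be shown with a small sketch. A plain Map stands in for the job configuration so the code runs without Hadoop or Hive on the classpath. The key name is taken from the DDL posted later in this thread. The class PerTableSetting and the location-qualified key scheme are hypothetical and are not something Hive is known to provide; they only show one way a session-level "set" could carry different values for different tables.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: a Map stands in for the job configuration.
class PerTableSetting {
    static final String KEY = "myinputformat.setting1"; // key from the DDL in this thread

    // Prefer a key qualified by the table location (an assumed naming scheme),
    // then fall back to the plain session-wide key.
    static String resolve(Map<String, String> conf, String tableLocation) {
        String scoped = conf.get(KEY + "." + tableLocation);
        return scoped != null ? scoped : conf.get(KEY);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // A single session-wide value is shared by every table that reads this key.
        conf.put(KEY, "test");
        // A location-qualified override gives a second table its own value.
        conf.put(KEY + "./tmp/othersampledata", "other");

        System.out.println(resolve(conf, "/tmp/mysampledata"));
        System.out.println(resolve(conf, "/tmp/othersampledata"));
    }
}
```

With only the plain key set, both lookups would return the same value, which is the problem when two tables need different settings.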

Kind regards
Chris

Re: TBLPROPERTIES appears to be ignored by custom inputformats

Posted by Gopal Vijayaraghavan <go...@apache.org>.
> I don't see anything in the HiveServer2 log when I do SELECT
> count(*).

The count(*) should not be running on the HiveServer2, unlike the SELECT *.

SELECT * takes a shortcut: it skips the entire "cluster" part of Hive and runs the InputFormat locally in HiveServer2.

You might be mistaking that for the actual execution of a query (like a count, which is a group-by, or, further along, joins etc.). You can try

set hive.fetch.task.conversion=none;

to force "select *" to skip the fast path and follow the same code path as an actual query.

Another possibility is that select count(*) gets rewritten to select count(1) in the optimizer, which might be interfering with your SerDe.

The logs for the InputFormat should appear on the distributed execution side, which is not the same process as HiveServer2.

How you get those logs is somewhat tied to which Execution Engine is enabled.

Cheers,
Gopal

Re: TBLPROPERTIES appears to be ignored by custom inputformats

Posted by Chris Teoh <ch...@gmail.com>.
Thanks Jorn.

I do see the inputformat logging the settings in the HiveServer2 logs when I
do SELECT *. I don't see anything in the HiveServer2 log when I do SELECT
count(*). I assume the work must happen somewhere in the lower levels, or the
count function does something different. I also tried writing my own UDAF
count function, and it did not show the inputformat logs either. I'm not
using any custom serde; all I want is for the multi-line data to be returned
as one row, but in a configurable manner.
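
A minimal sketch of the "multi-line data returned as one row, in a configurable manner" idea, kept independent of Hive so it runs on its own. The class MultiLineRecords, the method group, and the date-prefix pattern are hypothetical names chosen for illustration. In a real input format this logic would presumably live in the record reader, with the pattern read from a setting. It is not the actual implementation discussed in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: groups physical lines into logical records.
class MultiLineRecords {
    // A line matching recordStart begins a new record; any other line is
    // appended to the record currently being built.
    static List<String> group(List<String> lines, Pattern recordStart) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (current == null || recordStart.matcher(line).find()) {
                if (current != null) {
                    records.add(current.toString());
                }
                current = new StringBuilder(line);
            } else {
                current.append('\n').append(line);
            }
        }
        if (current != null) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        // Assumed log layout: each record starts with a yyyy-MM-dd date.
        Pattern start = Pattern.compile("^\\d{4}-\\d{2}-\\d{2} ");
        List<String> lines = Arrays.asList(
                "2016-12-19 ERROR boom",
                "  at Foo.bar(Foo.java:1)",
                "2016-12-19 INFO ok");
        for (String record : group(lines, start)) {
            System.out.println("[" + record + "]");
        }
    }
}
```

The start-of-record pattern is the configurable part here; joining continuation lines with a newline is just one choice and could equally be a space or another delimiter.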

Kind Regards
Chris

On Mon, 19 Dec 2016 at 19:06 Jörn Franke <jo...@gmail.com> wrote:

> I assume it is related to lazy evaluation by the serde but this would
> require investigation of the source code and log files.
> If there is no exception in the log files, then you forgot to log them
> (bad!). Use SonarQube or similar to check your source code for these
> kinds of mistakes; it saves debugging and fixing time.
> At the same time, your inputformat should either have default values for
> configuration or, where that is not possible, throw an exception if the
> expected configuration isn't there.
>
> > On 19 Dec 2016, at 07:53, Chris Teoh <ch...@gmail.com> wrote:
> >
> > Thanks Jorn.
> >
> > I don't understand how my select * is correctly reading my table property
> > then if I'm just using default serde.
> >> On Mon., 19 Dec. 2016 at 5:36 pm, Jörn Franke <jo...@gmail.com> wrote:
> >>
> >> You have to write a custom hiveserde format to pass tblproperties as
> >> inputformat properties, but check the source code of the serde you used.
> >>
> >>> On 19 Dec 2016, at 07:22, Chris Teoh <ch...@gmail.com> wrote:
> >>>
> >>> rows.
> >>
>

Re: TBLPROPERTIES appears to be ignored by custom inputformats

Posted by Jörn Franke <jo...@gmail.com>.
I assume it is related to lazy evaluation by the serde, but this would require investigation of the source code and log files.
If there is no exception in the log files, then you forgot to log them (bad!). Use SonarQube or similar to check your source code for these kinds of mistakes; it saves debugging and fixing time.
At the same time, your inputformat should either have default values for configuration or, where that is not possible, throw an exception if the expected configuration isn't there.
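
The "default value or exception" advice can be sketched as follows. A plain Map stands in for Hadoop's JobConf so the example runs without Hadoop on the classpath. The class InputFormatSettings and its method names are hypothetical, and only the key name comes from the DDL in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: a Map stands in for the job configuration.
class InputFormatSettings {
    static final String SETTING1 = "myinputformat.setting1"; // key from the DDL in this thread

    // Optional setting: return the configured value, or the default if it is absent or empty.
    static String getOrDefault(Map<String, String> conf, String key, String defaultValue) {
        String value = conf.get(key);
        return (value == null || value.isEmpty()) ? defaultValue : value;
    }

    // Mandatory setting: fail loudly instead of silently reading nothing.
    static String getRequired(Map<String, String> conf, String key) {
        String value = conf.get(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Missing required setting: " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(getOrDefault(conf, SETTING1, "fallback"));
        conf.put(SETTING1, "test");
        System.out.println(getRequired(conf, SETTING1));
    }
}
```

Failing fast on a missing mandatory setting would turn a silent "0 rows" result into a visible error in whichever process actually runs the input format.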

> On 19 Dec 2016, at 07:53, Chris Teoh <ch...@gmail.com> wrote:
> 
> Thanks Jorn.
> 
> I don't understand how my select * is correctly reading my table property
> then if I'm just using default serde.
>> On Mon., 19 Dec. 2016 at 5:36 pm, Jörn Franke <jo...@gmail.com> wrote:
>> 
>> You have to write a custom hiveserde format to pass tblproperties as
>> inputformat properties, but check the source code of the serde you used.
>> 
>>> On 19 Dec 2016, at 07:22, Chris Teoh <ch...@gmail.com> wrote:
>>> 
>>> rows.
>> 

Re: TBLPROPERTIES appears to be ignored by custom inputformats

Posted by Chris Teoh <ch...@gmail.com>.
Thanks Jorn.

I don't understand how my select * is correctly reading my table property,
then, if I'm just using the default serde.

On Mon., 19 Dec. 2016 at 5:36 pm, Jörn Franke <jo...@gmail.com> wrote:

> You have to write a custom hiveserde format to pass tblproperties as
> inputformat properties, but check the source code of the serde you used.
>
> > On 19 Dec 2016, at 07:22, Chris Teoh <ch...@gmail.com> wrote:
> >
> > rows.
>

Re: TBLPROPERTIES appears to be ignored by custom inputformats

Posted by Jörn Franke <jo...@gmail.com>.
You have to write a custom Hive SerDe to pass TBLPROPERTIES through as inputformat properties, but check the source code of the SerDe you used.

> On 19 Dec 2016, at 07:22, Chris Teoh <ch...@gmail.com> wrote:
> 
> rows.

Re: TBLPROPERTIES appears to be ignored by custom inputformats

Posted by Chris Teoh <ch...@gmail.com>.
Hi Jorn,

CREATE EXTERNAL TABLE mytable (mydata STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT 'com.hello.world.myinputformat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/tmp/mysampledata'
TBLPROPERTIES (
  'myinputformat.setting1'='test'
);

I'm not using a HiveSerDe at the moment, I'm just looking to parse a
multiline log file into logical records before applying SerDe.

No exceptions in the log.

SELECT * returns correctly and prints my logging statements to the
HiveServer2 log. SELECT count(*) does not appear to trigger any of my
logging statements in the HiveServer2 log, so I don't see anything. As it
stands the table isn't useful, because INSERT INTO SELECT from mytable
doesn't insert any rows.

My theory is that the JobConf object is not populated from the
TBLPROPERTIES when a SELECT count(*) is called. If I use a "set" command,
SELECT count(*) finds that correctly and my inputformat works as expected.
If this is the case, the limitation appears when I'm working with more than
one table using the same input format but a different value for the
"myinputformat.setting1" setting.

Kind Regards
Chris

On Mon, 19 Dec 2016 at 17:08 Jörn Franke <jo...@gmail.com> wrote:

> What is the create table statement? Do you parse the tblproperties in the
> HiveSerde? Do you have exceptions in the log?
>
> > On 19 Dec 2016, at 07:02, Chris Teoh <ch...@gmail.com> wrote:
> >
> > Hi there,
> >
> > Can anyone confirm whether TBLPROPERTIES in DDLs are ignored by custom
> > inputformats in the context of a UDAF?
> >
> > I've written a custom input format and it works with a SELECT * but when I
> > do anything more like SELECT count(*) it returns 0.
> >
> > INSERT INTO <table> SELECT * FROM <my custom inputformat table> doesn't
> > appear to insert anything into the table.
> >
> > I found I had to use the "set" commands to make things work rather than use
> > DDL. This doesn't appear to make sense if the setting is specific to the
> > table and would cause problems if I was working with more than one table
> > and needed different values for the same setting.
> >
> > Kind regards
> > Chris
>

Re: TBLPROPERTIES appears to be ignored by custom inputformats

Posted by Jörn Franke <jo...@gmail.com>.
What is the create table statement? Do you parse the tblproperties in the HiveSerde? Do you have exceptions in the log?

> On 19 Dec 2016, at 07:02, Chris Teoh <ch...@gmail.com> wrote:
> 
> Hi there,
> 
> Can anyone confirm whether TBLPROPERTIES in DDLs are ignored by custom
> inputformats in the context of a UDAF?
> 
> I've written a custom input format and it works with a SELECT * but when I
> do anything more like SELECT count(*) it returns 0.
> 
> INSERT INTO <table> SELECT * FROM <my custom inputformat table> doesn't
> appear to insert anything into the table.
> 
> I found I had to use the "set" commands to make things work rather than use
> DDL. This doesn't appear to make sense if the setting is specific to the
> table and would cause problems if I was working with more than one table
> and needed different values for the same setting.
> 
> Kind regards
> Chris