Posted to user@hive.apache.org by Luke Crouch <lc...@geek.net> on 2010/10/12 18:17:19 UTC

boolean types thru a transform script

I'm trying to pass a FALSE value thru a custom transform script to another
table, like so:

        FROM (
            FROM downloads
            SELECT project, file, os, FALSE as folder, country, dt
            WHERE dt='2010-05-14'
            DISTRIBUTE BY project
            SORT BY project asc, file asc
        ) b
        INSERT OVERWRITE TABLE dl_day PARTITION (dt='2010-05-14', project)
        SELECT TRANSFORM(file, os, country, folder, dt, project)
            USING 'transformwrap reduce.py  --verbose'
            AS (file, downloads, os, folder, country, project)

> describe dl_day
['file', 'string', '']
['downloads', 'int', '']
['os', 'string', '']
['country', 'string', '']
['folder', 'boolean', '']
['dt', 'string', '']
['project', 'string', '']

When I log the 'folder' value from inside reduce.py, it shows:

2010-10-12 15:32:10,914 - dstat - INFO - reduce to stdout, h[folder]:

i.e., an empty string. But when the INSERT executes, it seems to treat the
value as TRUE (or string 'true')?

> select folder from dl_day
['true']
['true']
['true']
['true']
...

How can I preserve the FALSE value thru the transform script?

Thanks,
-L

Re: boolean types thru a transform script

Posted by Dave Brondsema <db...@geek.net>.
Transform scripts only output text, so Hive has to convert from string to
the column's data type (boolean in this case).  So if you send an empty
string "", that will be converted to boolean FALSE.

FYI, on the way in to a transform script, booleans come through as strings
"true" and "false".
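
As a rough illustration of the above, here is a minimal sketch of how a Python transform script might handle the boolean in both directions. It assumes Hive's default tab-separated rows on stdin and stdout. The helper names (parse_hive_bool, format_hive_bool, transform_line, main) and the downloads placeholder are hypothetical and are not taken from the actual reduce.py, whose aggregation logic is not shown in this thread.

```python
import sys


def parse_hive_bool(text):
    """Booleans arrive from Hive as the strings "true" and "false"."""
    return text.strip().lower() == "true"


def format_hive_bool(value):
    """Write "" for False, per the explanation above, and "true" for True."""
    return "true" if value else ""


def transform_line(line):
    # Input column order follows the TRANSFORM(...) list in the query.
    file_, os_, country, folder, dt, project = line.rstrip("\n").split("\t")
    is_folder = parse_hive_bool(folder)
    downloads = 1  # hypothetical placeholder; the real counting is not shown
    # Output column order follows the AS (...) list in the query.
    out = [file_, str(downloads), os_, format_hive_bool(is_folder), country, project]
    return "\t".join(out)


def main(stream_in=sys.stdin, stream_out=sys.stdout):
    # Hypothetical entry point: one output row per input row.
    for line in stream_in:
        stream_out.write(transform_line(line) + "\n")
```

To use it as a script, call main() at the bottom of the file. It may also be worth checking column order: the AS (...) list puts folder before country, while describe dl_day lists country before folder, so if the INSERT maps columns by position, the country string could be what ends up in the boolean column.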


-- 
Dave Brondsema
Software Engineer
Geeknet

www.geek.net