Posted to dev@hive.apache.org by "Chris Veregge (Jira)" <ji...@apache.org> on 2020/05/19 17:59:00 UTC

[jira] [Created] (HIVE-23511) percentile_approx throws error when using CTAS statement

Chris Veregge created HIVE-23511:
------------------------------------

             Summary: percentile_approx throws error when using CTAS statement
                 Key: HIVE-23511
                 URL: https://issues.apache.org/jira/browse/HIVE-23511
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 2.1.0
         Environment: [vereggcadmin@ip-10-40-51-103 ~]$ hive --version
Hive 2.1.0-amzn-0
Subversion git://ip-10-169-254-27/workspace/workspace/bigtop.release-rpm-5.2.0/build/hive/rpm/BUILD/apache-hive-2.1.0-amzn-0-src -r 418fa8c602f2a4b153c1a89806305f6b5a27a524
Compiled by ec2-user on Wed Nov 16 03:10:37 UTC 2016
From source with checksum 64a5b18bfaf894a6b2f1cd14a0654e92

            Reporter: Chris Veregge


CTAS statements appear to fail with percentile_approx when using a float array as the second argument.

Here's example code that demonstrates the issue.

This statement works:
select
percentile_approx(num,array(0.1,0.5,0.9)) as ptile
from sample;

But wrapping the same query in a CTAS statement results in an error:
create table ptile_table as
select
percentile_approx(num,array(0.1,0.5,0.9)) as ptile
from sample;

FAILED: UDFArgumentTypeException The second argument must be a constant, but array<double> was passed instead.


Here's the verbose log output, including a statement that creates the table "sample", which is just a single column of float values:

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j2.properties Async: false
set hive.cli.print.header=true
set hive.resultset.use.unique.column.names=false
set hive.exec.parallel=false
set hive.groupby.orderby.position.alias = true
set mapreduce.job.reduce.slowstart.completedmaps = 0.95
set hive.execution.engine=tez
set hive.tez.auto.reducer.parallelism=true
set hive.default.fileformat=orc
set hive.default.fileformat.managed=orc


create table if not exists sample as
select rand() as num
from ucp.dim_date limit 100
OK
Time taken: 0.99 seconds


select
percentile_approx(num,array(0.1,0.5,0.9)) as ptile
from sample
Query ID = vereggcadmin_20200519172814_e2cabf47-d8e4-45a9-b5c5-87e323ee8668
Total jobs = 1
Launching Job 1 out of 1
Waiting for Tez session and AM to be ready...


Status: Running (Executing on YARN cluster with App id application_1577992969986_117744)

Map 1: 0/1	Reducer 2: 0/1	
Map 1: 0/1	Reducer 2: 0/1	
Map 1: 0(+1)/1	Reducer 2: 0/1	
Map 1: 1/1	Reducer 2: 0/1	
Map 1: 1/1	Reducer 2: 0(+1)/1	
Map 1: 1/1	Reducer 2: 1/1	
OK
ptile
[0.0539687133111435,0.5168283485290134,0.8464088546353761]
Time taken: 14.694 seconds, Fetched: 1 row(s)


create table ptile_table as
select
percentile_approx(num,array(0.1,0.5,0.9)) as ptile
from sample
FAILED: UDFArgumentTypeException The second argument must be a constant, but array<double> was passed instead.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)