Posted to issues@spark.apache.org by "Reynold Xin (JIRA)" <ji...@apache.org> on 2015/05/01 09:19:05 UTC

[jira] [Resolved] (SPARK-7285) Audit missing Hive functions

     [ https://issues.apache.org/jira/browse/SPARK-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin resolved SPARK-7285.
--------------------------------
    Resolution: Fixed

> Audit missing Hive functions
> ----------------------------
>
>                 Key: SPARK-7285
>                 URL: https://issues.apache.org/jira/browse/SPARK-7285
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>
> Create a list of functions that are on this page but not in SQL/DataFrame.
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
> Here's the list of missing functions:
> *basic*
> between
> bitwise operation
> bitwiseAND
> bitwiseOR
> bitwiseXOR
> bitwiseNOT
> *math*
> round(DOUBLE a)
> round(DOUBLE a, INT d) Returns a rounded to d decimal places.
> log2
> sqrt(string column name)
> bin
> hex(long), hex(string), hex(binary)
> unhex(string) -> binary
> conv
> pmod
> factorial
> toDeg -> toDegrees
> toRad -> toRadians
> e()
> pi()
> shiftleft(int or long)
> shiftright(int or long)
> shiftrightunsigned(int or long)
> *collection functions*
> sort_array(array)
> size(map, array)
> map_values(map<k,v>): array<v>
> map_keys(map<k,v>): array<k>
> array_contains(array<t>, value): boolean
> *date functions*
> from_unixtime(long, string): string
> unix_timestamp(): long
> unix_timestamp(date): long
> year(date): int
> month(date): int
> day(date): int
> dayofmonth(date): int
> hour(timestamp): int
> minute(timestamp): int
> second(timestamp): int
> weekofyear(date): int
> date_add(date, int)
> date_sub(date, int)
> from_utc_timestamp(timestamp, string timezone): timestamp
> current_date(): date
> current_timestamp(): timestamp
> add_months(string start_date, int num_months): string
> last_day(string date): string
> next_day(string start_date, string day_of_week): string
> trunc(string date[, string format]): string
> months_between(date1, date2): double
> date_format(date/timestamp/string ts, string fmt): string
> *conditional functions*
> if(boolean testCondition, T valueTrue, T valueFalseOrNull): T
> nvl(T value, T default_value): T
> greatest(T v1, T v2, …): T
> least(T v1, T v2, …): T
> *string functions*
> ascii(string str): int
> base64(binary): string
> concat(string|binary A, string|binary B…): string | binary
> concat_ws(string SEP, string A, string B…): string
> concat_ws(string SEP, array<string>): string
> decode(binary bin, string charset): string
> encode(string src, string charset): binary
> find_in_set(string str, string strList): int
> format_number(number x, int d): string
> length(string): int
> instr(string str, string substr): int
> locate(string substr, string str[, int pos]): int
> lower(string), lcase(string)
> lpad(string str, int len, string pad): string
> ltrim(string): string
> parse_url(string urlString, string partToExtract [, string keyToExtract]): string
> printf(String format, Obj... args): string
> regexp_extract(string subject, string pattern, int index): string
> regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT): string
> repeat(string str, int n): string
> reverse(string A): string
> rpad(string str, int len, string pad): string
> space(int n): string
> split(string str, string pat): array
> str_to_map(text[, delimiter1, delimiter2]): map<string, string>
> trim(string A): string
> unbase64(string str): binary
> upper(string A), ucase(string A): string
> levenshtein(string A, string B): int
> soundex(string A): string
> *Misc*
> hash(a1[, a2…]): int
> *text*
> context_ngrams(array<array<string>>, array<string>, int K, int pf): array<struct<string,double>>
> ngrams(array<array<string>>, int N, int K, int pf): array<struct<string,double>>
> sentences(string str, string lang, string locale): array<array<string>>
> *UDAF*
> var_samp
> stddev_pop
> stddev_samp
> covar_pop
> covar_samp
> corr
> percentile: array<double>
> percentile_approx: array<double>
> histogram_numeric: array<struct {'x','y'}>
> collect_set <- we have hashset
> collect_list
> ntile
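
Several entries above are named without a signature or a description. As a rough illustration of what three of them compute, here is a minimal Python sketch. It assumes the semantics described in the Hive manual linked above (pmod as a non-negative remainder for a positive divisor, shiftrightunsigned as a Java-style `>>>`, levenshtein as the classic edit distance). The Python function names and the `bits` parameter are hypothetical, chosen for this sketch only; this is not Spark's or Hive's implementation.

```python
def pmod(a: int, b: int) -> int:
    """Non-negative remainder of a / b, assuming b > 0."""
    # A Java-style % truncates toward zero, so the remainder
    # carries the sign of the dividend.
    r = abs(a) % abs(b)
    if a < 0:
        r = -r
    # Shift a negative remainder into the range [0, b).
    return (r + b) % b


def shift_right_unsigned(x: int, n: int, bits: int = 32) -> int:
    """Zero-filling right shift on a fixed-width two's-complement integer."""
    mask = (1 << bits) - 1
    n &= bits - 1              # assumption: shift count is masked, as in Java
    r = (x & mask) >> n        # view x as unsigned, then shift
    if r >= 1 << (bits - 1):   # reinterpret the result as a signed value
        r -= 1 << bits
    return r


def levenshtein(s: str, t: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # delete from s
                           cur[j - 1] + 1,             # insert into s
                           prev[j - 1] + (cs != ct)))  # substitute
        prev = cur
    return prev[-1]
```

With these definitions, `pmod(-7, 3)` evaluates to 2, `shift_right_unsigned(-8, 1)` to 2147483644, and `levenshtein("kitten", "sitting")` to 3. Behavior for a negative divisor in pmod is deliberately left out of scope here.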


