Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2011/06/22 17:38:16 UTC

[Hadoop Wiki] Update of "Hive/LanguageManual/UDF" by DexterFryar

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/LanguageManual/UDF" page has been changed by DexterFryar:
http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF?action=diff&rev1=68&rev2=69

Comment:
Fixed regexp_extract: the 3rd parameter was misspelled, and there was no description of what 'index' meant.

  ||string ||ltrim(string A) ||Returns the string resulting from trimming spaces from the beginning (left-hand side) of A, e.g. ltrim(' foobar ') results in 'foobar ' ||
  ||string ||rtrim(string A) ||Returns the string resulting from trimming spaces from the end (right-hand side) of A, e.g. rtrim(' foobar ') results in ' foobar' ||
  ||string ||regexp_replace(string A, string B, string C) ||Returns the string resulting from replacing all substrings in A that match the Java regular expression B (see Java regular expressions syntax) with C, e.g. regexp_replace("foobar", "oo|ar", "") returns 'fb'. Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc. ||
- ||string ||regexp_extract(string subject, string pattern, int intex) ||Returns the string extracted using the pattern. e.g. regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar.' Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc. ||
+ ||string ||regexp_extract(string subject, string pattern, int index) ||Returns the string extracted using the pattern, e.g. regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar'. Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc. The 'index' parameter is the group index passed to the Java regex Matcher group() method: 0 selects the entire match, 1 the first parenthesized group, 2 the second, and so on. See docs/api/java/util/regex/Matcher.html for more information on the 'index' or Java regex group() method. ||
  ||string ||parse_url(string urlString, string partToExtract [, string keyToExtract]) ||Returns the specified part from the URL. Valid values for partToExtract include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. e.g. parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST')  returns 'facebook.com'. Also a value of a particular key in QUERY can be extracted by providing the key as the third argument, e.g. parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1')  returns 'v1'. ||
  ||string ||get_json_object(string json_string, string path) ||Extracts a JSON object from a JSON string based on the specified JSON path and returns the JSON string of the extracted object. Returns null if the input JSON string is invalid. '''NOTE: The json path can only have the characters [0-9a-z_], i.e., no upper-case or special characters. Also, the keys *cannot* start with numbers.''' This is due to restrictions on Hive column names. ||
  ||string ||space(int n) ||Returns a string of n spaces ||
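
Since the regexp_extract row defers to the Java regex Matcher group() method for the meaning of its third parameter, the following sketch shows in plain Java how that group index behaves on the example from the table. The class RegexpExtractSketch and its helpers extract and replace are hypothetical names used only for illustration; this is not Hive's implementation, and returning an empty string when nothing matches is an assumption made here for simplicity.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from the Hive source.
class RegexpExtractSketch {

    // Mimics the shape of regexp_extract(subject, pattern, index).
    static String extract(String subject, String pattern, int index) {
        Matcher m = Pattern.compile(pattern).matcher(subject);
        // find() looks for the first match anywhere in the subject.
        if (!m.find()) {
            return ""; // assumption: empty string when there is no match
        }
        // group(0) is the whole match; group(n) is the n-th parenthesized group.
        return m.group(index);
    }

    // Mimics the shape of regexp_replace(A, B, C) using String.replaceAll.
    static String replace(String a, String b, String c) {
        return a.replaceAll(b, c);
    }

    public static void main(String[] args) {
        String subject = "foothebar";
        String pattern = "foo(.*?)(bar)";
        System.out.println(extract(subject, pattern, 0)); // foothebar
        System.out.println(extract(subject, pattern, 1)); // the
        System.out.println(extract(subject, pattern, 2)); // bar
        System.out.println(replace("foobar", "oo|ar", "")); // fb
    }
}
```

In Java source the whitespace class is written "\\s" because the compiler consumes one level of backslash escaping, which is the same kind of double escaping the table rows warn about for Hive string literals.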