You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Stamatis Zampetakis (Jira)" <ji...@apache.org> on 2022/10/21 07:21:01 UTC

[jira] [Updated] (HIVE-3906) URI_Escape and URI_UnEscape UDF

     [ https://issues.apache.org/jira/browse/HIVE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stamatis Zampetakis updated HIVE-3906:
--------------------------------------
    Fix Version/s:     (was: 0.8.1)

I cleared the fixVersion field since this ticket is still open. Please review this ticket and if the fix is already committed to a specific version please set the version accordingly and mark the ticket as RESOLVED.

According to the [JIRA guidelines|https://cwiki.apache.org/confluence/display/Hive/HowToContribute] the fixVersion should be set only when the issue is resolved/closed.

> URI_Escape and URI_UnEscape UDF
> -------------------------------
>
>                 Key: HIVE-3906
>                 URL: https://issues.apache.org/jira/browse/HIVE-3906
>             Project: Hive
>          Issue Type: New Feature
>          Components: UDF
>    Affects Versions: 0.8.1
>         Environment: Hadoop 0.20.1
> Java 1.6.0
>            Reporter: Liu Zongquan
>            Priority: Major
>              Labels: patch
>         Attachments: HIVE-3906.1.patch.txt, udf_uri_escape.q, udf_uri_escape.q.out, udf_uri_unescape.q, udf_uri_unescape.q.out
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Current releases of Hive lacks a function which would encode URL or form parameters or it escapes the URI.
> The function URI_ESCAPE (uri) would return the encoded form  of the URI which would be useful while using HiveQL.Its always advisable to encode URL or form parameters; plain form parameter is vulnerable to cross site attack, SQL injection and may direct our web application into some unpredicted output.
> Functionality :-
> Function Name: URI_ESCAPE (uri)
> Returns the encoded form of the uri.
> Example: hive> SELECT URI_ESCAPE('http://www.example.com?a=l&t');
> -> 'http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t'
> Usage :-
> Case 1 : To get encoded uri corresponding to a particular uri
> hive> SELECT URI_ESCAPE('http://google.com/resource?key=value1 & value2');
> -> 'http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2'
> Case 2 : To query a table to get encoded form of the urls corresponding to users
> Table :- USER_URLS
> userid |url
> USR00001|http://www.example.com?a=l&t   
> USR00010|http://search.barnesandnoble.com/booksearch/first book.pdf                      
> USR00100|http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4
> USR01000|http://google.com/resource?key=value
> USR10000|http://google.com/resource?key=value1 & value2
> USR10001|ftp://eau.ww.eesd.gov.calgary/home/smith/budget.wk1
> USR10010|gopher://gopher.voa.gov
> USR10100|http://www.apple.com/index.html
> USR11000|file:/data/letters/to_mom.txt
> USR11001|http://www.cuug.ab.ca:8001/~branderr/csce.html 
> Query : select userid,url,uri_escape(uri) from USER_URLS;
> Result :-
> USR00001|http://www.example.com?a=l&t|http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t   
> USR00010|http://search.barnesandnoble.com/booksearch/first book.pdf|http://search.barnesandnoble.com/booksearch/first%20book.pdf                     
> USR00100|http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4|http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst%20book.pdf
> USR01000|http://google.com/resource?key=value|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue
> USR10000|http://google.com/resource?key=value1 & value2|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2
> USR10001|ftp://eau.ww.eesd.gov.calgary/home/smith/budget.wk1|ftp%3A%2F%2Feau.ww.eesd.gov.calgary%2Fhome%2Fsmith%2Fbudget.wk1
> USR10010|gopher://gopher.voa.gov|gopher%3A%2F%2Fgopher.voa.gov
> USR10100|http://www.apple.com/index.html|http%3A%2F%2Fwww.apple.com%2Findex.html
> USR11000|file:/data/letters/to_mom.txt|file%3A%2Fdata%2Fletters%2Fto_mom.txt
> USR11001|http://www.cuug.ab.ca:8001/~branderr/csce.html|http%3A%2F%2Fwww.cuug.ab.ca%3A8001%2F%7Ebranderr%2Fcsce.html
> Current releases of Hive lacks a function which would decode the encoded uri.
> The function URI_UNESCAPE (uri) would return the decoded form  of the encoded URI which would be useful while using HiveQL.This function converts the specified string by replacing any escape sequences with their unescaped representation.
> Functionality :-
> Function Name: URI_UNESCAPE (uri)
> Returns the decoded form of the encoded uri.
> Example: hive> SELECT URI_UNESCAPE('http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t');
> -> 'http://www.example.com?a=l&t'
> Usage :-
> Case 1 : To get decoded uri corresponding to a particular encoded uri
> hive> SELECT URI_UNESCAPE('http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2');
> -> 'http://google.com/resource?key=value1 & value2'
> Case 2 : To query a table to get decoded form of the encoded urls corresponding to users
> Table :- USER_URLS
> userid |encodedurl
> USR00001|http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t
> USR00010|http://search.barnesandnoble.com/booksearch/first%20book.pdf
> USR00100|http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst%20book.pdf
> USR01000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue
> USR10000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2
> USR10001|ftp%3A%2F%2Feau.ww.eesd.gov.calgary%2Fhome%2Fsmith%2Fbudget.wk1
> USR10010|gopher%3A%2F%2Fgopher.voa.gov
> USR10100|http%3A%2F%2Fwww.apple.com%2Findex.html
> USR11000|file%3A%2Fdata%2Fletters%2Fto_mom.txt
> USR11001|http%3A%2F%2Fwww.cuug.ab.ca%3A8001%2F%7Ebranderr%2Fcsce.html
> Query : select userid,encodedurl,uri_unescape(encodedurl) from USER_URLS;
> Result :-
> USR00001|http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t|http://www.example.com?a=l&t
> USR00010|http://search.barnesandnoble.com/booksearch/first%20book.pdf|http://search.barnesandnoble.com/booksearch/first book.pdf
> USR00100|http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst%20book.pdf|http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4
> USR01000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue|http://google.com/resource?key=value
> USR10000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2|http://google.com/resource?key=value1 & value2
> USR10001|ftp%3A%2F%2Feau.ww.eesd.gov.calgary%2Fhome%2Fsmith%2Fbudget.wk1|ftp://eau.ww.eesd.gov.calgary/home/smith/budget.wk1
> USR10010|gopher%3A%2F%2Fgopher.voa.gov|gopher://gopher.voa.gov
> USR10100|http%3A%2F%2Fwww.apple.com%2Findex.html|http://www.apple.com/index.html
> USR11000|file%3A%2Fdata%2Fletters%2Fto_mom.txt|file:/data/letters/to_mom.txt
> USR11001|http%3A%2F%2Fwww.cuug.ab.ca%3A8001%2F%7Ebranderr%2Fcsce.html|http://www.cuug.ab.ca:8001/~branderr/csce.html 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)