Posted to dev@drill.apache.org by "Bob Rudis (JIRA)" <ji...@apache.org> on 2018/07/23 12:44:00 UTC

[jira] [Created] (DRILL-6628) Possible incorporation of Twitter text processing UDFs into Drill-proper

Bob Rudis created DRILL-6628:
--------------------------------

             Summary: Possible incorporation of Twitter text processing UDFs into Drill-proper
                 Key: DRILL-6628
                 URL: https://issues.apache.org/jira/browse/DRILL-6628
             Project: Apache Drill
          Issue Type: Improvement
          Components: Functions - Drill
            Reporter: Bob Rudis


Per the User mailing list thread — [https://mail-archives.apache.org/mod_mbox/drill-user/201807.mbox/%3Caef1979d-f454-4691-8607-8267adf2ac1e%40getmailbird.com%3E] — submitting the possibility for the inclusion of drill-twitter-text — [https://github.com/hrbrmstr/drill-twitter-text] — into Drill-proper.

Shifting the conversation here since it's more appropriate and CC'ing [~cgivre] who posited the idea.

On the one hand, there are function groups such as "Phonetic" and "String Distance", so there's precedent for inclusion of "non-boring-SQL"-like functions into Drill-proper. On the other hand, this is a small addition of a handful of functions for Twitter text, so would this be too niche for a "Twitter" function group?

As noted in the mailing list thread, there are more "cyber"-ish UDFs on the way (still kinda hoping for that guava upgrade that I saw mentioned in various places in jira), so would the Twitter components be in a "Cyber" group?

Regardless, I'll take a look at how the functions are structured in the Drill source tree and gladly machinate the necessary changes/inclusions if this discussion ends in that decision.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: [jira] [Created] (DRILL-6628) Possible incorporation of Twitter text processing UDFs into Drill-proper

Posted by Charles Givre <cg...@gmail.com>.
Hi Bob, 
I was inspired a little by OSQuery and MySQL, but I’ve written a lot of UDFs that extend basic SQL functionality and add other capabilities to Drill. IMHO, since Drill isn’t a database, it really is a very helpful addition and will get more people using Drill.  I’d personally be very interested in your Cyber-ish UDFs.  

FYI, there is a collection of network analysis functions already in Drill:
Networking Functions
Drill supports the following networking functions to facilitate network analysis using Drill: 

inet_aton(<ip>): Converts an IPv4 address into an integer
inet_ntoa( <int>): Converts an integer IP into dotted decimal notation
in_network( <ip>,<cidr> ): Returns true if the IP address is in the given CIDR block
address_count( <cidr> ): Returns the number of IPs in a given CIDR block
broadcast_address( <cidr> ): Returns the broadcast address for a given CIDR block
netmask(<cidr> ): Returns the netmask for a given CIDR block
low_address(<cidr>): Returns the first address in a given CIDR block
high_address(<cidr>): Returns the last address in a given CIDR block
url_encode( <url> ): Returns a URL encoded string
url_decode( <url> ): Decodes a URL encoded string
is_valid_IP(<ip>): Returns true if the IP is a valid IP address
is_private_ip(<ip>): Returns true if the IP is a private IPv4 address
is_valid_IPv4(<ip>): Returns true if the IP is a valid IPv4 address
is_valid_IPv6(<ip>): Returns true if the IP is a valid IPv6 address
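
To make the descriptions above concrete, here is a minimal Python sketch of what a few of these functions compute, using only the standard `ipaddress` module. This is not Drill's implementation: the Python function names simply mirror the Drill names for readability, and Drill's handling of edge cases (for example, whether a block's network and broadcast addresses count as "in" the block) may differ from what is shown here.

```python
import ipaddress


def inet_aton(ip: str) -> int:
    """Convert a dotted-decimal IPv4 address into its integer form."""
    return int(ipaddress.IPv4Address(ip))


def inet_ntoa(value: int) -> str:
    """Convert an integer back into dotted-decimal IPv4 notation."""
    return str(ipaddress.IPv4Address(value))


def in_network(ip: str, cidr: str) -> bool:
    """Return True if the address falls inside the CIDR block.

    Assumption: the network and broadcast addresses are treated as
    members of the block; Drill's behavior here may differ.
    """
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr, strict=False)


def is_private_ip(ip: str) -> bool:
    """Return True if the address lies in a private range."""
    return ipaddress.ip_address(ip).is_private


print(inet_aton("192.168.0.1"))                      # 3232235521
print(inet_ntoa(3232235521))                         # 192.168.0.1
print(in_network("192.168.0.15", "192.168.0.0/24"))  # True
print(is_private_ip("8.8.8.8"))                      # False
```

The integer form is handy for range comparisons and joins on IP columns, which is the kind of network analysis these functions are meant to support.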

I’ve been working on some other security-related hackery, including Drill UDFs that do DNS lookups and retrieve Whois data. Also, I assume you saw DRILL-6104, which is a generic regex/log format plugin. I’m working on a syslog/RFC-5424 format plugin for Drill, which I intend to submit for Drill 1.15. Anyway, my point is that, IMHO, Drill is a great tool for cyber data analysis, and the more goodness we have officially part of Drill, the better.

Best,
—C 

