Posted to dev@drill.apache.org by "Bob Rudis (JIRA)" <ji...@apache.org> on 2018/07/23 12:44:00 UTC
[jira] [Created] (DRILL-6628) Possible incorporation of Twitter text processing UDFs into Drill-proper
Bob Rudis created DRILL-6628:
--------------------------------
Summary: Possible incorporation of Twitter text processing UDFs into Drill-proper
Key: DRILL-6628
URL: https://issues.apache.org/jira/browse/DRILL-6628
Project: Apache Drill
Issue Type: Improvement
Components: Functions - Drill
Reporter: Bob Rudis
Per the User mailing list thread — [https://mail-archives.apache.org/mod_mbox/drill-user/201807.mbox/%3Caef1979d-f454-4691-8607-8267adf2ac1e%40getmailbird.com%3E] — submitting the possibility for the inclusion of drill-twitter-text — [https://github.com/hrbrmstr/drill-twitter-text] — into Drill-proper.
Shifting the conversation here since it's more appropriate and CC'ing [~cgivre] who posited the idea.
On the one hand, there are function groups such as "Phonetic" and "String Distance", so there's precedent for inclusion of "non-boring-SQL"-like functions into Drill-proper. On the other hand, this is a small addition of a handful of functions for Twitter text, so would this be too niche for a "Twitter" function group?
As noted in the mailing list thread, there are more "cyber"-ish UDFs on the way (still kinda hoping for that guava upgrade that I saw mentioned in various places in jira), so would the Twitter components be in a "Cyber" group?
Regardless, I'll take a look at how the functions are structured in the Drill source tree and gladly machinate the necessary changes/inclusions if this discussion results in that decision.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
Re: [jira] [Created] (DRILL-6628) Possible incorporation of Twitter text processing UDFs into Drill-proper
Posted by Charles Givre <cg...@gmail.com>.
Hi Bob,
I was inspired a little by OSQuery and MySQL, but I’ve written a lot of UDFs that extend basic SQL functionality and add other capabilities to Drill. IMHO, since Drill isn’t a database, it really is a very helpful addition and will get more people using Drill. I’d personally be very interested in your Cyber-ish UDFs.
FYI, there is a collection of network analysis functions already in Drill:
Networking Functions
Drill supports the following networking functions to facilitate network analysis using Drill:
inet_aton(<ip>): Converts an IPv4 address into an integer
inet_ntoa(<int>): Converts an integer IP into dotted decimal notation
in_network(<ip>, <cidr>): Returns true if the IP address is in the given CIDR block
address_count(<cidr>): Returns the number of IPs in a given CIDR block
broadcast_address(<cidr>): Returns the broadcast address for a given CIDR block
netmask(<cidr>): Returns the netmask for a given CIDR block
low_address(<cidr>): Returns the first address in a given CIDR block
high_address(<cidr>): Returns the last address in a given CIDR block
url_encode(<url>): Returns a URL-encoded string
url_decode(<url>): Decodes a URL-encoded string
is_valid_IP(<ip>): Returns true if the IP is a valid IP address
is_private_ip(<ip>): Returns true if the IP is a private IPv4 address
is_valid_IPv4(<ip>): Returns true if the IP is a valid IPv4 address
is_valid_IPv6(<ip>): Returns true if the IP is a valid IPv6 address
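
To make the semantics of a few of these functions concrete, here is a minimal sketch in Python using only the standard-library ipaddress module. It is not Drill's implementation and does not call Drill; the Python function names simply mirror the Drill names listed above, and edge-case behavior (for example, how Drill handles invalid input) is an assumption, not something stated in the list.

```python
import ipaddress


def inet_aton(ip: str) -> int:
    """Convert a dotted-decimal IPv4 address into an integer."""
    return int(ipaddress.IPv4Address(ip))


def inet_ntoa(value: int) -> str:
    """Convert an integer back into dotted-decimal IPv4 notation."""
    return str(ipaddress.IPv4Address(value))


def in_network(ip: str, cidr: str) -> bool:
    """Return True if the address falls inside the CIDR block."""
    # strict=False accepts blocks written with host bits set (assumption).
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr, strict=False)


def broadcast_address(cidr: str) -> str:
    """Return the broadcast address of the CIDR block."""
    return str(ipaddress.ip_network(cidr, strict=False).broadcast_address)


def netmask(cidr: str) -> str:
    """Return the netmask of the CIDR block."""
    return str(ipaddress.ip_network(cidr, strict=False).netmask)


def is_private_ip(ip: str) -> bool:
    """Return True if the address is in a private range."""
    return ipaddress.ip_address(ip).is_private


def is_valid_ipv4(ip: str) -> bool:
    """Return True if the string parses as an IPv4 address."""
    try:
        ipaddress.IPv4Address(ip)
        return True
    except ValueError:
        # Returning False on bad input is an assumption of this sketch.
        return False


if __name__ == "__main__":
    print(inet_aton("192.168.0.1"))
    print(inet_ntoa(3232235521))
    print(in_network("192.168.0.1", "192.168.0.0/28"))
    print(broadcast_address("192.168.0.0/28"))
    print(netmask("192.168.0.0/28"))
```

In Drill itself these would be called from SQL against a column of addresses; the sketch only illustrates what each conversion or test computes.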
I’ve been working on a few other security-related hacks, including Drill UDFs that do DNS lookups and retrieve Whois data. Also, I assume you saw DRILL-6104, which is a generic regex/log format plugin. I’m working on a syslog/RFC-5424 format plugin for Drill, which I intend to submit for Drill 1.15. Anyway, my point is that, IMHO, Drill is a great tool for cyber data analysis, and the more goodness we have officially part of Drill, the better.
Best,
—C