Posted to issues@drill.apache.org by "Richard Downer (Jira)" <ji...@apache.org> on 2022/09/23 21:55:00 UTC

[jira] [Created] (DRILL-8318) httpd format parser throws exception on log item with malformed query string

Richard Downer created DRILL-8318:
-------------------------------------

             Summary: httpd format parser throws exception on log item with malformed query string
                 Key: DRILL-8318
                 URL: https://issues.apache.org/jira/browse/DRILL-8318
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.19.0
         Environment: drill-embedded

openjdk version "1.8.0_342"
OpenJDK Runtime Environment Corretto-8.342.07.1 (build 1.8.0_342-b07)
OpenJDK 64-Bit Server VM Corretto-8.342.07.1 (build 25.342-b07, mixed mode)

Ubuntu 20.04.4 LTS (Focal Fossa)

Running under WSL on Windows 11
            Reporter: Richard Downer
         Attachments: testcase

I am running Apache Drill over my httpd-style access logs. These logs record requests from the open Internet, so they sometimes contain questionable requests from remote users, some of them with hostile intent.

One such style of request looks like this:

{{151.236.216.243 - - [15/Sep/2022:20:18:07 +0000] "GET /?=PHPE9568F36-D428-11d2-A769-00AA001ACF42 HTTP/1.1" 301 178 "-" "curl/7.54.0"}}
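The unusual part of this request is its query string, {{=PHPE9568F36-D428-11d2-A769-00AA001ACF42}}, which has a value but no parameter name. The Java sketch below uses a hypothetical class, QuerySplitSketch, to split a query string naively: first on "&", then on the first "=". It is only an illustration under that assumption, not the code Drill or its log parser actually uses. It shows that this request produces a single parameter with an empty name.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Drill's or its log parser's code): splits a raw
 * query string into {name, value} pairs on '&' and then on the first '='.
 */
public class QuerySplitSketch {

    /** Returns each parameter as a two-element array {name, value}. */
    public static List<String[]> split(String query) {
        List<String[]> params = new ArrayList<>();
        for (String part : query.split("&")) {
            if (part.isEmpty()) {
                continue; // skip empty segments such as "a=1&&b=2"
            }
            int eq = part.indexOf('=');
            // With no '=', treat the whole segment as a name with an empty value.
            String name = eq < 0 ? part : part.substring(0, eq);
            String value = eq < 0 ? "" : part.substring(eq + 1);
            params.add(new String[] {name, value});
        }
        return params;
    }

    public static void main(String[] args) {
        // Query string from the request in this report (the text after '?').
        for (String[] p : split("=PHPE9568F36-D428-11d2-A769-00AA001ACF42")) {
            System.out.println("name='" + p[0] + "' value='" + p[1] + "'");
        }
    }
}
```

Running main prints name='' value='PHPE9568F36-D428-11d2-A769-00AA001ACF42'. Any code that assumes every query parameter has a non-empty name will therefore be exposed to this input.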

As a test case, I put the request above into a new log file containing only that line, and ran this query:

{{select request_receive_time, request_status_last, request_firstline_method, request_firstline_uri from table(dfs.`/home/richard/drill/access-logs/nginx/testcase`(type=>'httpd', logFormat=>'combined')) where request_status_last = 404;}}

This produces the following error:

{{Error: DATA_READ ERROR: Error reading HTTPD file at line number 0}}

{{Error occurred during setter call: null caused by "java.lang.StringIndexOutOfBoundsException: String index out of range: -1" when calling "public void org.apache.drill.exec.store.httpd.HttpdLogRecord.setWildcard(java.lang.String,java.lang.String)" for  key = "STRING:request.firstline.uri.query.*"  name = "STRING:request.firstline.uri.query"  value = "Value\{filled=STRING, s='PHPE9568F36-D428-11d2-A769-00AA001ACF42', l=null, d=null}" castsTo = "[STRING]"}}
{{Format plugin: httpd}}
{{Format plugin: HttpdLogFormatPlugin}}
{{Plugin config name: null}}
{{Fragment: 0:0}}
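The error reports that the field name passed to the setter is "STRING:request.firstline.uri.query", which is the wildcard root itself with no ".<parameter>" suffix. One plausible mechanism, and it is purely an assumption about Drill's internals rather than its actual code, is a setter that derives the map key with something like field.substring(root.length() + 1). When the parameter name is empty, that begin index is one past the end of the string. On Java 8, String.substring(int) then throws StringIndexOutOfBoundsException with the message "String index out of range: -1", which matches the message in the report. The hypothetical class WildcardKeySketch below contrasts that naive approach with a defensive one:

```java
/**
 * Hypothetical sketch (not Drill's code): deriving the map key for a
 * "request.firstline.uri.query.*" wildcard field by stripping the root and
 * the separating dot.
 */
public class WildcardKeySketch {

    /** Assumed wildcard root, taken from the field name in the error message. */
    public static final String ROOT = "STRING:request.firstline.uri.query";

    /** Naive version: assumes the field is always ROOT + "." + name. */
    public static String naiveKey(String field) {
        // If field equals ROOT (empty parameter name), the begin index is
        // field.length() + 1, so substring throws StringIndexOutOfBoundsException.
        return field.substring(ROOT.length() + 1);
    }

    /** Defensive version: tolerates a field with no ".name" suffix. */
    public static String safeKey(String field) {
        if (!field.startsWith(ROOT)) {
            throw new IllegalArgumentException("not a query wildcard field: " + field);
        }
        String rest = field.substring(ROOT.length());
        if (rest.isEmpty()) {
            return ""; // empty parameter name, as in "/?=value"
        }
        if (rest.charAt(0) != '.') {
            throw new IllegalArgumentException("unexpected field: " + field);
        }
        return rest.substring(1); // drop the separating dot
    }

    public static void main(String[] args) {
        System.out.println(safeKey(ROOT + ".page"));   // prints page
        System.out.println("[" + safeKey(ROOT) + "]"); // prints []
        try {
            naiveKey(ROOT);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("naive version failed: " + e.getMessage());
        }
    }
}
```

Whether the real fix belongs in Drill's setter or in the upstream log parser is for the maintainers to decide. The sketch only shows that treating an empty parameter name as a valid (empty) key avoids the exception without dropping the record.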

While I appreciate that the query string in this request is probably malformed under a strict interpretation, this is a real request seen "in the wild", and I would prefer Drill to be robust enough to handle the kind of garbage requests frequently seen on real web servers.

Thank you for your assistance. If I can provide any more information that would help, please let me know!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)