Posted to issues@drill.apache.org by "Charles Givre (Jira)" <ji...@apache.org> on 2022/09/25 04:00:02 UTC

[jira] [Commented] (DRILL-8318) httpd format parser throws exception on log item with malformed query string

    [ https://issues.apache.org/jira/browse/DRILL-8318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609078#comment-17609078 ] 

Charles Givre commented on DRILL-8318:
--------------------------------------

[~nielsbasjes], could you take a look?

> httpd format parser throws exception on log item with malformed query string
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-8318
>                 URL: https://issues.apache.org/jira/browse/DRILL-8318
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.19.0
>         Environment: drill-embedded
> openjdk version "1.8.0_342"
> OpenJDK Runtime Environment Corretto-8.342.07.1 (build 1.8.0_342-b07)
> OpenJDK 64-Bit Server VM Corretto-8.342.07.1 (build 25.342-b07, mixed mode)
> Ubuntu 20.04.4 LTS (Focal Fossa)
> Running under WSL on Windows 11
>            Reporter: Richard Downer
>            Priority: Major
>         Attachments: testcase
>
>
> I am running Apache Drill over my httpd-style access logs. These are collecting data from requests on the open Internet, which sometimes means questionable requests made by remote Internet users (sometimes with hostile intent).
> One such style of request looks like this:
> {{151.236.216.243 - - [15/Sep/2022:20:18:07 +0000] "GET /?=PHPE9568F36-D428-11d2-A769-00AA001ACF42 HTTP/1.1" 301 178 "-" "curl/7.54.0"}}
> I have put this request into a new log file containing only this line, as a test case. I initiate a query:
> {{select request_receive_time, request_status_last, request_firstline_method, request_firstline_uri from table(dfs.`/home/richard/drill/access-logs/nginx/testcase`(type=>'httpd', logFormat=>'combined')) where request_status_last = 404;}}
> This produces this error:
> {{Error: DATA_READ ERROR: Error reading HTTPD file at line number 0}}
> {{Error occurred during setter call: null caused by "java.lang.StringIndexOutOfBoundsException: String index out of range: -1" when calling "public void org.apache.drill.exec.store.httpd.HttpdLogRecord.setWildcard(java.lang.String,java.lang.String)" for  key = "STRING:request.firstline.uri.query.*"  name = "STRING:request.firstline.uri.query"  value = "Value{filled=STRING, s='PHPE9568F36-D428-11d2-A769-00AA001ACF42', l=null, d=null}" castsTo = "[STRING]"}}
> {{Format plugin: httpd}}
> {{Format plugin: HttpdLogFormatPlugin}}
> {{Plugin config name: null}}
> {{Fragment: 0:0}}
> While I appreciate that the query string in this request (a value with an empty parameter name) is probably malformed under a strict interpretation, it is a real request seen "in the wild", and I would prefer that Drill be robust enough to handle the kind of garbage requests frequently seen on real web servers.
> Thank you for your assistance - if I can provide any more information that would help please let me know!
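
To illustrate the kind of tolerance the reporter is asking for, here is a minimal Python sketch of a query-string splitter that accepts a parameter with an empty name, as in `?=PHPE9568F36-D428-11d2-A769-00AA001ACF42`. It is not Drill's code and not the underlying log parser's code, and it makes no claim about the actual cause of the `StringIndexOutOfBoundsException` in `setWildcard`. The function name `parse_query_tolerant` and the `_unnamed` placeholder key are hypothetical, chosen only for this example.

```python
from urllib.parse import unquote_plus


def parse_query_tolerant(query: str) -> dict:
    """Split a raw query string into name -> [values] without raising on odd input.

    Illustrative only: the "_unnamed" key is a hypothetical placeholder,
    not something Drill or its httpd format plugin produces.
    """
    params = {}
    for part in query.lstrip("?").split("&"):
        if not part:
            # Skip empty segments such as the one in "a=1&&b=2".
            continue
        # partition() never raises: a missing "=" yields an empty value.
        name, _, value = part.partition("=")
        if not name:
            # Empty parameter name, e.g. "?=PHPE9568F36-...".
            name = "_unnamed"
        params.setdefault(unquote_plus(name), []).append(unquote_plus(value))
    return params


result = parse_query_tolerant("?=PHPE9568F36-D428-11d2-A769-00AA001ACF42")
print(result)
```

The point of the design is that each segment is split with an operation that cannot fail on a missing or empty name, so a hostile or sloppy request degrades to an oddly named field instead of aborting the whole read.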



--
This message was sent by Atlassian Jira
(v8.20.10#820010)