Posted to user@drill.apache.org by John Omernik <jo...@omernik.com> on 2018/07/13 20:46:11 UTC

Array Index Out of Bounds in String Binary

So, as to the actual problem, I opened a JIRA here:

https://issues.apache.org/jira/browse/DRILL-6607

The reason I brought this here is my own curiosity: does an issue in using
this function most likely lie in the function code itself not handling good
data, or is the issue in the pcap plugin, which produces the data for this
function to consume? I am just curious how something like this could be
avoided.

John

Re: Array Index Out of Bounds in String Binary

Posted by Vlad Rozov <vr...@apache.org>.
Yes, it is a limit on the output: "when converted to a binary string would
exceed 256 bytes as it does not reallocate the output buffer".

Java does not have optional arguments (it is a strongly typed language).
Java has overloaded functions :).
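
A minimal sketch of the overloading point, assuming nothing about Drill's
own code: the class and method names below are hypothetical and only
illustrate how substr(data, start) and substr(data, start, nochars) can
coexist in Java as two methods with the same name and different parameter
lists.

```java
class SubstrOverloads {
    // Two-argument form: everything from `start` (1-based) to the end.
    public static String substr(String data, int start) {
        return substr(data, start, data.length() - start + 1);
    }

    // Three-argument form: at most `length` characters from `start` (1-based).
    public static String substr(String data, int start, int length) {
        int begin = Math.max(start - 1, 0);
        int end = Math.min(begin + Math.max(length, 0), data.length());
        return begin >= end ? "" : data.substring(begin, end);
    }

    public static void main(String[] args) {
        System.out.println(substr("string_binary", 8));    // binary
        System.out.println(substr("string_binary", 1, 6)); // string
    }
}
```

The compiler picks the method from the arguments at the call site, so the
shorter form behaves like a default for the missing argument.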

I'd suggest filing another JIRA. DRILL-6607 is a bug. A new JIRA is a
request for new functionality.

Thank you,

Vlad

On 7/18/18 05:15, John Omernik wrote:
> Interesting, so the 256 limit is on output, not on input?
>
> Is DRILL-6607 enough to track this?  If so, I have one more "feature" to
> add to it, not sure if I should include it on 6607 or create a new JIRA and
> link them. Basically, I'd like the ability to pass an int to the function.
> (Does Java have optional arguments? It must because of substr(data, start)
> and substr(data, start, nochars))  Basically string_binary(data) works as
> intended (with the limitation fixed) and string_binary(data, 1) would work
> on the binary, but replace EVERY character with the hex representation.
>
> An optional third would be to do a format string of some sort so the user
> could pick output, but I like the idea of having every character as hex for
> analysis.
>
> John


Re: Array Index Out of Bounds in String Binary

Posted by John Omernik <jo...@omernik.com>.
Interesting, so the 256 limit is on output, not on input?

Is DRILL-6607 enough to track this?  If so, I have one more "feature" to
add to it, not sure if I should include it on 6607 or create a new JIRA and
link them. Basically, I'd like the ability to pass an int to the function.
(Does Java have optional arguments? It must because of substr(data, start)
and substr(data, start, nochars))  Basically string_binary(data) works as
intended (with the limitation fixed) and string_binary(data, 1) would work
on the binary, but replace EVERY character with the hex representation.

An optional third would be to do a format string of some sort so the user
could pick output, but I like the idea of having every character as hex for
analysis.
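
A rough sketch of what the requested two-argument form could look like,
written as plain Java rather than as a Drill function. Everything here is
an assumption for illustration: the class name, the boolean flag (standing
in for the int in string_binary(data, 1)), the printable range 0x20 to
0x7E, and the \xNN escape style.

```java
class HexEscapeSketch {
    // Default form: keep printable ASCII, escape everything else.
    public static String toBinaryString(byte[] data) {
        return toBinaryString(data, false);
    }

    // Overload: when escapeAll is true, every byte becomes \xNN.
    public static String toBinaryString(byte[] data, boolean escapeAll) {
        StringBuilder out = new StringBuilder(data.length * 4);
        for (byte b : data) {
            int v = b & 0xFF; // treat the byte as unsigned
            boolean printable = v >= 0x20 && v <= 0x7E; // assumed range
            if (printable && !escapeAll) {
                out.append((char) v);
            } else {
                out.append(String.format("\\x%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] data = {'G', 'E', 'T', 0x0D, 0x0A};
        System.out.println(toBinaryString(data));       // GET\x0D\x0A
        System.out.println(toBinaryString(data, true)); // \x47\x45\x54\x0D\x0A
    }
}
```

With every byte escaped, the output is always exactly four characters per
input byte, which makes the result easy to line up for analysis.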

John

On Tue, Jul 17, 2018 at 3:40 PM, Vlad Rozov <vr...@apache.org> wrote:

> A. +1.
>
> B. Every byte of binary data may require up to 4 bytes (0xXX) in the
> string representation, so 80 may work, 60 should reliably work.
>
> Thank you,
>
> Vlad
>

Re: Array Index Out of Bounds in String Binary

Posted by Vlad Rozov <vr...@apache.org>.
A. +1.

B. Every byte of binary data may require up to 4 bytes (0xXX) in the 
string representation, so 80 may work, 60 should reliably work.
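
The arithmetic behind those numbers can be sketched as follows. The 256
comes from the error message and the factor of 4 from the escape width
above; the class and method names are made up for this example.

```java
class OutputSizeEstimate {
    static final int BUFFER_BYTES = 256; // limit reported in the error message
    static final int MAX_EXPANSION = 4;  // one input byte -> up to 4 output bytes

    // Largest output the given number of input bytes could produce.
    public static int worstCaseOutput(int inputBytes) {
        return inputBytes * MAX_EXPANSION;
    }

    // Largest input that can never overflow the buffer.
    public static int maxSafeInput() {
        return BUFFER_BYTES / MAX_EXPANSION;
    }

    public static void main(String[] args) {
        for (int n : new int[] {60, 80, 200}) {
            int worst = worstCaseOutput(n);
            System.out.printf("%d input bytes -> up to %d output bytes (%s)%n",
                    n, worst, worst <= BUFFER_BYTES ? "always fits" : "may overflow");
        }
        System.out.println("max always-safe input: " + maxSafeInput() + " bytes");
    }
}
```

So 60 bytes can grow to at most 240 and always fits, 80 bytes can grow to
320 and fits only when enough of the bytes are printable, and 200 bytes can
grow to anywhere between 200 and 800, which is consistent with the reported
length of 379.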

Thank you,

Vlad

On 7/17/18 13:14, John Omernik wrote:
> Yet this works?....
>
> string_binary(byte_substr(`data`, 1, 80))


Re: Array Index Out of Bounds in String Binary

Posted by John Omernik <jo...@omernik.com>.
Yet this works?....

string_binary(byte_substr(`data`, 1, 80))

On Tue, Jul 17, 2018 at 3:12 PM, John Omernik <jo...@omernik.com> wrote:

> So on B: I found BYTE_SUBSTR and now send only 200 bytes to the
> string_binary function, but I still get an error.  Something else is happening
> here...
>
> select `type`, `timestamp`, `src_ip`, `dst_ip`, `src_port`, `dst_port`,
> string_binary(byte_substr(`data`, 1, 200)) as mydata from
> `user/jomernik/bf2_7306.pcap` limit 10
>
> I get the same
>
> Error Id: 213075e7-378a-437f-a5dc-408326f123f3 on
> zeta3.brewingintel.com:20005]
>
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR:
> IndexOutOfBoundsException: index: 0, length: 379 (expected: range(0, 256))

Re: Array Index Out of Bounds in String Binary

Posted by John Omernik <jo...@omernik.com>.
So on B: I found BYTE_SUBSTR and now send only 200 bytes to the
string_binary function, but I still get an error.  Something else is happening
here...

select `type`, `timestamp`, `src_ip`, `dst_ip`, `src_port`, `dst_port`,
string_binary(byte_substr(`data`, 1, 200)) as mydata from
`user/jomernik/bf2_7306.pcap` limit 10

I get the same

Error Id: 213075e7-378a-437f-a5dc-408326f123f3 on
zeta3.brewingintel.com:20005]

org.apache.drill.common.exceptions.UserException: SYSTEM ERROR:
IndexOutOfBoundsException: index: 0, length: 379 (expected: range(0, 256))





On Tue, Jul 17, 2018 at 12:56 PM, John Omernik <jo...@omernik.com> wrote:

>
> Thanks Vlad, a couple of thoughts.
>
>
> A. I think that should be fixed. That seems like a limitation that is both
> unexpected and undocumented.
>
> B.  Is there a way, if my data in the table is returned as binary to start
> with, for me to return the first 256 bytes? I tried substring, but it tries to
> force the data to UTF-8 and I am getting some issues there.

Re: Array Index Out of Bounds in String Binary

Posted by John Omernik <jo...@omernik.com>.
Thanks Vlad, a couple of thoughts.


A. I think that should be fixed. That seems like a limitation that is both
unexpected and undocumented.

B.  Is there a way, if my data in the table is returned as binary to start
with, for me to return the first 256 bytes? I tried substring, but it tries to
force the data to UTF-8 and I am getting some issues there.

On Tue, Jul 17, 2018 at 10:33 AM, Vlad Rozov <vr...@apache.org> wrote:

> In the case of DRILL-6607, the issue lies in the implementation of the
> "string_binary" function: it is not prepared to handle incoming data that,
> when converted to a binary string, would exceed 256 bytes, as it does not
> reallocate the output buffer. Until the function code is fixed, the only
> way to avoid the error is either not to use "string_binary" or to use it
> only with data that meets the "string_binary" limitation.
>
> Thank you,
>
> Vlad

Re: Array Index Out of Bounds in String Binary

Posted by Vlad Rozov <vr...@apache.org>.
In the case of DRILL-6607, the issue lies in the implementation of the
"string_binary" function: it is not prepared to handle incoming data
that, when converted to a binary string, would exceed 256 bytes, as it does
not reallocate the output buffer. Until the function code is fixed, the
only way to avoid the error is either not to use "string_binary" or to
use it only with data that meets the "string_binary" limitation.
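
The failure mode, and the kind of fix implied, can be sketched in plain
Java. This is not the Drill source: a byte[] stands in for the function's
output buffer, and the class, the method names, and the doubling growth
policy are all assumptions made for the example.

```java
import java.util.Arrays;

class GrowableOutputSketch {
    static final int INITIAL_CAPACITY = 256;

    // Stand-in for the function's reusable output buffer.
    private byte[] buffer = new byte[INITIAL_CAPACITY];

    // Fixed-size behaviour: copying more than 256 bytes throws
    // an IndexOutOfBoundsException.
    public int writeFixed(byte[] encoded) {
        System.arraycopy(encoded, 0, buffer, 0, encoded.length);
        return encoded.length;
    }

    // Growing behaviour: enlarge the buffer first when the output needs it.
    public int writeGrowing(byte[] encoded) {
        if (encoded.length > buffer.length) {
            buffer = Arrays.copyOf(buffer, Math.max(encoded.length, buffer.length * 2));
        }
        System.arraycopy(encoded, 0, buffer, 0, encoded.length);
        return encoded.length;
    }

    public int capacity() {
        return buffer.length;
    }

    public static void main(String[] args) {
        byte[] encoded = new byte[379]; // same length as in the reported error
        GrowableOutputSketch out = new GrowableOutputSketch();
        try {
            out.writeFixed(encoded);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("fixed buffer failed: " + e.getClass().getSimpleName());
        }
        System.out.println("growing buffer wrote " + out.writeGrowing(encoded) + " bytes");
    }
}
```

The only difference between the two methods is the capacity check before
the copy, which is the step the quoted description says is missing.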

Thank you,

Vlad

On 7/13/18 14:01, Ted Dunning wrote:
> There are bounds for acceptable behavior for a function like this.  Array
> index out of bounds is not acceptable. Aborting with a clean message about
> the true problem might be fine, as would be to return a null.


Re: Array Index Out of Bounds in String Binary

Posted by Ted Dunning <te...@gmail.com>.
There are bounds for acceptable behavior for a function like this.  Array
index out of bounds is not acceptable. Aborting with a clean message about
the true problem might be fine, as would be to return a null.
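
The two acceptable behaviours could look roughly like this. It is a sketch
only: the names are hypothetical, the limit of 256 is taken from the error
in the JIRA, and the input is assumed to be the already-encoded ASCII
string, so its character count equals its byte count.

```java
class BoundedConvertSketch {
    static final int MAX_OUTPUT_BYTES = 256;

    // Option 1: abort with a message that names the real problem.
    public static String convertOrFail(String encoded) {
        if (encoded.length() > MAX_OUTPUT_BYTES) {
            throw new IllegalArgumentException(
                    "string_binary output would be " + encoded.length()
                    + " bytes; the limit is " + MAX_OUTPUT_BYTES);
        }
        return encoded;
    }

    // Option 2: return null instead of failing the query.
    public static String convertOrNull(String encoded) {
        return encoded.length() > MAX_OUTPUT_BYTES ? null : encoded;
    }

    public static void main(String[] args) {
        String tooLong = new String(new char[379]).replace('\0', 'x');
        System.out.println(convertOrNull(tooLong)); // prints null
        try {
            convertOrFail(tooLong);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Either way the caller learns that the size limit was the problem, rather
than seeing an index error from inside the buffer code.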

On Fri, Jul 13, 2018, 13:46 John Omernik <jo...@omernik.com> wrote:

> So, as to the actual problem, I opened a JIRA here:
>
> https://issues.apache.org/jira/browse/DRILL-6607
>
> The reason I brought this here is my own curiosity: does an issue in using
> this function most likely lie in the function code itself not handling good
> data, or is the issue in the pcap plugin, which produces the data for this
> function to consume? I am just curious how something like this could be
> avoided.
>
> John
>