Posted to issues@trafodion.apache.org by "David Wayne Birdsall (JIRA)" <ji...@apache.org> on 2019/03/05 21:24:00 UTC

[jira] [Resolved] (TRAFODION-3282) Buffer overrun in ExHdfsScan::work in certain conditions

     [ https://issues.apache.org/jira/browse/TRAFODION-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Wayne Birdsall resolved TRAFODION-3282.
---------------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.4

> Buffer overrun in ExHdfsScan::work in certain conditions
> --------------------------------------------------------
>
>                 Key: TRAFODION-3282
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-3282
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-exe
>    Affects Versions: 2.4
>            Reporter: David Wayne Birdsall
>            Assignee: David Wayne Birdsall
>            Priority: Major
>             Fix For: 2.4
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> If a sufficiently large Hive text table has string columns whose values are longer than CQD HIVE_MAX_STRING_LENGTH_IN_BYTES, and no external table definition specifies longer column sizes, a query may dump core in ExHdfsScan::work with a buffer overrun.
> The following test case reproduces the behavior.
> First, use the following python script, called datagen.py:
> {quote}#! /usr/bin/env python
> import sys
> if len(sys.argv) != 5 or \
>    sys.argv[1].lower() == '-h' or \
>    sys.argv[1].lower() == '-help':
>     print('Usage: ' + sys.argv[0] + ' <file> <num of rows> <num of varchar columns> <varchar column length>')
>     sys.exit()
> f = open(sys.argv[1], "w+")
> marker=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
> # Each row is written as <row number>|<varchar 1>|...|<varchar N>|<row number>
> for num_rows in range(0, int(sys.argv[2])):
>     f.write(str(num_rows) + '|')
>     for num_cols in range(0, int(sys.argv[3])):
>         f.write(marker[num_rows%len(marker)])
>         for i in range (1, int(sys.argv[4])):
>             f.write(str(i % 10))
>         f.write('|')
>     f.write(str(num_rows))
>     f.write('\n')
> f.close()
> {quote}
> Run this script as follows:
> {quote}chmod 755 ./datagen.py
> ./datagen.py ./data_lgvc.10rows_512KB.txt 10 2 524288
> {quote}
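To make the file layout concrete, the following sketch rebuilds a single row the way datagen.py appears to write it, using a tiny column length so the result is readable. The helper name `make_row` is hypothetical and not part of the original script; the nesting of the loops is inferred from the four-column table created in the next step.

```python
MARKER = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def make_row(row_number, num_varchar_cols, varchar_len):
    """Build one '|'-delimited row in the layout datagen.py emits (hypothetical helper)."""
    parts = [str(row_number)]
    for _ in range(num_varchar_cols):
        # One marker letter followed by the digits 1, 2, 3, ... taken modulo 10,
        # for a total of varchar_len characters per value.
        value = MARKER[row_number % len(MARKER)]
        value += ''.join(str(i % 10) for i in range(1, varchar_len))
        parts.append(value)
    parts.append(str(row_number))
    return '|'.join(parts) + '\n'

print(repr(make_row(0, 2, 4)))  # '0|A123|A123|0\n'
```

With the arguments used in the command above (2 varchar columns of length 524288), each string value is 524288 characters long.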
> Next, perform the following commands in a Hive shell:
> {quote}drop table if exists lgvc_base_table;
> create table lgvc_base_table(c_int int, c_string1 string, c_string2 string, p_int int) row format delimited fields terminated by '|';
> load data local inpath './data_lgvc.10rows_512KB.txt' overwrite into table lgvc_base_table;
> {quote}
> Finally, do the following in sqlci:
> {quote}CQD HDFS_IO_BUFFERSIZE '2048';
> prepare s1 from select * from hive.hive.lgvc_base_table where c_int > 10;
> execute s1;
> {quote}
> (The point of the CQD is to reduce the HDFS read buffer size to 2 MB from its default of 65 MB, so that the test fails with a smaller input file.)
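As a rough check of the sizes involved, the sketch below computes how large the generated file is compared with the reduced read buffer. It assumes that HDFS_IO_BUFFERSIZE is expressed in KB (which is what the 2048 to 2 MB correspondence above implies) and that datagen.py writes only single-byte ASCII characters; `row_bytes` and `file_bytes` are hypothetical helper names.

```python
def row_bytes(row_number, num_varchar_cols, varchar_len):
    """Bytes in one row written by datagen.py, assuming ASCII output."""
    key = len(str(row_number))
    # Leading key + '|', then each varchar value + '|', then trailing key + '\n'.
    return (key + 1) + num_varchar_cols * (varchar_len + 1) + (key + 1)

def file_bytes(num_rows, num_varchar_cols, varchar_len):
    """Total size of the generated file in bytes."""
    return sum(row_bytes(n, num_varchar_cols, varchar_len) for n in range(num_rows))

# Assumption: the CQD value '2048' is a size in KB.
BUFFER_BYTES = 2048 * 1024

total = file_bytes(10, 2, 524288)
print(total)                        # 10485820
print(total > BUFFER_BYTES)         # True
```

Under these assumptions each row is about 1 MB and the 10-row file is about 10 MB, so reading it takes several fills of the 2 MB buffer.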
> When this test case is run, we get a core with the following stack trace:
> {quote}(gdb) bt
> #0 0x00007ffff5116495 in raise () from /lib64/libc.so.6
> #1 0x00007ffff5117c75 in abort () from /lib64/libc.so.6
> #2 0x00007ffff6f02935 in ?? ()
>  from /usr/lib/jvm/java-1.7.0-openjdk.x86_64/jre/lib/amd64/server/libjvm.so
> #3 0x00007ffff707bfdf in ?? ()
>  from /usr/lib/jvm/java-1.7.0-openjdk.x86_64/jre/lib/amd64/server/libjvm.so
> #4 0x00007ffff6f077c2 in JVM_handle_linux_signal ()
>  from /usr/lib/jvm/java-1.7.0-openjdk.x86_64/jre/lib/amd64/server/libjvm.so
> #5 <signal handler called>
> #6 0x00007ffff516d753 in memcpy () from /lib64/libc.so.6
> #7 0x00007ffff35b4dd5 in ExHdfsScanTcb::work (this=0x7ffff7e99148)
>  at ../executor/ExHdfsScan.cpp:601
> #8 0x00007ffff333d7a1 in ex_tcb::sWork (tcb=0x7ffff7e99148)
>  at ../executor/ex_tcb.h:102
> #9 0x00007ffff350dba7 in ExSubtask::work (this=0x7ffff7e99ad0)
>  at ../executor/ExScheduler.cpp:757
> #10 0x00007ffff350cbf1 in ExScheduler::work (this=0x7ffff7e98cb0, prevWaitTime=
>  0) at ../executor/ExScheduler.cpp:280
> #11 0x00007ffff33a41c7 in ex_root_tcb::execute (this=0x7ffff7e99b78, 
>  cliGlobals=0xba5970, glob=0x7ffff7ea5d40, input_desc=0x7ffff7ee1178, 
>  diagsArea=@0x7ffffffee020, reExecute=0) at ../executor/ex_root.cpp:928
> #12 0x00007ffff4e4c452 in Statement::execute (this=0x7ffff7e84f40, cliGlobals=
>  0xba5970, input_desc=0x7ffff7ee1178, diagsArea=..., execute_state=
> ---Type <return> to continue, or q <return> to quit---q
> Statement:Quit
> (gdb)
> {quote}
>  


