Posted to issues@arrow.apache.org by "hqx871 (via GitHub)" <gi...@apache.org> on 2023/05/16 12:12:04 UTC

[GitHub] [arrow] hqx871 opened a new issue, #35616: [c++] arrow::int32 throws exc_bad_access

hqx871 opened a new issue, #35616:
URL: https://github.com/apache/arrow/issues/35616

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Hi team! I am using Arrow 0.15.1 and hit a problem when reading a Parquet file that contains an array column.
   - The ASan output
   parquet-low-level-example(49396,0x7ff848622680) malloc: nano zone abandoned due to inability to preallocate reserved vm space.
   /Users/bytedance/Downloads/test.parquet row num:1000000
   =================================================================
   ==49396==ERROR: AddressSanitizer: global-buffer-overflow on address 0x0001087d73f8 at pc 0x0001076ecb8d bp 0x7ff7b8b4d6b0 sp 0x7ff7b8b4d6a8
   WRITE of size 8 at 0x0001087d73f8 thread T0
       #0 0x1076ecb8c in int arrow::util::RleDecoder::GetBatchWithDictSpaced<long long>(long long const*, long long*, int, int, unsigned char const*, long long) rle_encoding.h:488
       #1 0x1076e62c8 in parquet::DictDecoderImpl<parquet::PhysicalType<(parquet::Type::type)2> >::DecodeSpaced(long long*, int, int, unsigned char const*, long long) encoding.cc:1079
       #2 0x1075d9e6b in parquet::internal::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)2> >::ReadValuesSpaced(long long, long long) column_reader.cc:1052
       #3 0x1075dc1a9 in parquet::internal::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)2> >::ReadRecordData(long long) column_reader.cc:1096
       #4 0x1075d6a4c in parquet::internal::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)2> >::ReadRecords(long long) column_reader.cc:822
       #5 0x1073d1583 in parquet::arrow::LeafReader::NextBatch(long long, std::__1::shared_ptr<arrow::ChunkedArray>*) reader.cc:414
       #6 0x1073d55bd in parquet::arrow::NestedListReader::NextBatch(long long, std::__1::shared_ptr<arrow::ChunkedArray>*) reader.cc:469
       #7 0x1073f5a82 in parquet::arrow::RowGroupRecordBatchReader::ReadNext(std::__1::shared_ptr<arrow::RecordBatch>*) reader.cc:320
       #8 0x1073b409a in printParquetFile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) reader-writer.cc:97
       #9 0x1073b5209 in main reader-writer.cc:111
       #10 0x7ff8049b230f  (<unknown module>)
   
   0x0001087d73f8 is located 40 bytes to the left of global variable 'guard variable for arrow::SparseTensor::dim_name(int) const::kEmpty' defined in '/Users/bytedance/Downloads/arrow-apache-arrow-0.15.1/cpp/src/arrow/sparse_tensor.cc' (0x1087d7420) of size 8
   0x0001087d73f8 is located 0 bytes to the right of global variable 'kEmpty' defined in '/Users/bytedance/Downloads/arrow-apache-arrow-0.15.1/cpp/src/arrow/sparse_tensor.cc:415:28' (0x1087d73e0) of size 24
   SUMMARY: AddressSanitizer: global-buffer-overflow rle_encoding.h:488 in int arrow::util::RleDecoder::GetBatchWithDictSpaced<long long>(long long const*, long long*, int, int, unsigned char const*, long long)
   Shadow bytes around the buggy address:
     0x1000210fae20: 00 00 00 00 00 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9
     0x1000210fae30: 00 00 00 00 00 00 00 f9 f9 f9 f9 f9 00 f9 f9 f9
     0x1000210fae40: 00 00 f9 f9 00 f9 f9 f9 01 f9 f9 f9 01 f9 f9 f9
     0x1000210fae50: 01 f9 f9 f9 01 f9 f9 f9 00 00 f9 f9 00 f9 f9 f9
     0x1000210fae60: 01 f9 f9 f9 00 00 f9 f9 00 f9 f9 f9 00 00 00 00
   =>0x1000210fae70: 00 00 00 f9 f9 f9 f9 f9 00 00 00 00 00 00 00[f9]
     0x1000210fae80: f9 f9 f9 f9 00 f9 f9 f9 00 00 00 f9 f9 f9 f9 f9
     0x1000210fae90: 00 f9 f9 f9 00 00 f9 f9 00 f9 f9 f9 00 00 f9 f9
     0x1000210faea0: 00 f9 f9 f9 00 00 f9 f9 00 f9 f9 f9 00 00 f9 f9
     0x1000210faeb0: 00 f9 f9 f9 00 00 f9 f9 00 f9 f9 f9 00 00 f9 f9
     0x1000210faec0: 00 f9 f9 f9 00 00 f9 f9 00 f9 f9 f9 00 00 f9 f9
   Shadow byte legend (one shadow byte represents 8 application bytes):
     Addressable:           00
     Partially addressable: 01 02 03 04 05 06 07 
     Heap left redzone:       fa
     Freed heap region:       fd
     Stack left redzone:      f1
     Stack mid redzone:       f2
     Stack right redzone:     f3
     Stack after return:      f5
     Stack use after scope:   f8
     Global redzone:          f9
     Global init order:       f6
     Poisoned by user:        f7
     Container overflow:      fc
     Array cookie:            ac
     Intra object redzone:    bb
     ASan internal:           fe
     Left alloca redzone:     ca
     Right alloca redzone:    cb
   ==49396==ABORTING
   
   Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
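
The shadow-byte dump in the report can be cross-checked against the faulting address. The legend states that one shadow byte covers 8 application bytes, so the shadow address is the application address shifted right by 3, plus a platform offset. The offset used here (`1 << 44`) is an assumption inferred from this report, not a value stated in it. All names in this sketch are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Scale taken from the legend: one shadow byte represents 8 application bytes.
constexpr std::uint64_t kShadowScale = 3;
// ASSUMPTION: shadow offset for this platform, inferred from the report.
constexpr std::uint64_t kAssumedShadowOffset = 1ULL << 44;

// Map an application address to the address of its shadow byte.
constexpr std::uint64_t ShadowAddress(std::uint64_t app_address) {
  return (app_address >> kShadowScale) + kAssumedShadowOffset;
}

// ASan prints shadow memory in rows of 16 bytes; these helpers locate a
// shadow byte inside that layout.
constexpr std::uint64_t ShadowRowStart(std::uint64_t shadow_address) {
  return shadow_address & ~0xFULL;
}
constexpr std::uint64_t ShadowRowIndex(std::uint64_t shadow_address) {
  return shadow_address & 0xFULL;
}

// Checks the arithmetic against values copied from the report.
inline void CheckAgainstReport() {
  const std::uint64_t fault = 0x0001087d73f8ULL;        // WRITE address
  const std::uint64_t kempty_start = 0x0001087d73e0ULL; // global 'kEmpty'
  const std::uint64_t kempty_size = 24;

  // "0 bytes to the right of 'kEmpty'": the write starts exactly at its end.
  assert(kempty_start + kempty_size == fault);

  // The marked row is 0x1000210fae70 and [f9] is its last (16th) byte.
  const std::uint64_t shadow = ShadowAddress(fault);
  assert(ShadowRowStart(shadow) == 0x1000210fae70ULL);
  assert(ShadowRowIndex(shadow) == 15);
}
```

Under that assumed offset, the computed shadow byte lands on the `[f9]` entry (global redzone) in the row marked `=>`, which agrees with the report's claim that the 8-byte write begins at the first byte past `kEmpty`.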
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on issue #35616: [c++] arrow::int32 throws exc_bad_access

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on issue #35616:
URL: https://github.com/apache/arrow/issues/35616#issuecomment-1549588157

   Would you mind trying it on the latest Parquet release?




[GitHub] [arrow] kou closed issue #35616: [c++] arrow::int32 throws exc_bad_access

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou closed issue #35616: [c++] arrow::int32 throws exc_bad_access 
URL: https://github.com/apache/arrow/issues/35616




[GitHub] [arrow] mapleFU commented on issue #35616: [c++] arrow::int32 throws exc_bad_access

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on issue #35616:
URL: https://github.com/apache/arrow/issues/35616#issuecomment-1549905849

   I guess you could look through the releases after 0.15 and check whether any of them include a relevant bug fix...




[GitHub] [arrow] hqx871 commented on issue #35616: [c++] arrow::int32 throws exc_bad_access

Posted by "hqx871 (via GitHub)" <gi...@apache.org>.
hqx871 commented on issue #35616:
URL: https://github.com/apache/arrow/issues/35616#issuecomment-1549899604

   Thanks for your reply. I have tested the latest version and it works well, but I would like to fix this in the old version if I can find the root cause.




[GitHub] [arrow] hqx871 commented on issue #35616: [c++] arrow::int32 throws exc_bad_access

Posted by "hqx871 (via GitHub)" <gi...@apache.org>.
hqx871 commented on issue #35616:
URL: https://github.com/apache/arrow/issues/35616#issuecomment-1549546987

   I know this version is very old. Has anyone solved this?




[GitHub] [arrow] hqx871 commented on issue #35616: [c++] arrow::int32 throws exc_bad_access

Posted by "hqx871 (via GitHub)" <gi...@apache.org>.
hqx871 commented on issue #35616:
URL: https://github.com/apache/arrow/issues/35616#issuecomment-1549550794

   The code is very simple.
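   ```
   #include <iostream>
   #include <memory>
   #include <stdexcept>
   #include <string>
   #include <vector>

   #include <arrow/api.h>
   #include <parquet/arrow/reader.h>
   #include <parquet/file_reader.h>
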
   void printParquetFile(const std::string &path) {
     arrow::Status st;
     // Open Parquet file reader
     std::unique_ptr<parquet::arrow::FileReader> arrow_reader;
     auto file_reader = parquet::ParquetFileReader::OpenFile(path, true);
     st = parquet::arrow::FileReader::Make(
         arrow::default_memory_pool(),
         std::move(file_reader), &arrow_reader);
     if (!st.ok()) {
       throw std::runtime_error(st.ToString());
     }
   
     auto meta = arrow_reader->parquet_reader()->metadata();
     std::cout << path << " row num:" << meta->num_rows() << std::endl;
   
     //auto totalGroupNum = meta->num_row_groups();
     //std::map<std::string, int32_t> columnMap;
     auto schema = meta->schema();
     std::vector<int> readColumnIds;
     for (int i = 0; i < meta->num_columns(); ++i) {
       auto column = schema->Column(i);
       std::cout << "col:" << std::to_string(i)
                 << ", path:" << column->path()->ToDotString()
                 << ", name:" << column->path()->ToDotVector()[0]
                 << ", max definition level:" << column->max_definition_level()
                 << std::endl;
       readColumnIds.push_back(i);
     }
   
     for (int group = 0; group < meta->num_row_groups(); ++group) {
       auto rowGroup = meta->RowGroup(group);
       auto groupRowNum = rowGroup->num_rows();
       std::shared_ptr<arrow::RecordBatchReader> batchReader;
       st = arrow_reader->GetRecordBatchReader({group}, {18},
                                               &batchReader);
       if (!st.ok()) {
         // Creating the record batch reader for this row group failed.
         throw std::runtime_error(st.ToString());
       }
       int groupReadLines = 0;
       while (groupReadLines < groupRowNum) {
         std::shared_ptr<arrow::RecordBatch> rowBatch;
         //st = batchReader->ReadNext(&rowBatch);
         try{
           st = batchReader->ReadNext(&rowBatch);
         } catch (const std::exception& ex) {
           throw std::runtime_error(ex.what());
         }
         if (!st.ok()) {
           // Reading the next batch failed.
           throw std::runtime_error(st.ToString());
         }
         // ReadNext sets the batch to nullptr at end of stream.
         if (rowBatch == nullptr) {
           break;
         }
         groupReadLines += rowBatch->num_rows();
       }
     }
   }
   ```
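
The ASan trace for this reader ends in `GetBatchWithDictSpaced`, and the failure is a WRITE, which suggests the destination buffer rather than the dictionary is what went out of bounds. As a rough aid for reasoning about that step, here is a simplified model of "spaced" dictionary decoding with explicit bounds checks. It is a sketch under stated assumptions, not Arrow's implementation: `DecodeDictSpaced` and its parameters are hypothetical, and it does not identify the root cause in 0.15.1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper modelling spaced dictionary decoding.
// `indices` holds one dictionary index per non-null value;
// `valid` holds one flag per output slot (false means null).
std::vector<std::int64_t> DecodeDictSpaced(
    const std::vector<std::int64_t>& dictionary,
    const std::vector<std::int32_t>& indices,
    const std::vector<bool>& valid) {
  // The output is sized from the number of slots, so every write below is
  // guaranteed to stay inside it.
  std::vector<std::int64_t> out(valid.size(), 0);
  std::size_t next_index = 0;
  for (std::size_t slot = 0; slot < valid.size(); ++slot) {
    if (!valid[slot]) {
      continue;  // Null slot: keep the placeholder and consume no index.
    }
    if (next_index >= indices.size()) {
      throw std::out_of_range("ran out of encoded indices");
    }
    const std::int32_t idx = indices[next_index++];
    if (idx < 0 || static_cast<std::size_t>(idx) >= dictionary.size()) {
      throw std::out_of_range("dictionary index out of range");
    }
    out[slot] = dictionary[static_cast<std::size_t>(idx)];
  }
  return out;
}
```

In this model the number of output slots is the count of values plus nulls, so a caller that sizes the destination from the non-null count alone would under-allocate it. Whether that is what happens in 0.15.1 would need to be confirmed against the actual sources.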

