Posted to dev@arrow.apache.org by "White4, Ryan (STATCAN)" <ry...@canada.ca> on 2019/01/08 19:49:32 UTC

RecordBatchFile with no batches, Error: pyarrow.lib.ArrowInvalid: File is smaller than indicated metadata size.

Hi,

I get an error when opening a file that was written with no record batches. I came across this while implementing a simple way to spill the buffer to disk automatically (is this potentially coming in release 0.12?).

I'm using pyarrow 0.11.
Is there a JIRA related to this, or is there a problem with the simple example below?

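import pyarrow as pa

# Write an Arrow file that contains a schema but no record batches.
my_schema = pa.schema([('field0', pa.int32())])
sink = pa.BufferOutputStream()
writer = pa.RecordBatchFileWriter(sink, my_schema)
writer.close()
buf = sink.getvalue()

# Opening the buffer for reading raises ArrowInvalid with pyarrow 0.11.
reader = pa.open_file(buf)
print(reader.schema)
print(reader.num_record_batches)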

Traceback...
reader = pa.open_file(buf)
pyarrow/ipc.py, line 142, in open_file
return RecordBatchFileReader(source, footer_offset=footer_offset)
pyarrow/ipc.py, line 89, in __init__
self._open(source, footer_offset=footer_offset)
pyarrow/ipc.pxi, line 352
pyarrow/error.pxi, line 81
pyarrow.lib.ArrowInvalid: File is smaller than indicated metadata size.
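
For anyone debugging the same message, here is a minimal sketch of how the tail of the buffer could be inspected by hand. It assumes the documented Arrow file layout: the magic bytes ARROW1 at both ends, with a little-endian int32 footer size just before the trailing magic. The helper name describe_arrow_file_tail and the size check it applies are illustrative assumptions; they are not part of the pyarrow API and may not match the exact check the reader performs.

```python
import struct

ARROW_MAGIC = b"ARROW1"  # magic bytes expected at both ends of an Arrow file


def describe_arrow_file_tail(data: bytes) -> dict:
    """Report the trailing magic and the declared footer size (hypothetical helper)."""
    magic_len = len(ARROW_MAGIC)
    tail_len = 4 + magic_len  # int32 footer size followed by the trailing magic
    info = {
        "file_size": len(data),
        "has_trailing_magic": False,
        "footer_size": None,
        "footer_fits": False,
    }
    if len(data) < tail_len:
        return info  # too short to even hold the footer size and magic

    info["has_trailing_magic"] = data[-magic_len:] == ARROW_MAGIC
    (footer_size,) = struct.unpack("<i", data[-tail_len:-magic_len])
    info["footer_size"] = footer_size

    # Assumed plausibility check: the footer plus both magic strings and the
    # 4-byte size field must not exceed the total file size.
    info["footer_fits"] = (
        info["has_trailing_magic"]
        and footer_size > 0
        and footer_size + 2 * magic_len + 4 <= len(data)
    )
    return info


if __name__ == "__main__":
    # Synthetic bytes only: a fake 16-byte "footer" between the two magic strings.
    fake = b"ARROW1\x00\x00" + b"x" * 16 + struct.pack("<i", 16) + ARROW_MAGIC
    print(describe_arrow_file_tail(fake))
```

With the example above, the raw bytes could be obtained with buf.to_pybytes() and passed to the helper to see whether the footer size written for the empty file looks plausible.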

Thanks,
Ryan


Ryan Mackenzie White, Ph. D.

Senior Research Analyst - Administrative Data Division, Analytical Studies, Methodology and Statistical Infrastructure Field
Statistics Canada / Government of Canada
ryan.white4@canada.ca / Tel: 613-608-0015

Analyste principal de recherche- Division des données administratives, Secteur des études analytiques, de la méthodologie et de l'infrastructure statistique
Statistique Canada / Gouvernement du Canada
ryan.white4@canada.ca / Tél. : 613-608-0015