You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2019/11/04 23:18:00 UTC

[jira] [Created] (ARROW-7063) [C++] Schema print method prints too much metadata

Neal Richardson created ARROW-7063:
--------------------------------------

             Summary: [C++] Schema print method prints too much metadata
                 Key: ARROW-7063
                 URL: https://issues.apache.org/jira/browse/ARROW-7063
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++, C++ - Dataset
            Reporter: Neal Richardson
             Fix For: 1.0.0


I loaded some taxi data in a Dataset and printed the schema. This is what was printed:

{code}
vendor_id: string
pickup_at: timestamp[us]
dropoff_at: timestamp[us]
passenger_count: int8
trip_distance: float
pickup_longitude: float
pickup_latitude: float
rate_code_id: null
store_and_fwd_flag: string
dropoff_longitude: float
dropoff_latitude: float
payment_type: string
fare_amount: float
extra: float
mta_tax: float
tip_amount: float
tolls_amount: float
total_amount: float
-- metadata --
pandas: {"index_columns": [{"kind": "range", "name": null, "start": 0, "stop": 14387371, "step": 1}], "column_indexes": [{"name": null, "field_name": null, "pandas_type": "unicode", "numpy_type": "object", "metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "vendor_id", "field_name": "vendor_id", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "pickup_at", "field_name": "pickup_at", "pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}, {"name": "dropoff_at", "field_name": "dropoff_at", "pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}, {"name": "passenger_count", "field_name": "passenger_count", "pandas_type": "int8", "numpy_type": "int8", "metadata": null}, {"name": "trip_distance", "field_name": "trip_distance", "pandas_type": "float32", "numpy_type": "float32", "metadata": null}, {"name": "pickup_longitude", "field_name": "pickup_longitude", "pandas_type": "float32", "numpy_type": "float32", "metadata": null}, {"name": "pickup_latitude", "field_name": "pickup_latitude", "pandas_type": "float32", "numpy_type": "float32", "metadata": null}, {"name": "rate_code_id", "field_name": "rate_code_id", "pandas_type": "empty", "numpy_type": "object", "metadata": null}, {"name": "store_and_fwd_flag", "field_name": "store_and_fwd_flag", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "dropoff_longitude", "field_name": "dropoff_longitude", "pandas_type": "float32", "numpy_type": "float32", "metadata": null}, {"name": "dropoff_latitude", "field_name": "dropoff_latitude", "pandas_type": "float32", "numpy_type": "float32", "metadata": null}, {"name": "payment_type", "field_name": "payment_type", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "fare_amount", "field_name": "fare_amount", "pandas_type": "float32", "numpy_type": "float32", "metadata": null}, {"name": "extra", "field_name": "extra", "pandas_type": "float32", "numpy_type": "float32", "metadata": null}, {"name": "mta_tax", "field_name": "mta_tax", "pandas_type": "float32", "numpy_type": "float32", "metadata": null}, {"name": "tip_amount", "field_name": "tip_amount", "pandas_type": "float32", "numpy_type": "float32", "metadata": null}, {"name": "tolls_amount", "field_name": "tolls_amount", "pandas_type": "float32", "numpy_type": "float32", "metadata": null}, {"name": "total_amount", "field_name": "total_amount", "pandas_type": "float32", "numpy_type": "float32", "metadata": null}], "creator": {"library": "pyarrow", "version": "0.15.1"}, "pandas_version": "0.25.3"}
ARROW:schema: /////3gOAAAQAAAAAAAKAA4ABgAFAAgACgAAAAABAwAQAAAAAAAKAAwAAAAEAAgACgAAAFQKAAAEAAAAAQAAAAwAAAAIAAwABAAIAAgAAAAsCgAABAAAAB8KAAB7ImluZGV4X2NvbHVtbnMiOiBbeyJraW5kIjogInJhbmdlIiwgIm5hbWUiOiBudWxsLCAic3RhcnQiOiAwLCAic3RvcCI6IDE0Mzg3MzcxLCAic3RlcCI6IDF9XSwgImNvbHVtbl9pbmRleGVzIjogW3sibmFtZSI6IG51bGwsICJmaWVsZF9uYW1lIjogbnVsbCwgInBhbmRhc190eXBlIjogInVuaWNvZGUiLCAibnVtcHlfdHlwZSI6ICJvYmplY3QiLCAibWV0YWRhdGEiOiB7ImVuY29kaW5nIjogIlVURi04In19XSwgImNvbHVtbnMiOiBbeyJuYW1lIjogInZlbmRvcl9pZCIsICJmaWVsZF9uYW1lIjogInZlbmRvcl9pZCIsICJwYW5kYXNfdHlwZSI6ICJ1bmljb2RlIiwgIm51bXB5X3R5cGUiOiAib2JqZWN0IiwgIm1ldGFkYXRhIjogbnVsbH0sIHsibmFtZSI6ICJwaWNrdXBfYXQiLCAiZmllbGRfbmFtZSI6ICJwaWNrdXBfYXQiLCAicGFuZGFzX3R5cGUiOiAiZGF0ZXRpbWUiLCAibnVtcHlfdHlwZSI6ICJkYXRldGltZTY0W25zXSIsICJtZXRhZGF0YSI6IG51bGx9LCB7Im5hbWUiOiAiZHJvcG9mZl9hdCIsICJmaWVsZF9uYW1lIjogImRyb3BvZmZfYXQiLCAicGFuZGFzX3R5cGUiOiAiZGF0ZXRpbWUiLCAibnVtcHlfdHlwZSI6ICJkYXRldGltZTY0W25zXSIsICJtZXRhZGF0YSI6IG51bGx9LCB7Im5hbWUiOiAicGFzc2VuZ2VyX2NvdW50IiwgImZpZWxkX25hbWUiOiAicGFzc2VuZ2VyX2NvdW50IiwgInBhbmRhc190eXBlIjogImludDgiLCAibnVtcHlfdHlwZSI6ICJpbnQ4IiwgIm1ldGFkYXRhIjogbnVsbH0sIHsibmFtZSI6ICJ0cmlwX2Rpc3RhbmNlIiwgImZpZWxkX25hbWUiOiAidHJpcF9kaXN0YW5jZSIsICJwYW5kYXNfdHlwZSI6ICJmbG9hdDMyIiwgIm51bXB5X3R5cGUiOiAiZmxvYXQzMiIsICJtZXRhZGF0YSI6IG51bGx9LCB7Im5hbWUiOiAicGlja3VwX2xvbmdpdHVkZSIsICJmaWVsZF9uYW1lIjogInBpY2t1cF9sb25naXR1ZGUiLCAicGFuZGFzX3R5cGUiOiAiZmxvYXQzMiIsICJudW1weV90eXBlIjogImZsb2F0MzIiLCAibWV0YWRhdGEiOiBudWxsfSwgeyJuYW1lIjogInBpY2t1cF9sYXRpdHVkZSIsICJmaWVsZF9uYW1lIjogInBpY2t1cF9sYXRpdHVkZSIsICJwYW5kYXNfdHlwZSI6ICJmbG9hdDMyIiwgIm51bXB5X3R5cGUiOiAiZmxvYXQzMiIsICJtZXRhZGF0YSI6IG51bGx9LCB7Im5hbWUiOiAicmF0ZV9jb2RlX2lkIiwgImZpZWxkX25hbWUiOiAicmF0ZV9jb2RlX2lkIiwgInBhbmRhc190eXBlIjogImVtcHR5IiwgIm51bXB5X3R5cGUiOiAib2JqZWN0IiwgIm1ldGFkYXRhIjogbnVsbH0sIHsibmFtZSI6ICJzdG9yZV9hbmRfZndkX2ZsYWciLCAiZmllbGRfbmFtZSI6ICJzdG9yZV9hbmRfZndkX2ZsYWciLCAicGFuZGFzX3R5cGUiOiAidW5pY29kZSIsICJudW1weV90eXBlIjogIm9iamVjdCIsICJtZXRhZGF0YSI6IG51bGx9LCB7Im5hbWUiOiAiZHJvcG9mZl9sb25naXR1ZGUiLCAiZmllbGRfbmFtZSI6ICJkcm9wb2ZmX2xvbmdpdHVkZSIsICJwYW5kYXNfdHlwZSI6ICJmbG9hdDMyIiwgIm51bXB5X3R5cGUiOiAiZmxvYXQzMiIsICJtZXRhZGF0YSI6IG51bGx9LCB7Im5hbWUiOiAiZHJvcG9mZl9sYXRpdHVkZSIsICJmaWVsZF9uYW1lIjogImRyb3BvZmZfbGF0aXR1ZGUiLCAicGFuZGFzX3R5cGUiOiAiZmxvYXQzMiIsICJudW1weV90eXBlIjogImZsb2F0MzIiLCAibWV0YWRhdGEiOiBudWxsfSwgeyJuYW1lIjogInBheW1lbnRfdHlwZSIsICJmaWVsZF9uYW1lIjogInBheW1lbnRfdHlwZSIsICJwYW5kYXNfdHlwZSI6ICJ1bmljb2RlIiwgIm51bXB5X3R5cGUiOiAib2JqZWN0IiwgIm1ldGFkYXRhIjogbnVsbH0sIHsibmFtZSI6ICJmYXJlX2Ftb3VudCIsICJmaWVsZF9uYW1lIjogImZhcmVfYW1vdW50IiwgInBhbmRhc190eXBlIjogImZsb2F0MzIiLCAibnVtcHlfdHlwZSI6ICJmbG9hdDMyIiwgIm1ldGFkYXRhIjogbnVsbH0sIHsibmFtZSI6ICJleHRyYSIsICJmaWVsZF9uYW1lIjogImV4dHJhIiwgInBhbmRhc190eXBlIjogImZsb2F0MzIiLCAibnVtcHlfdHlwZSI6ICJmbG9hdDMyIiwgIm1ldGFkYXRhIjogbnVsbH0sIHsibmFtZSI6ICJtdGFfdGF4IiwgImZpZWxkX25hbWUiOiAibXRhX3RheCIsICJwYW5kYXNfdHlwZSI6ICJmbG9hdDMyIiwgIm51bXB5X3R5cGUiOiAiZmxvYXQzMiIsICJtZXRhZGF0YSI6IG51bGx9LCB7Im5hbWUiOiAidGlwX2Ftb3VudCIsICJmaWVsZF9uYW1lIjogInRpcF9hbW91bnQiLCAicGFuZGFzX3R5cGUiOiAiZmxvYXQzMiIsICJudW1weV90eXBlIjogImZsb2F0MzIiLCAibWV0YWRhdGEiOiBudWxsfSwgeyJuYW1lIjogInRvbGxzX2Ftb3VudCIsICJmaWVsZF9uYW1lIjogInRvbGxzX2Ftb3VudCIsICJwYW5kYXNfdHlwZSI6ICJmbG9hdDMyIiwgIm51bXB5X3R5cGUiOiAiZmxvYXQzMiIsICJtZXRhZGF0YSI6IG51bGx9LCB7Im5hbWUiOiAidG90YWxfYW1vdW50IiwgImZpZWxkX25hbWUiOiAidG90YWxfYW1vdW50IiwgInBhbmRhc190eXBlIjogImZsb2F0MzIiLCAibnVtcHlfdHlwZSI6ICJmbG9hdDMyIiwgIm1ldGFkYXRhIjogbnVsbH1dLCAiY3JlYXRvciI6IHsibGlicmFyeSI6ICJweWFycm93IiwgInZlcnNpb24iOiAiMC4xNS4xIn0sICJwYW5kYXNfdmVyc2lvbiI6ICIwLjI1LjMifQAGAAAAcGFuZGFzAAASAAAAxAMAAHgDAABEAwAAAAMAAMgCAACMAgAAVAIAACACAADoAQAArAEAAHABAAA8AQAACAEAANgAAACoAAAAdAAAADwAAAAEAAAAlPz//wAAAQMYAAAADAAAAAQAAAAAAAAAyvz//wAAAQAMAAAAdG90YWxfYW1vdW50AAAAAMj8//8AAAEDGAAAAAwAAAAEAAAAAAAAAP78//8AAAEADAAAAHRvbGxzX2Ftb3VudAAAAAD8/P//AAABAxgAAAAMAAAABAAAAAAAAAAy/f//AAABAAoAAAB0aXBfYW1vdW50AAAs/f//AAABAxgAAAAMAAAABAAAAAAAAABi/f//AAABAAcAAABtdGFfdGF4AFj9//8AAAEDGAAAAAwAAAAEAAAAAAAAAI79//8AAAEABQAAAGV4dHJhAAAAhP3//wAAAQMYAAAADAAAAAQAAAAAAAAAuv3//wAAAQALAAAAZmFyZV9hbW91bnQAtP3//wAAAQUUAAAADAAAAAQAAAAAAAAApP3//wwAAABwYXltZW50X3R5cGUAAAAA5P3//wAAAQMYAAAADAAAAAQAAAAAAAAAGv7//wAAAQAQAAAAZHJvcG9mZl9sYXRpdHVkZQAAAAAc/v//AAABAxgAAAAMAAAABAAAAAAAAABS/v//AAABABEAAABkcm9wb2ZmX2xvbmdpdHVkZQAAAFT+//8AAAEFFAAAAAwAAAAEAAAAAAAAAET+//8SAAAAc3RvcmVfYW5kX2Z3ZF9mbGFnAACI/v//AAABARQAAAAMAAAABAAAAAAAAAB4/v//DAAAAHJhdGVfY29kZV9pZAAAAAC4/v//AAABAxgAAAAMAAAABAAAAAAAAADu/v//AAABAA8AAABwaWNrdXBfbGF0aXR1ZGUA7P7//wAAAQMYAAAADAAAAAQAAAAAAAAAIv///wAAAQAQAAAAcGlja3VwX2xvbmdpdHVkZQAAAAAk////AAABAxgAAAAMAAAABAAAAAAAAABa////AAABAA0AAAB0cmlwX2Rpc3RhbmNlAAAAWP///wAAAQIkAAAAFAAAAAQAAAAAAAAACAAMAAgABwAIAAAAAAAAAQgAAAAPAAAAcGFzc2VuZ2VyX2NvdW50AJj///8AAAEKGAAAAAwAAAAEAAAAAAAAAM7///8AAAMACgAAAGRyb3BvZmZfYXQAAMj///8AAAEKIAAAABQAAAAEAAAAAAAAAAAABgAIAAYABgAAAAAAAwAJAAAAcGlja3VwX2F0AAAAEAAUAAgABgAHAAwAAAAQABAAAAAAAAEFGAAAABAAAAAEAAAAAAAAAAQABAAEAAAACQAAAHZlbmRvcl9pZAAAAA==
{code}

I'd argue that extra metadata, if it's not part of the Arrow format and can be whatever an application wants to put in there, should not be printed as part of the schema's ToString method. It should be viewable some way, just not always. And IDK what to do with this {{ARROW:schema: }} business but it's clearly not readable as is.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)